We have the following situation while using Snowplow (collector, enrich-pubsub, bq-loader, mutator, BigQuery) on GCP.
We collect events (15 million/day) from different sources (teams, products, etc.), and some product teams are complaining that events are missing from BigQuery, even though, according to them, they received a 200 response from our collector.
I should mention that I have checked whether the events for that product were rejected by any validation step (bad enrichments, bq-failed-inserts, etc.), as well as the logs for each component we are using, and couldn't find anything.
Is there a way to trace (debug) individual events through the millions of events we receive daily?
Events should end up in either the good or the bad stream, so if they're not showing up in either and you are sure the events are being sent, you may have an issue with the collector failing to publish to Pub/Sub. I'd look at the publish metrics and logs for your collector; see the sketch below for one way to do that.
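A minimal sketch of checking publish activity on the collector's good topic via Cloud Monitoring. The project ID and topic name are placeholders, and you should verify the exact metric and label names against the Pub/Sub metrics list for your setup:

```python
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-gcp-project"   # hypothetical project ID
TOPIC_ID = "collector-good"     # hypothetical collector good topic

client = monitoring_v3.MetricServiceClient()
now = time.time()
interval = monitoring_v3.TimeInterval(
    end_time={"seconds": int(now)},
    start_time={"seconds": int(now) - 3600},  # look at the last hour
)

# Count publish operations on the topic, broken down by response code,
# so failed publishes from the collector stand out.
results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "pubsub.googleapis.com/topic/send_message_operation_count" '
            f'AND resource.labels.topic_id = "{TOPIC_ID}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    response_code = series.metric.labels.get("response_code", "unknown")
    total = sum(point.value.int64_value for point in series.points)
    print(f"{response_code}: {total} publish operations in the last hour")
```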
It is also possible for events to go into failed events at the collector, so make sure you check that topic as well.
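If you want to peek at what's on the collector's failed-events topic, a quick way is to pull a small batch from a subscription attached to it. This is a rough sketch; the project and subscription names are placeholders, and it assumes you've created an inspection subscription on that topic:

```python
from google.cloud import pubsub_v1

PROJECT_ID = "my-gcp-project"              # hypothetical project ID
SUBSCRIPTION_ID = "collector-bad-inspect"  # hypothetical subscription on the bad topic

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

# Pull a small batch synchronously; we don't ack, so the messages become
# available again after the ack deadline expires.
response = subscriber.pull(
    request={"subscription": subscription_path, "max_messages": 10}
)

for received in response.received_messages:
    payload = received.message.data.decode("utf-8", errors="replace")
    print(payload[:500])  # bad rows are self-describing JSON; print a preview
```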
I’ve never really seen any instance (other than the collector failing to publish / crashing) where events haven’t ended up in good or bad.
It's also worth checking the dead-end GCS bucket for failed inserts. When writing to BigQuery, if an event is valid but fails to load for some other reason, it will be retried by the BigQuery Repeater. If the Repeater can't insert it either, it will write it to that dead-end bucket. It's technically part of the bad portion of the output, but it often gets missed.
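Since the dead-end bucket contains the failed inserts as JSON objects, a simple substring scan for the event ID the team says is missing is usually enough to confirm whether it ended up there. A rough sketch, with the bucket name and event ID as placeholders:

```python
from google.cloud import storage

BUCKET_NAME = "sp-bq-loader-dead-letter"           # hypothetical dead-end bucket
EVENT_ID = "c6ef3124-b53a-4b13-a233-0088f79dcbcb"  # hypothetical event_id to trace

client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME):
    contents = blob.download_as_bytes().decode("utf-8", errors="replace")
    if EVENT_ID in contents:
        print(f"Found event {EVENT_ID} in gs://{BUCKET_NAME}/{blob.name}")
```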