I am new to Snowplow and trying to set up the open-source version on a GCP VM.
I am load testing with JMeter HTTP requests and have noticed that 100% of the requests to the collector get a 200 response, yet I cannot see the same number of rows in the BigQuery table (on average about 40% of the data is missing).
What puzzles me is that I can't find the missing 40% of requests/data on any of the Snowplow queues (good/bad/types etc.).
This happens even when sending just 10 requests, so it's probably not an issue with the instance configuration.
Please help me fix this, thanks so much!
Hey @Piyush_Kukadiya Welcome to the Snowplow Discourse!
The Snowplow Collector will pretty much always return a 200; it only returns other status codes when something like Pub/Sub is unavailable or the collector itself is unhealthy. Are you sending all your events with the official Snowplow trackers? Are there any custom schemas that could be causing validation failures, or are you using only standard event types?
The Snowplow pipeline is typically lossless, so those missing events should have gone somewhere. It's worth looking at some of the core concepts first to see if we can find them in an expected location.
Usually events go missing when they do not conform to the tracker protocol (hard to do if you're using an official tracker to send events) or when they fail validation as part of the enrich step. However, those events should all end up in your bad Pub/Sub topic, and in your bad GCS bucket if you're loading your bad rows to GCS with the GCS Loader (typically what we'd recommend).
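Once you've pulled a few messages off the bad topic (or a file from the bad bucket), each bad row is a self-describing JSON document whose `schema` names the failure type and whose `data.failure` carries the error details. A minimal sketch of inspecting one, assuming a `schema_violations` bad row (the exact payload layout below is illustrative, so check it against your own bad rows):

```python
import json

# Example bad row in the self-describing JSON format Snowplow uses for
# failed events. The schema URI is real; the failure payload here is a
# made-up example for illustration.
raw = json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0",
    "data": {
        "failure": {
            "messages": [
                {"error": "Could not resolve schema iglu:com.acme/my_event/jsonschema/1-0-0"}
            ]
        },
        "payload": {"raw": "..."},
    },
})

def summarize_bad_row(line: str) -> tuple[str, list[str]]:
    """Return the bad-row type (from the schema URI) and its failure messages."""
    row = json.loads(line)
    bad_row_type = row["schema"].split("/")[1]  # e.g. "schema_violations"
    messages = [m.get("error", "") for m in row["data"]["failure"]["messages"]]
    return bad_row_type, messages

bad_type, errors = summarize_bad_row(raw)
print(bad_type, errors)
```

Grouping your bad rows by that schema-derived type is usually the quickest way to see whether you're dealing with schema violations, enrichment failures, or something else.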
Also, if the BigQuery Loader can't write to the table because of a schema mismatch, your events will go to a failed inserts stream, where the BigQuery Repeater will reinsert them after the BigQuery Mutator has created the new columns. As described here.
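To make that failure mode concrete: an insert fails when a row carries a column the table doesn't have yet, which happens when a new self-describing event or context shows up before the Mutator has altered the table. A toy illustration of the mismatch (the column names are hypothetical, and this is not the loader's actual code):

```python
# Columns that currently exist in the BigQuery events table
# (hypothetical subset for illustration).
table_columns = {"event_id", "collector_tstamp", "page_url"}

row = {
    "event_id": "deadbeef",
    "collector_tstamp": "2023-01-01T00:00:00Z",
    "page_url": "https://example.com",
    # A column derived from a new custom context -- the Mutator has to
    # ALTER the table to add it before rows like this can be inserted.
    "contexts_com_acme_my_context_1_0_0": [{"foo": "bar"}],
}

missing = set(row) - table_columns
if missing:
    # In the real pipeline this row would land on the failed-inserts
    # topic and be retried by the Repeater once the column exists.
    print("would fail insert, missing columns:", sorted(missing))
```

So if your 40% of events all carry a new custom schema, checking the failed-inserts topic (and whether the Mutator is running) is a good next step.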