I am new to Snowplow and trying to set up the open-source version on a GCP VM.
I am load testing with JMeter HTTP requests and have noticed that 100% of the requests to the collector get a 200 response, yet I cannot see the same number of rows in the BigQuery table (on average about 40% of the data is missing).
What puzzles me is that I can't find the missing 40% of requests/data on any of the Snowplow queues (good/bad/types etc.).
This happens even when sending just 10 requests, so it's probably not an issue with the instance configuration.
Please help me fix this, thanks so much!
Hey @Piyush_Kukadiya Welcome to the Snowplow Discourse!
The Snowplow Collector will pretty much always return a 200; it only returns other status codes when something like Pub/Sub is unavailable or the collector itself is unhealthy. Are you sending all your events with the official Snowplow trackers? Are there any custom schemas that could be causing validation failures, or are you using only standard event types?
The Snowplow pipeline is typically lossless, so those missing events should have gone somewhere. It's worth looking at some of the core concepts first to see if we can find them in an expected location.
Usually events go missing when they do not conform to the tracker protocol (hard to do if you're using an official tracker to send events) or when they fail validation as part of the enrich step. However, those events should all end up in your bad Pub/Sub topic, and in your bad GCS bucket if you're loading your bad rows to GCS with the GCS Loader (typically what we'd recommend).
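Once you've pulled a few messages off the bad topic (or a file from the bad bucket), each bad row is a self-describing JSON document whose `schema` names the failure type and whose `data.failure` carries the error details. A minimal sketch of inspecting one, assuming a `schema_violations` bad row (the exact payload layout below is illustrative, so check it against your own bad rows):

```python
import json

# Example bad row in the self-describing JSON format Snowplow uses for
# failed events. The schema URI is real; the failure payload here is a
# made-up example for illustration.
raw = json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0",
    "data": {
        "failure": {
            "messages": [
                {"error": "Could not resolve schema iglu:com.acme/my_event/jsonschema/1-0-0"}
            ]
        },
        "payload": {"raw": "..."},
    },
})

def summarize_bad_row(line: str) -> tuple[str, list[str]]:
    """Return the bad-row type (from the schema URI) and its failure messages."""
    row = json.loads(line)
    bad_row_type = row["schema"].split("/")[1]  # e.g. "schema_violations"
    messages = [m.get("error", "") for m in row["data"]["failure"]["messages"]]
    return bad_row_type, messages

bad_type, errors = summarize_bad_row(raw)
print(bad_type, errors)
```

Grouping your bad rows by that schema-derived type is usually the quickest way to see whether you're dealing with schema violations, enrichment failures, or something else.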
Also, if the BigQuery Loader can't write to the table because of a schema mismatch, your events will go to a failed inserts stream, where the BigQuery Repeater will reinsert them after the BigQuery Mutator has created the new columns. As described here.
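To make that failure mode concrete: an insert fails when a row carries a column the table doesn't have yet, which happens when a new self-describing event or context shows up before the Mutator has altered the table. A toy illustration of the mismatch (the column names are hypothetical, and this is not the loader's actual code):

```python
# Columns that currently exist in the BigQuery events table
# (hypothetical subset for illustration).
table_columns = {"event_id", "collector_tstamp", "page_url"}

row = {
    "event_id": "deadbeef",
    "collector_tstamp": "2023-01-01T00:00:00Z",
    "page_url": "https://example.com",
    # A column derived from a new custom context -- the Mutator has to
    # ALTER the table to add it before rows like this can be inserted.
    "contexts_com_acme_my_context_1_0_0": [{"foo": "bar"}],
}

missing = set(row) - table_columns
if missing:
    # In the real pipeline this row would land on the failed-inserts
    # topic and be retried by the Repeater once the column exists.
    print("would fail insert, missing columns:", sorted(missing))
```

So if your 40% of events all carry a new custom schema, checking the failed-inserts topic (and whether the Mutator is running) is a good next step.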