Debugging bad data in GCP with BigQuery – Snowplow

antman · December 19, 2018, 11:10pm

One of the key features of the Snowplow pipeline is that it is architected to ensure data quality up front. Rather than spending a lot of time cleaning and making sense of the data before using it, you define schemas in advance, and the pipeline uses them to validate data as it comes through. Another key feature is that the pipeline is highly loss-averse: events that fail validation are not dropped but preserved as bad rows.
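The validate-then-preserve idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Snowplow's implementation. The real pipeline validates events against JSON Schemas hosted in an Iglu registry. Here, a hypothetical `PAGE_VIEW_SCHEMA` dict that maps field names to Python types stands in for a schema, and `validate` and `process` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical stand-in for a schema: field name -> required Python type.
# (Snowplow itself uses JSON Schemas from an Iglu registry, not this dict.)
PAGE_VIEW_SCHEMA: dict[str, type] = {
    "page_url": str,
    "user_id": str,
    "duration_ms": int,
}


@dataclass
class ValidationResult:
    good: list[dict[str, Any]] = field(default_factory=list)
    bad: list[dict[str, Any]] = field(default_factory=list)


def validate(event: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of error messages; an empty list means the event is valid."""
    errors = []
    for name, expected in schema.items():
        if name not in event:
            errors.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            errors.append(
                f"field {name}: expected {expected.__name__}, "
                f"got {type(event[name]).__name__}"
            )
    return errors


def process(events: list[dict[str, Any]], schema: dict[str, type]) -> ValidationResult:
    """Split events into good and bad; nothing is silently dropped."""
    result = ValidationResult()
    for event in events:
        errors = validate(event, schema)
        if errors:
            # Loss-averse: keep the original payload together with why it failed.
            result.bad.append({"payload": event, "errors": errors})
        else:
            result.good.append(event)
    return result


if __name__ == "__main__":
    events = [
        {"page_url": "/home", "user_id": "u1", "duration_ms": 1200},
        {"page_url": "/cart", "user_id": "u2", "duration_ms": "fast"},
    ]
    res = process(events, PAGE_VIEW_SCHEMA)
    print(len(res.good), len(res.bad))
    print(res.bad[0]["errors"])
```

The design choice worth noting is that a failing event keeps its original payload alongside the error messages instead of being discarded. This mirrors the loss-averse behaviour described above: the bad rows can later be inspected (for example in BigQuery) and, once the cause is understood, potentially reprocessed.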

This is a companion discussion topic for the original entry at https://snowplowanalytics.com/blog/2018/12/19/debugging-bad-data-in-gcp-with-bigquery/
