About badrows pipeline choices


I have a question about Snowplow bad rows. I want to build a bad rows pipeline to monitor the health of the real-time Snowplow pipeline. Currently I am following this blog post (Storing Snowplow bad row events in BigQuery | by Jonathan Merlevede | datamindedbe | Medium) and using a Google Cloud Function to trigger the pubsub_to_bigquery function whenever data arrives in the bad, enriched-bad, bq-failed-inserts, or bq-bad-rows topics. But I find it difficult to define the schema of the bad rows table in BigQuery, since the schema of the bad row data coming from Pub/Sub sometimes conflicts with the one in the Iglu repo.
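For context, the transformation in my function is roughly the sketch below (this is my own simplification, not the blog's exact code, and the function name is mine). Bad rows are self-describing JSON with a `schema` and a `data` field, and serialising the nested `data` back to a string is how I try to dodge the schema drift:

```python
# Minimal sketch (my simplification, not the blog's exact code) of the
# Pub/Sub -> BigQuery mapping. Snowplow bad rows are self-describing
# JSON with "schema" and "data" keys; keeping "data" as a serialised
# string means the BigQuery column is just STRING, so no conflict with
# the Iglu schema can occur at insert time.
import base64
import json

def bad_row_to_bq_row(event_data: bytes) -> dict:
    """Decode one Pub/Sub message (base64-encoded, as Cloud Functions
    deliver it) and flatten it into a BigQuery-friendly row."""
    bad_row = json.loads(base64.b64decode(event_data).decode("utf-8"))
    return {
        "schema": bad_row.get("schema"),           # e.g. iglu:com.snowplowanalytics...
        "data": json.dumps(bad_row.get("data")),   # nested payload kept as a string
    }
```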

I also found another possible solution: the Snowplow Google Cloud Storage Loader (GitHub - snowplow-incubator/snowplow-google-cloud-storage-loader: Dataflow job to dump the content coming from a PubSub subscription into Cloud storage). But it is not real-time, and similar schema conflicts show up there as well; I am not sure why.

What do you think is a better way, or is there any other solution? I’d appreciate it if someone could give me some advice.

Hi @phxtorise! Using the Snowplow Google Cloud Storage Loader and loading the bad rows to GCS is certainly a viable option. From there, you can use BigQuery to build an external table on top of the GCS directory.

We have instructions on our docs website for how to build the BigQuery table. They show how to avoid the schema conflicts you mentioned by defining the table structure explicitly instead of relying on BigQuery's schema auto-detection.
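As a rough illustration of what an explicitly defined table can look like (the project, dataset, bucket path, and nested fields below are placeholders, and the `data` struct shows only a small subset of one bad row type; check the docs and the relevant Iglu schema for the exact structure):

```sql
-- Illustrative only: names and paths are placeholders. Declaring the
-- schema explicitly (plus ignore_unknown_values) avoids the drift you
-- get from auto-detect across differently-shaped bad rows.
CREATE EXTERNAL TABLE `my_project.snowplow_monitoring.bad_rows` (
  schema STRING,
  data STRUCT<
    processor STRUCT<artifact STRING, version STRING>,
    failure STRUCT<timestamp STRING>
  >
)
OPTIONS (
  format = 'NEWLINE_DELIMITED_JSON',
  uris = ['gs://your-bad-rows-bucket/*'],
  ignore_unknown_values = TRUE
);
```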