Hi,
I have a question about Snowplow bad rows. I want to build a bad rows pipeline to monitor the health of the real-time Snowplow pipeline. Currently I am following this blog post (Storing Snowplow bad row events in BigQuery | by Jonathan Merlevede | datamindedbe | Medium) and using Google Cloud Functions to trigger the pubsub_to_bigquery function whenever data lands on the bad, enriched-bad, bq-failed-inserts, or bq-bad-rows topics. However, I find it difficult to define the schema of the bad rows table in BigQuery, because the schema of the bad row data coming from Pub/Sub sometimes conflicts with the one in the Iglu repo.
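For context, the general shape of what I mean is a Pub/Sub-triggered function that inserts each bad row into BigQuery, roughly like the sketch below. The table id and columns are just placeholders I made up, and in this sketch the whole bad row is kept as a raw JSON string, which sidesteps the schema question entirely but makes querying harder:

```python
# Rough sketch only: "my_project.snowplow_monitoring.bad_rows" and its
# (ingest_time TIMESTAMP, payload STRING) columns are placeholders.
import base64
import datetime

from google.cloud import bigquery

BQ_TABLE = "my_project.snowplow_monitoring.bad_rows"  # placeholder table id
client = bigquery.Client()


def pubsub_to_bigquery(event, context):
    """Background Cloud Function triggered by a message on one of the bad topics."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    row = {
        "ingest_time": datetime.datetime.utcnow().isoformat(),
        # Keeping the whole bad row as a raw JSON string avoids having to
        # mirror the Iglu bad-row schemas as BigQuery columns.
        "payload": payload,
    }
    errors = client.insert_rows_json(BQ_TABLE, [row])
    if errors:
        # Surface insert failures in the Cloud Functions logs
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

A function like this can be wired to each of the four topics with `gcloud functions deploy ... --trigger-topic`, but anything more structured than a raw string column runs into the schema conflicts I described above.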
I also looked at another solution, the Snowplow Google Cloud Storage Loader (GitHub - snowplow-incubator/snowplow-google-cloud-storage-loader: Dataflow job to dump the content coming from a PubSub subscription into Cloud storage). But it is not real-time, and I ran into some schema conflict problems there as well; I am not quite sure why.
Which of these do you think is the better approach, or is there another solution I should consider? I’d appreciate it if someone could give me some advice.