I am running Snowplow on GCP. Since 2022-03-08 I have noticed that fewer and fewer events are being delivered to the BigQuery table. I checked the number of undelivered messages on the Pub/Sub subscription, and it has been growing since that date. What I have checked so far:
I have a dedicated VM running the Snowplow BigQuery StreamLoader (v1.0.2) processes with autoscaling rules. The process was, and still is, running, but it does almost nothing: at best it inserts 100k events per day.
I have checked quotas and API rates: nothing is even close to the limits.
Under APIs & Services, on the Cloud Pub/Sub API page, I noticed that the latency of google.pubsub.v1.Subscriber.StreamingPull has been elevated starting exactly on the date when the data began to pile up.
I also ran the Snowplow BigQuery StreamLoader in DEBUG mode, but could not see any errors.
Has anyone else seen a similar issue on GCP? Any ideas on where to look for the root cause, so that I can stop losing data to the retention limit? Thanks a lot.
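To put a number on the retention risk, here is a minimal Python sketch that estimates how long the oldest backlog message has left before Pub/Sub drops it. It assumes the subscription still uses the default 7-day message retention and that the google-cloud-monitoring package and credentials are available; the project and subscription IDs are placeholders, and the helper names are made up for illustration.

```python
import time

# Pub/Sub's default subscription retention; adjust if yours was changed (assumption).
DEFAULT_RETENTION_S = 7 * 24 * 3600


def seconds_until_loss(oldest_unacked_age_s, retention_s=DEFAULT_RETENTION_S):
    """Time left before the oldest unacked message ages out (0 if already past)."""
    return max(0, retention_s - oldest_unacked_age_s)


def fetch_oldest_unacked_age(project_id, subscription_id, lookback_s=600):
    """Read the latest oldest_unacked_message_age point from Cloud Monitoring.

    Returns the age in seconds, or None if no point exists in the lookback window.
    """
    # Imported lazily so seconds_until_loss works without the client library.
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {"start_time": {"seconds": now - lookback_s}, "end_time": {"seconds": now}}
    )
    series = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": (
                'metric.type = "pubsub.googleapis.com/subscription/oldest_unacked_message_age" '
                f'AND resource.labels.subscription_id = "{subscription_id}"'
            ),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    for ts in series:
        if ts.points:
            return ts.points[0].value.int64_value  # points come back newest first
    return None
```

Calling fetch_oldest_unacked_age("my-project", "my-subscription") and passing the result to seconds_until_loss gives a rough deadline for fixing the loader before events start to expire.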
Hi @popi, the first place to check is the failed inserts. The best way to do that in 1.0.2 is to look at the logs of your repeater application. They should show how long the repeater has been running and how many events it has processed. If the events ultimately could not be inserted, for whatever reason, they will have been written to the GCS bucket you specified under repeater.output.deadLetters in your repeater config.hocon file. There you should find the failed events (bad rows), each with an error message explaining why it couldn't be inserted.
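If the dead-letter bucket holds many files, a small script can give a quick overview before you read individual error messages. The sketch below is only an illustration: it assumes the files are newline-delimited JSON and that each bad row is a self-describing JSON object with a top-level schema field, and it needs the google-cloud-storage package and credentials. The function names, bucket name, and prefix are hypothetical.

```python
import json
from collections import Counter


def summarize_bad_rows(lines):
    """Count bad rows per top-level `schema` value; blank lines are skipped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparseable>"] += 1
            continue
        schema = row.get("schema") if isinstance(row, dict) else None
        counts[schema or "<no schema>"] += 1
    return counts


def summarize_dead_letters(bucket_name, prefix=""):
    """Aggregate bad-row counts over every object under a bucket prefix."""
    # Imported lazily so summarize_bad_rows can be used on local files too.
    from google.cloud import storage

    client = storage.Client()
    total = Counter()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        total.update(summarize_bad_rows(blob.download_as_text().splitlines()))
    return total
```

For example, summarize_dead_letters("my-dead-letter-bucket").most_common() would list the most frequent bad row types first, which tells you which files are worth opening for the detailed error message.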
I would also recommend you upgrade to 1.2.0, which has much improved logs.