I have Stream Loader 1.1.0 running as an App Engine service. The instances are autoscaled, so there was no downtime when the following error occurred.
Error:
com.google.cloud.bigquery.BigQueryException: Read timed out
at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.insertAll(HttpBigQueryRpc.java:494)
at com.google.cloud.bigquery.BigQueryImpl.insertAll(BigQueryImpl.java:1065)
at com.snowplowanalytics.snowplow.storage.bigquery.streamloader.Bigquery$.$anonfun$mkInsert$2(Bigquery.scala:65)
The error indicates a failure when inserting data. Does the occurrence of this exception mean data has been lost?
There is no data loss when this exception happens. The loader is designed to ack the incoming Pub/Sub message only after the event has been successfully written to BigQuery. We try to design all Snowplow components around this same principle, so that data is never lost when an unexpected exception occurs.
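To illustrate the principle, here is a minimal sketch of ack-after-write, assuming direct use of the BigQuery Java client from Scala. The `writeThenAck` helper and the `ack` callback are hypothetical stand-ins for the loader's actual streaming pipeline, not its real code:

```scala
import com.google.cloud.bigquery.{BigQuery, BigQueryException, InsertAllRequest, TableId}
import scala.jdk.CollectionConverters._

// Hypothetical helper: `ack` stands in for acknowledging the Pub/Sub message.
def writeThenAck(bq: BigQuery, table: TableId, row: Map[String, AnyRef], ack: () => Unit): Unit =
  try {
    val response = bq.insertAll(InsertAllRequest.newBuilder(table).addRow(row.asJava).build())
    // Ack only after BigQuery confirms the write. If the insert reports
    // row-level errors, the message stays un-acked and Pub/Sub redelivers
    // it after the ack deadline expires.
    if (!response.hasErrors) ack()
  } catch {
    case _: BigQueryException => () // no ack on exception, so the message is redelivered
  }
```

The trade-off of this design is at-least-once delivery: a write can succeed and then the ack can fail, so the worst case is a duplicate event rather than a lost one.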
I just re-read my comment and realised it could sound a bit odd: the error message is about a read timeout, but I suggested changing a write timeout setting. But I still think this is correct: both error message and config setting refer to inserting data into BigQuery, so they are connected.
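For reference, at the level of the BigQuery Java client the knob that corresponds to this "Read timed out" is the HTTP transport's read timeout, i.e. how long the client waits for BigQuery's response to the insert. A minimal sketch, assuming you were constructing the client yourself (the loader's own config setting maps onto something like this; the values are illustrative):

```scala
import com.google.cloud.bigquery.{BigQuery, BigQueryOptions}
import com.google.cloud.http.HttpTransportOptions

// Build a client whose HTTP transport waits longer for BigQuery's response
// before throwing "Read timed out". Timeouts are in milliseconds.
val transportOptions = HttpTransportOptions.newBuilder()
  .setConnectTimeout(30000)
  .setReadTimeout(60000)
  .build()

val bigQuery: BigQuery = BigQueryOptions.newBuilder()
  .setTransportOptions(transportOptions)
  .build()
  .getService
```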
Hi @siv, I think the exception you shared is not the root cause of the problem. It shows that a thread was interrupted, but the interruption was probably triggered by another exception in a different thread. Is there anything else in the logs immediately before or after that exception was printed?