Important notice: Snowplow BigQuery Loader vulnerability and fix

Project: BigQuery Loader, specifically the BigQuery Repeater, versions 0.2.0 up to (but not including) 0.4.2
Vulnerability: Errored data in Google Cloud Storage is publicly readable

Description:

This issue concerns the Google Cloud Storage (GCS) buckets that store errored data: processed, valid data that could not be loaded into BigQuery at the final stage. The Access Control List (ACL) on the data written by the Repeater grants public, read-only access. The issue was introduced in version 0.2.0 of the BigQuery Repeater.

Impact:

Any pipeline that runs the Repeater component of the loader is affected. The Repeater is run as a separate command, so you will know whether you are using it. Although the location of the data is obscured, you should carry out the steps below immediately.
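
To check whether a particular deployment is affected, the version range can be written as a small Python helper. This is a sketch that assumes the affected range is 0.2.0 up to, but not including, 0.4.2 (inferred from this notice, which names 0.2.0 as the first affected release and 0.4.2 as the upgrade target) and that versions are plain MAJOR.MINOR.PATCH strings. The function and constant names are hypothetical.

```python
from typing import Tuple

# Assumed affected range, inferred from this notice:
# introduced in 0.2.0, fixed by upgrading to 0.4.2.
FIRST_AFFECTED = (0, 2, 0)
FIRST_FIXED = (0, 4, 2)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple; pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.strip().split("."))


def repeater_is_affected(version: str) -> bool:
    """Return True if the Repeater version falls in the assumed affected range."""
    # Tuples compare element by element, so (0, 4, 1) < (0, 4, 2).
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED
```

For example, repeater_is_affected("0.3.0") returns True, while repeater_is_affected("0.4.2") returns False.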

Solution:

  1. Remove public read access from the data already in the bucket.
  2. Update to BigQuery Repeater 0.4.2.
  3. Remove public read access from any data written between steps 1 and 2.
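
Steps 1 and 3 can be sketched in Python with the google-cloud-storage client library: list the objects in the errored-data bucket, find any ACL entries granted to allUsers or allAuthenticatedUsers, and revoke them while leaving other entries (such as project owners) in place. This is a minimal sketch, not a Snowplow-provided tool. It assumes the bucket uses fine-grained (per-object) ACLs rather than uniform bucket-level access, that the client is authenticated with permission to update object ACLs, and it only inspects object ACLs, not bucket-level IAM policies. The function names are hypothetical.

```python
"""Sketch: revoke public ACL grants on objects in a GCS bucket.

Assumptions (not taken from the notice): the bucket uses fine-grained
per-object ACLs, and google-cloud-storage is installed and authenticated
with permission to update object ACLs when secure_bucket() is called.
"""
from typing import Dict, Iterable, List, Optional

# ACL entities that GCS uses to mean "anyone" or "any Google account".
PUBLIC_ENTITIES = {"allUsers", "allAuthenticatedUsers"}


def public_grants(acl_entries: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the ACL entries (dicts with 'entity' and 'role') that are public."""
    return [entry for entry in acl_entries if entry.get("entity") in PUBLIC_ENTITIES]


def secure_bucket(bucket_name: str, prefix: Optional[str] = None) -> int:
    """Strip public grants from every object under prefix; return how many changed."""
    # Imported lazily so public_grants() can be used without the client library.
    from google.cloud import storage

    client = storage.Client()
    changed = 0
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        blob.acl.reload()  # fetch the object's current ACL
        if not public_grants(blob.acl):
            continue  # nothing public on this object
        for entity in (blob.acl.all(), blob.acl.all_authenticated()):
            # Copy the role set before revoking, since revoke() mutates it.
            for role in list(entity.get_roles()):
                entity.revoke(role)
        blob.acl.save()  # write the reduced ACL back to GCS
        changed += 1
    return changed
```

For example, secure_bucket("my-errored-data-bucket") could be run once before upgrading (step 1) and again after the upgrade (step 3); the bucket name is a placeholder. Checking each object before saving avoids rewriting ACLs that were never public.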