Querying Failed BigQuery Events in GCS

Hiya,

I was just wondering if this is a paid feature: Querying Failed Events in BigQuery just got better! - #2

I’m trying to set this up (Accessing failed events in file storage | Snowplow Documentation) by using the GCS Loader Dataflow job to load both of these topics (the bad stream and the bad rows stream) into GCS.

The job does write data, but what ends up in GCS is not the partitioned style of files described in the docs.

Thanks,

I am a dumb, dumb.

docker run \
  -v $PWD/config:/snowplow/config \ # if running outside GCP
  -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \ # if running outside GCP
  snowplow/snowplow-google-cloud-storage-loader:0.5.4 \
  --runner=DataFlowRunner \
  --jobName=[JOB-NAME] \
  --project=[PROJECT] \
  --streaming=true \
  --workerZone=[ZONE] \
  --inputSubscription=projects/[PROJECT]/subscriptions/[SUBSCRIPTION] \
  --outputDirectory=gs://[BUCKET]/YYYY/MM/dd/HH/ \ # partitions by date
  --outputFilenamePrefix=output \ # optional
  --shardTemplate=-W-P-SSSSS-of-NNNNN \ # optional
  --outputFilenameSuffix=.txt \ # optional
  --windowDuration=5 \ # optional, in minutes
  --compression=none \ # optional, gzip, bz2 or none
  --numShards=1 # optional

I was using the Docker quickstart as a guide for what to run and missed this extra option: --partitionedOutputDirectory=gs://[BUCKET]/
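
For anyone else who trips over this, here’s roughly what the full invocation looks like with that option added. This is just a sketch based on the quickstart above: the placeholders are the same, the inline comments are dropped so the line continuations actually work, and you may want to point --partitionedOutputDirectory at its own prefix rather than the bucket root.

  # Same as the quickstart command above, plus --partitionedOutputDirectory,
  # which writes the bad rows in the partitioned layout the BigQuery queries expect.
  docker run \
    -v $PWD/config:/snowplow/config \
    -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \
    snowplow/snowplow-google-cloud-storage-loader:0.5.4 \
    --runner=DataFlowRunner \
    --jobName=[JOB-NAME] \
    --project=[PROJECT] \
    --streaming=true \
    --workerZone=[ZONE] \
    --inputSubscription=projects/[PROJECT]/subscriptions/[SUBSCRIPTION] \
    --outputDirectory=gs://[BUCKET]/YYYY/MM/dd/HH/ \
    --partitionedOutputDirectory=gs://[BUCKET]/ \
    --outputFilenamePrefix=output \
    --outputFilenameSuffix=.txt \
    --windowDuration=5 \
    --compression=none \
    --numShards=1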


Hi @Ben_Davison, I’m glad you figured it out!

Yes, you just need to add the --partitionedOutputDirectory option, and then it will be compatible with the BigQuery queries.
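
If it helps anyone reading along, here’s a rough sketch of pointing a BigQuery external table at the partitioned output and querying it with the bq CLI. This isn’t necessarily the exact setup from the linked post: the dataset and table names are hypothetical, the [BAD-ROW-TYPE] folder is a placeholder, and the exact layout under the partitioned directory depends on which bad row types your pipeline produces.

  # Sketch with hypothetical names: build an external table definition over the
  # newline-delimited JSON bad rows (schema auto-detected), register it, query it.
  bq mkdef --source_format=NEWLINE_DELIMITED_JSON --autodetect \
    "gs://[BUCKET]/[BAD-ROW-TYPE]/*" > bad_rows_def.json

  bq mk --external_table_definition=bad_rows_def.json snowplow_bad.bad_rows

  bq query --nouse_legacy_sql \
    'SELECT COUNT(*) FROM snowplow_bad.bad_rows'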
