Querying Failed BigQuery Events in GCS

Hiya,

I was just wondering if this is a paid feature: Querying Failed Events in BigQuery just got better! - #2

I’m trying to set this up (Accessing failed events in file storage | Snowplow Documentation) by using the GCS Loader Dataflow job to load both of these topics (the bad stream and the bad rows stream) into GCS.

The job does write data, but what ends up in GCS is not the partitioned style of files described in the docs.

Thanks,

I am a dumb, dumb.

docker run \
  -v $PWD/config:/snowplow/config \ # if running outside GCP
  -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \ # if running outside GCP
  snowplow/snowplow-google-cloud-storage-loader:0.5.4 \
  --runner=DataFlowRunner \
  --jobName=[JOB-NAME] \
  --project=[PROJECT] \
  --streaming=true \
  --workerZone=[ZONE] \
  --inputSubscription=projects/[PROJECT]/subscriptions/[SUBSCRIPTION] \
  --outputDirectory=gs://[BUCKET]/YYYY/MM/dd/HH/ \ # partitions by date
  --outputFilenamePrefix=output \ # optional
  --shardTemplate=-W-P-SSSSS-of-NNNNN \ # optional
  --outputFilenameSuffix=.txt \ # optional
  --windowDuration=5 \ # optional, in minutes
  --compression=none \ # optional, gzip, bz2 or none
  --numShards=1 # optional

I was using the Docker quickstart as a guide for what to run and missed this extra option: --partitionedOutputDirectory=gs://[BUCKET]/
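
For anyone else who trips over this, here’s roughly what the full invocation looks like with that option added. This is just a sketch based on the quickstart above: the placeholders are the same, the inline comments are dropped so the line continuations actually work, and you may want to point --partitionedOutputDirectory at its own prefix rather than the bucket root.

  # Same as the quickstart command above, plus --partitionedOutputDirectory,
  # which writes the bad rows in the partitioned layout the BigQuery queries expect.
  docker run \
    -v $PWD/config:/snowplow/config \
    -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \
    snowplow/snowplow-google-cloud-storage-loader:0.5.4 \
    --runner=DataFlowRunner \
    --jobName=[JOB-NAME] \
    --project=[PROJECT] \
    --streaming=true \
    --workerZone=[ZONE] \
    --inputSubscription=projects/[PROJECT]/subscriptions/[SUBSCRIPTION] \
    --outputDirectory=gs://[BUCKET]/YYYY/MM/dd/HH/ \
    --partitionedOutputDirectory=gs://[BUCKET]/ \
    --outputFilenamePrefix=output \
    --outputFilenameSuffix=.txt \
    --windowDuration=5 \
    --compression=none \
    --numShards=1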


Hi @Ben_Davison, I’m glad you figured it out!

Yes, you just need to add the --partitionedOutputDirectory option, and then it will be compatible with the BigQuery queries.
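
If it helps anyone reading along, here’s a rough sketch of pointing a BigQuery external table at the partitioned output and querying it with the bq CLI. This isn’t necessarily the exact setup from the linked post: the dataset and table names are hypothetical, the [BAD-ROW-TYPE] folder is a placeholder, and the exact layout under the partitioned directory depends on which bad row types your pipeline produces.

  # Sketch with hypothetical names: build an external table definition over the
  # newline-delimited JSON bad rows (schema auto-detected), register it, query it.
  bq mkdef --source_format=NEWLINE_DELIMITED_JSON --autodetect \
    "gs://[BUCKET]/[BAD-ROW-TYPE]/*" > bad_rows_def.json

  bq mk --external_table_definition=bad_rows_def.json snowplow_bad.bad_rows

  bq query --nouse_legacy_sql \
    'SELECT COUNT(*) FROM snowplow_bad.bad_rows'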
