GCP notTSV loader error

We’re setting up the GCP pipeline with the following components:

  1. GCP collector
  2. FS2 Enrich
  3. StreamLoader
  4. BigQuery mutator
  5. BigQuery repeater

We’ve got everything running and the events are passing through enrich, but they aren’t being processed by the loader. When we check the Pub/Sub logs, the following error appears in the loader bad rows topic:


The enricher config is:

# "Gcp" is the only valid option now
auth = {
  type = "Gcp"
}

# Collector input
input = {
  type = "PubSub"
  subscription = "projects/dh-event-pipe/subscriptions/enricher_in"
}

# Enriched events output
good = {
  type = "PubSub"
  topic = "projects/dh-event-pipe/topics/enricher_good"
}

# Bad rows output
bad = {
  type = "PubSub"
  topic = "projects/dh-event-pipe/topics/enricher_bad"
}

assetsUpdatePeriod = "7 days"

metricsReportPeriod = "1 second"

The loader config is:

{
  "schema": "iglu:com.snowplowanalytics.snowplow.storage/bigquery_config/jsonschema/1-0-0",
  "data": {
    "name": "GCP BigQuery test",
    "id": "4c09e258-1ca7-41cc-9c09-700b0a0910ed",
    "projectId": "dh-event-pipe",
    "datasetId": "atomic",
    "tableId": "events",
    "input": "enricher_in",
    "typesTopic": "event_types",
    "typesSubscription": "mutator_in",
    "badRows": "loader_bad",
    "failedInserts": "loader_retry",
    "load": {
      "mode": "FILE_LOADS",
      "frequency": 1800
    },
    "purpose": "ENRICHED_EVENTS"
  }
}

I’m guessing we’ve got a config option wrong somewhere. Any idea what could be causing this?

Hey @iain,

I’m a bit suspicious of the fact that you have enricher_in as the input in both the BQ config and the enrich config. It seems your BQ Loader is trying to process raw collector data rather than enriched events, which should be apparent from the payload property in that loader_parsing_error bad row. The loader’s input should be a subscription on the enriched (good) topic instead.
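A quick way to catch this kind of mix-up is to compare the two input values before deploying. The following is only a minimal sketch, not part of any Snowplow tooling: the helper names `short_name` and `shares_input` are made up for illustration, and it assumes the loader’s input is a short subscription name while Enrich uses the full resource path, as in the configs above.

```python
def short_name(resource: str) -> str:
    """Return the last path segment of a Pub/Sub resource name."""
    return resource.rstrip("/").split("/")[-1]


def shares_input(enrich_subscription: str, loader_input: str) -> bool:
    """True when the loader would read the same subscription as Enrich."""
    return short_name(enrich_subscription) == short_name(loader_input)


# Values copied from the two configs in the question.
enrich_subscription = "projects/dh-event-pipe/subscriptions/enricher_in"
loader_input = "enricher_in"

if shares_input(enrich_subscription, loader_input):
    print("Loader input is the collector subscription; point it at a "
          "subscription on the enriched (good) topic instead.")
```

With the values from the question the check fires, because both components read from enricher_in.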

Unrelated to the error: the FILE_LOADS load mode doesn’t work in StreamLoader, only STREAMING_INSERTS does. We should make that more explicit when the application starts.
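Until the application reports this itself, a small pre-flight check can flag the load mode. This is a rough sketch under the assumption stated above (StreamLoader accepts only STREAMING_INSERTS); the function name `unsupported_load_mode` is hypothetical, and the sample config is a trimmed-down copy of the one in the question.

```python
import json

# Assumption taken from the reply above: StreamLoader only supports this mode.
STREAMLOADER_MODES = {"STREAMING_INSERTS"}


def unsupported_load_mode(config_text: str):
    """Return the configured load mode if StreamLoader cannot use it, else None."""
    config = json.loads(config_text)
    mode = config["data"]["load"]["mode"]
    return None if mode in STREAMLOADER_MODES else mode


# Trimmed-down copy of the loader config from the question.
example = '{"data": {"load": {"mode": "FILE_LOADS", "frequency": 1800}}}'

bad_mode = unsupported_load_mode(example)
if bad_mode is not None:
    print(f"Load mode {bad_mode} is not supported by StreamLoader")
```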

Also, @iain, we’re very interested in general feedback on the assets. If you notice anything not working well (or working too well), please let us know. We’d like to make these assets our primary way to run Snowplow, and OSS community feedback would really help us prioritize things.

Thanks @anton, that’s working now!

We’ve got a GCP pipeline up and running in parallel with the AWS one now, so I’ll share any feedback with you.

We’ve currently got the Snowplow collector running inside Cloud Run, so it will be interesting to see how that compares.

