I’m trying to set up a basic Snowplow pipeline on GCP. All the test events I’m sending end up in the bad rows subscription. So far I have the following components running:
- Collector
- Iglu server
- BigQuery StreamLoader
I’m sending the following test:
curl 'http://xxx.xxx.xx.xx:xxxx/com.snowplowanalytics.snowplow/tp2' \
-H 'Content-Type: application/json; charset=UTF-8' \
-H 'Cookie: _sp=305902ac-8d59-479c-ad4c-82d4a2e6bb9c' \
--data-raw '{"schema":"iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4","data":[{"e":"pv","tv":"js-3.4.0","p":"web"}]}'
Here is what I get in the bad rows pubsub subscription:
{"schema":"iglu:com.snowplowanalytics.snowplow.badrows/loader_parsing_error/jsonschema/2-0-0","data":{"processor":{"artifact":"snowplow-bigquery-streamloader","version":"1.4.0"},"failure":{"type":"NotTSV"},"payload":"\u000b\u0000d\u0000\u0000\u0000\u000b88.123.48.3\n\u0000�\u0000\u0000\u0001��ғ�\u000b\u0000�\u0000\u0000\u0000\u0005UTF-8\u000b\u0000�\u0000\u0000\u0000\u0016ssc-2.7.0-googlepubsub\u000b\u0001,\u0000\u0000\u0000\u000bcurl/7.77.0\u000b\u0001@\u0000\u0000\u0000#/com.snowplowanalytics.snowplow/tp2\u000b\u0001T\u0000\u0000\u0000|{\"schema\":\"iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4\",\"data\":[{\"e\":\"pv\",\"tv\":\"js-3.4.0\",\"p\":\"web\"}]}\u000f\u0001^\u000b\u0000\u0000\u0000\u0006\u0000\u0000\u0000\u001bTimeout-Access: <function1>\u0000\u0000\u0000\u0018Host: {ipofmycollector}\u0000\u0000\u0000\u0017User-Agent: curl/7.77.0\u0000\u0000\u0000\u000bAccept: */*\u0000\u0000\u00000Cookie: _sp=305902ac-8d59-479c-ad4c-82d4a2e6bb9c\u0000\u0000\u0000\u0010application/json\u000b\u0001h\u0000\u0000\u0000\u0010application/json\u000b\u0001�\u0000\u0000\u0000\r104.199.44.20\u000b\u0001�\u0000\u0000\u0000$9ce62856-c74b-41df-87f6-833148cf3d77\u000bzi\u0000\u0000\u0000Aiglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0\u0000"}}
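(For context, this is roughly how I’m summarising the bad rows after pulling them from the subscription — a quick throwaway script; the helper name is mine and the payload in the sample is truncated:)

```python
import json

def summarise_bad_row(raw: str) -> str:
    """Extract the failure type and processor from a Snowplow bad-row JSON message."""
    row = json.loads(raw)
    schema_name = row["schema"].split("/")[1]        # e.g. loader_parsing_error
    failure = row["data"]["failure"]["type"]         # e.g. NotTSV
    processor = row["data"]["processor"]["artifact"]
    return f"{schema_name}: {failure} (from {processor})"

# The message shown above, with the binary payload truncated:
sample = (
    '{"schema":"iglu:com.snowplowanalytics.snowplow.badrows/'
    'loader_parsing_error/jsonschema/2-0-0",'
    '"data":{"processor":{"artifact":"snowplow-bigquery-streamloader",'
    '"version":"1.4.0"},"failure":{"type":"NotTSV"},"payload":"..."}}'
)
print(summarise_bad_row(sample))
# → loader_parsing_error: NotTSV (from snowplow-bigquery-streamloader)
```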
The BigQuery loader is listening directly to the collector’s output topic (I don’t have any Enrich component running) — could this be the issue? And what does NotTSV mean?
If needed, here is the config file of the BigQuery loader:
{
  "projectId": "gcp-project-id"

  "loader": {
    "input": {
      "subscription": "good-sub"
    }
    "output": {
      "good": {
        "datasetId": "snowplow"
        "tableId": "events"
      }
      "bad": {
        "topic": "loader-bad"
      }
      "types": {
        "topic": "bq-types"
      }
      "failedInserts": {
        "topic": "failed-insert"
      }
    }
  }

  "mutator": {
    "input": {
      "subscription": "bq-types-sub"
    }
    "output": {
      "good": ${loader.output.good} # will be automatically inferred
    }
  }

  "repeater": {
    "input": {
      "subscription": "loader-failed-insert-sub"
    }
    "output": {
      "good": ${loader.output.good} # will be automatically inferred
      "deadLetters": {
        "bucket": "gs://sp-dead-letter-bucket-sw"
      }
    }
  }

  "monitoring": {} # disabled
}
Thanks!