Hi everyone! I am experiencing a strange error in my GCP pipeline consisting of the Scala Stream Collector, PubSub Enrich and GCS Loader. Messages which pass validation in Snowplow Micro (same iglu resolver config) are failing in PubSub Enrich with an “enriched bad” message stating that the Collector is passing messages to Enrich with an empty body/querystring. Here is the JSON of the result in the “enriched bad” stream:
{
  "schema":"iglu:com.snowplowanalytics.snowplow.badrows/tracker_protocol_violations/jsonschema/1-0-0",
  "data":{
    "processor":{
      "artifact":"snowplow-enrich-pubsub",
      "version":"2.0.1"
    },
    "failure":{
      "timestamp":"2021-09-17T01:06:50.549928Z",
      "vendor":"com.snowplowanalytics.snowplow",
      "version":"tp2",
      "messages":[
        {
          "field":"body",
          "value":null,
          "expectation":"empty body: not a valid tracker protocol event"
        },
        {
          "field":"querystring",
          "value":null,
          "expectation":"empty querystring: not a valid tracker protocol event"
        }
      ]
    },
    "payload":{
      "vendor":"com.snowplowanalytics.snowplow",
      "version":"tp2",
      "querystring":[],
      "contentType":null,
      "body":null,
      "collector":"ssc-2.3.0-googlepubsub",
      "encoding":"UTF-8",
      "hostname":"sp.palmetto.com",
      "timestamp":"2021-09-17T01:06:38.957Z",
      "ipAddress":"**.***.**.**",
      "useragent":"Go-http-client/2.0",
      "refererUri":"http://sp.palmetto.com/com.snowplowanalytics.snowplow/tp2",
      "headers":[
        "Timeout-Access: <function1>",
        "Host: sp.palmetto.com",
        "Referer: http://sp.palmetto.com/com.snowplowanalytics.snowplow/tp2",
        "User-Agent: Go-http-client/2.0",
        "x-cloud-trace-context: cc302d6a5a5dd8776a07e46cc56b42b3/8833614303760384032",
        "traceparent: 00-cc302d6a5a5dd8776a07e46cc56b42b3-7a974d6822728020-00",
        "X-Forwarded-For: 72.219.70.50",
        "X-Forwarded-Proto: https",
        "forwarded: for=\"72.219.70.50\";proto=https",
        "Accept-Encoding: gzip"
      ],
      "networkUserId":"01eab90b-eb02-45bb-8e84-e40ee9ecd451"
    }
  }
}
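To see whether every failing row looks like this one (no body, no querystring) and which clients send them, the bad rows could be grouped by user agent. The following is only a rough sketch in Python: it assumes the bad rows have been exported as newline-delimited JSON, and the function name `summarize_empty_payloads` is hypothetical, not part of any Snowplow tooling.

```python
import json
from collections import Counter


def summarize_empty_payloads(lines):
    """Count tracker_protocol_violations rows that have neither a body
    nor a querystring, grouped by the payload's user agent.

    `lines` is any iterable of strings, one bad-row JSON document per line
    (an assumption about how the bad rows were exported).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        # Only look at the bad-row type shown above.
        if "tracker_protocol_violations" not in row.get("schema", ""):
            continue
        payload = row.get("data", {}).get("payload", {})
        # body is null and querystring is an empty list in the example row.
        if payload.get("body") is None and not payload.get("querystring"):
            counts[payload.get("useragent")] += 1
    return counts
```

It could be run against an exported file (the file name here is a placeholder) with `summarize_empty_payloads(open("bad_rows.ndjson"))`. If a single user agent accounts for all of the empty payloads, that would suggest the rows come from one client rather than from the trackers in general.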
I previously saw this error when we first deployed the pipeline, and it seemed to resolve itself after we expanded the max-uri-length field back to 32768 in the collector config. We recently re-deployed in a new GCP region without any configuration changes, so I was surprised to see this error again.
Does anyone have any idea why this issue might be occurring?