Can We Consolidate Collector and Enricher Bad Data Streams?

Hi,

In our clickstream setup, we currently use two separate streams for handling bad data—one for the collector and another for the enricher. Is it possible to combine these two streams into a single stream, so that both the collector and enricher write bad data to the same stream?

If we were to do this, is there an identifier within the bad data that would allow us to distinguish whether it originated from the collector or the enricher?

Hi @Sreenath, yes you can. It's what we do for BDP customers, and it's also how our Community Edition Terraform is set up.

You can differentiate them by the schema used in the bad data payload and by certain identifiers within it. Specifically, the Collector can only emit “SizeViolation” failures, and the “processor” attribute should indicate the collector (see snowplow-badrows/src/main/scala/com.snowplowanalytics.snowplow.badrows/BadRow.scala at master in the snowplow/snowplow-badrows GitHub repository).
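As a rough illustration of the approach above, here is a minimal Python sketch that reads one bad row from a shared stream and reports its failure type and likely origin. It assumes each bad row is a self-describing JSON object with a top-level "schema" URI and a "data.processor.artifact" field, and that the collector's artifact name contains the word "collector". Those artifact names and the helper name bad_row_origin are assumptions for illustration, so check them against real records from your own pipeline.

```python
import json


def bad_row_origin(raw: str) -> dict:
    """Classify one bad row by its schema name and processor artifact.

    Assumes a self-describing JSON bad row of the form
    {"schema": "iglu:<vendor>/<name>/jsonschema/<version>", "data": {...}}.
    """
    row = json.loads(raw)

    # The failure type is the second segment of the Iglu schema URI.
    parts = row.get("schema", "").split("/")
    failure_type = parts[1] if len(parts) >= 2 else "unknown"

    # The processor block names the application that emitted the bad row.
    processor = row.get("data", {}).get("processor", {})
    artifact = str(processor.get("artifact", ""))

    # Assumption: artifact names contain "collector" or "enrich".
    lowered = artifact.lower()
    if "collector" in lowered:
        origin = "collector"
    elif "enrich" in lowered:
        origin = "enrich"
    else:
        origin = "unknown"

    return {"failure_type": failure_type, "artifact": artifact, "origin": origin}
```

Checking the processor rather than the schema alone is the safer choice here, since the schema tells you what went wrong while the processor tells you which application reported it.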

Hope this helps!

@josh
Can you provide any samples of collector bad events and enricher bad events?

We are planning an alert mechanism that fires whenever a bad event arrives in the bad Kinesis stream.

Hi @Sreenath, they are easy enough to generate yourself. For Enrich, sending an event like curl -XGET https://<your-collector>/i will cause a bad event to be created.

For the Collector, you need to send a single “oversized” event; the maximum allowed size depends on your cloud.
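If it helps, both test requests can be sketched in Python with the standard library. This is only an illustration under several assumptions: the helper names are made up, the POST path /com.snowplowanalytics.snowplow/tp2 and the payload_data schema version should be checked against your collector's configuration, and the 2,000,000-byte default is a guess that needs to exceed whatever record size limit your cloud's stream enforces. The functions only build the requests; nothing is sent until you pass one to urllib.request.urlopen.

```python
import json
import urllib.request


def build_enrich_failure_request(collector_url: str) -> urllib.request.Request:
    """Build the empty GET to /i, which should fail in Enrich."""
    url = collector_url.rstrip("/") + "/i"
    return urllib.request.Request(url, method="GET")


def build_oversized_request(
    collector_url: str, size_bytes: int = 2_000_000
) -> urllib.request.Request:
    """Build a POST whose body is larger than size_bytes.

    Assumptions: the tp2 path and payload_data schema version below match
    your collector, and size_bytes exceeds your stream's record limit.
    """
    url = collector_url.rstrip("/") + "/com.snowplowanalytics.snowplow/tp2"
    payload = {
        "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
        # One page view event padded with a very long URL field.
        "data": [{"e": "pv", "url": "x" * size_bytes}],
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To fire either one at a test pipeline, call urllib.request.urlopen(build_oversized_request("https://<your-collector>")) and then watch the bad stream for the resulting record.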