Kafka Pipeline message format docs


I’m developing an application for debugging our Kafka event pipeline. I’m looking for documentation on the format of messages as they appear in the collector-good, collector-bad, enrich-good, and enrich-bad topics. Currently I’m going by inference: collector-good seems to use some kind of binary format, enrich-good a TSV format, and enrich-bad raw JSON. I’d like to develop a more robust implementation for each of these (I have yet to encounter any collector-bad events). Are there any docs on the format of these messages?

@ShortRoundDev I am also very interested in this! Let me know how you are planning to build the debugging application.

The bad events are documented in iglu-central, for example:

List - iglu-central/schemas/com.snowplowanalytics.snowplow.badrows at master · snowplow/iglu-central · GitHub

Hi there,

The raw stream (collector-good) is in Thrift, and there are some details here: Stream Collector | Snowplow Documentation.

The enriched stream (enrich-good) is indeed in TSV and is documented here: Understanding the enriched TSV format | Snowplow Documentation. If you plan to work with this data, I recommend taking a look at our Analytics SDKs, which do a lot of the parsing for you: Analytics SDKs | Snowplow Documentation.
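If you only need a quick look at enriched records in a debugging tool, a minimal sketch like the one below may be enough before reaching for an SDK. It assumes each Kafka message value is one tab-separated line already decoded to a string. The function name `parse_enriched_line` and the three-column `DEMO_FIELDS` list are hypothetical and for illustration only; the real column list and order should be taken from the enriched TSV documentation.

```python
from typing import Dict, List, Sequence


def parse_enriched_line(line: str, field_names: Sequence[str]) -> Dict[str, str]:
    """Map one tab-separated enriched event onto the given column names."""
    # Strip only the line terminator: trailing tabs are empty fields and must survive.
    values: List[str] = line.rstrip("\r\n").split("\t")
    if len(values) < len(field_names):
        raise ValueError(
            f"expected at least {len(field_names)} fields, got {len(values)}"
        )
    # zip stops at the shorter sequence, so any extra trailing values are ignored.
    return dict(zip(field_names, values))


# Hypothetical three-column layout, NOT the real enriched schema.
DEMO_FIELDS = ["app_id", "platform", "etl_tstamp"]
demo = parse_enriched_line("shop\tweb\t2024-01-01 00:00:00.000\n", DEMO_FIELDS)
print(demo["platform"])
```

Empty fields are kept as empty strings rather than dropped, so column positions stay aligned with the names you pass in.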

Finally, failed events from either failed stream (collector-bad or enrich-bad) will conform to one of the numerous failed event schemas, which are all here: iglu-central/schemas/com.snowplowanalytics.snowplow.badrows at master · snowplow/iglu-central · GitHub.
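For the failed streams, a debugging tool can usually branch on the schema named in each message. The sketch below assumes each message value is a self-describing JSON document with a `schema` key holding an Iglu URI of the form `iglu:<vendor>/<name>/<format>/<version>` and a `data` key holding the payload; check this against your own topics. The function name `parse_failed_event` and the sample message (including the schema name `example_failure`) are made up for illustration.

```python
import json
from typing import Any, Dict, Tuple


def parse_failed_event(raw: bytes) -> Tuple[str, str, Dict[str, Any]]:
    """Return (schema name, schema version, data) for one failed event."""
    doc = json.loads(raw)
    uri = doc["schema"]
    # Assumed URI shape: iglu:<vendor>/<name>/<format>/<version>
    if not uri.startswith("iglu:"):
        raise ValueError(f"not an Iglu schema URI: {uri!r}")
    parts = uri[len("iglu:"):].split("/")
    if len(parts) != 4:
        raise ValueError(f"unexpected Iglu URI shape: {uri!r}")
    _vendor, name, _fmt, version = parts
    return name, version, doc["data"]


# Made-up sample message, not a real failed event.
sample = json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow.badrows/example_failure/jsonschema/1-0-0",
    "data": {"failure": {"message": "demo"}},
}).encode("utf-8")

name, version, data = parse_failed_event(sample)
print(name, version)
```

Returning the schema name and version separately makes it easy to dispatch to a per-schema handler and to log anything the tool does not recognise yet.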

Hope this helps!