Kafka Pipeline message format docs

Hello,

I’m developing an application for debugging our Kafka event pipeline. I’m looking for any documentation on the format of messages as they appear in the collector-good, collector-bad, enrich-good, and enrich-bad topics in Kafka. Currently I’m going on inference alone. It seems that collector-good uses some kind of binary format, enrich-good uses a TSV format, and enrich-bad uses a raw JSON format. I’d like to develop a more robust implementation for each of these (I have yet to actually encounter any collector-bad events). Are there any docs on the format of these messages?

@ShortRoundDev I am also very interested in this! Let me know how you are planning to build the debugging application.

The bad events are documented in iglu-central, for example:

List - iglu-central/schemas/com.snowplowanalytics.snowplow.badrows at master · snowplow/iglu-central · GitHub

Hi there,

The raw stream (collector-good) is in Thrift, and there are some details here: Stream Collector | Snowplow Documentation.

The enriched stream (enrich-good) is indeed in TSV and is documented here: Understanding the enriched TSV format | Snowplow Documentation. If you plan to work with this data, I recommend taking a look at our Analytics SDKs, which do a lot of the parsing for you: Analytics SDKs | Snowplow Documentation.
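For a quick debugging tool, a hand-rolled split can be enough before reaching for an SDK. The following is only a rough sketch: `parse_enriched_line` is a hypothetical helper, and the names in `LEADING_FIELDS` are an assumption about the first few columns that should be checked against the enriched TSV documentation for your pipeline version. Empty fields are mapped to `None`.

```python
from typing import Dict, List, Optional

# ASSUMPTION: names of the first few columns of an enriched event.
# Verify the full, ordered list against the enriched TSV documentation.
LEADING_FIELDS: List[str] = [
    "app_id",
    "platform",
    "etl_tstamp",
    "collector_tstamp",
    "dvce_created_tstamp",
    "event",
    "event_id",
]


def parse_enriched_line(
    line: str, field_names: List[str] = LEADING_FIELDS
) -> Dict[str, Optional[str]]:
    """Split one enriched TSV record and label its leading columns.

    Only as many columns as there are names in `field_names` are returned;
    empty strings are converted to None.
    """
    # Strip only the trailing newline so that trailing empty fields survive.
    values = line.rstrip("\n").split("\t")
    record: Dict[str, Optional[str]] = {}
    for name, value in zip(field_names, values):
        record[name] = value if value != "" else None
    return record


if __name__ == "__main__":
    sample = "shop\tweb\t\t2024-01-01 00:00:00.000\t\tpage_view\tabc-123\n"
    print(parse_enriched_line(sample))
```

Passing the full ordered list of column names as `field_names` would label every field. For anything beyond debugging, the Analytics SDKs mentioned above are likely the safer choice.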

Finally, failed events from either failed stream (collector-bad or enrich-bad) will conform to one of the numerous failed event schemas, which are all here: iglu-central/schemas/com.snowplowanalytics.snowplow.badrows at master · snowplow/iglu-central · GitHub.
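To route failed events by type in a debugging tool, it may be enough to read the schema reference from each message. This sketch assumes each message is a self-describing JSON object with a top-level `schema` key holding an Iglu URI of the form `iglu:vendor/name/format/version`; the `classify_bad_row` helper, the `SchemaKey` type, and the sample version number are all hypothetical and should be checked against real messages from your topics.

```python
import json
from typing import NamedTuple


class SchemaKey(NamedTuple):
    vendor: str
    name: str
    format: str
    version: str


def classify_bad_row(message: bytes) -> SchemaKey:
    """Return the parsed schema reference of one failed-event message.

    ASSUMPTION: the message is self-describing JSON whose "schema" value
    looks like "iglu:<vendor>/<name>/<format>/<version>".
    """
    doc = json.loads(message)
    uri = doc["schema"]
    if not uri.startswith("iglu:"):
        raise ValueError(f"not an Iglu schema URI: {uri!r}")
    parts = uri[len("iglu:"):].split("/")
    if len(parts) != 4:
        raise ValueError(f"unexpected Iglu URI shape: {uri!r}")
    return SchemaKey(*parts)


if __name__ == "__main__":
    # Hand-written sample payload for illustration, not a captured message.
    sample = json.dumps(
        {
            "schema": "iglu:com.snowplowanalytics.snowplow.badrows/"
            "schema_violations/jsonschema/2-0-0",
            "data": {},
        }
    ).encode("utf-8")
    print(classify_bad_row(sample).name)
```

The `name` part can then be used to pick the matching schema under the badrows directory in iglu-central when validating or displaying the `data` payload.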

Hope this helps!
