Help using value in enriched data as a key in Kafka message

ashwinGokhale · July 11, 2017, 7:17pm

Hello,

I am new to Snowplow and I would like to use a value from the enriched data, a user id, as the key of the key-value message when Snowplow sinks to Kafka. Right now, for each message Snowplow sinks to Kafka, the key is a uniquely generated string and the value is a tab-separated string. I want the key to be the user id, the (nth) column in the value, so that each user’s data always goes to the same partition. Is there any way I can choose the value of the key? If not, are there any other ways I could solve this problem? Any help would be great.

Thanks,
Ashwin

anton · July 12, 2017, 5:13am

Hello @ashwinGokhale,

I’m not experienced with Kafka, but I believe what you want can be achieved with the Snowplow Analytics SDKs, particularly EventTransformer from the Scala SDK. Using EventTransformer, you can parse the enriched event TSV into a JSON object in which each key corresponds to a field in our canonical model, so you can group enriched events by user_id (or whatever the equivalent operation is called in Kafka).
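
To illustrate the idea without depending on any SDK, here is a minimal Python sketch of turning a TSV line into a JSON object and grouping events by user_id. It does not use EventTransformer itself. The three-column `FIELD_NAMES` layout and the function names are assumptions made up for the example; a real enriched event has many more columns, in the order defined by the canonical model.

```python
import json
from collections import defaultdict

# Hypothetical, shortened column layout used only for this illustration.
# A real enriched event has many more columns than this.
FIELD_NAMES = ["app_id", "event_id", "user_id"]


def tsv_to_json(line, field_names=FIELD_NAMES):
    """Turn one tab-separated event line into a JSON string keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} columns, got {len(values)}"
        )
    # Empty columns are mapped to null rather than to an empty string.
    return json.dumps(
        {name: (value or None) for name, value in zip(field_names, values)}
    )


def group_by_user(lines):
    """Group parsed events by their user_id field."""
    groups = defaultdict(list)
    for line in lines:
        event = json.loads(tsv_to_json(line))
        groups[event["user_id"]].append(event)
    return dict(groups)
```

Calling `group_by_user` on a batch of lines returns a dictionary with one entry per user id, each holding that user's events in their original order.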

BenFradet · July 12, 2017, 1:04pm

At the moment I don’t think this is possible inside Snowplow.

However, as Anton suggested, you can always reprocess the enriched data in any way you see fit, applying a different partitioning scheme being one of them.

This could easily be done using a streaming framework such as Kafka Streams or Spark Streaming, for example.
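
As a rough sketch of what the re-keying step does (in Kafka Streams this would typically be a `selectKey` followed by writing to a new topic), the Python below pulls one column out of the TSV value to use as the key and shows why equal keys end up on the same partition. The function names and `key_index` are hypothetical: the position of the user id column has to be supplied by you. The CRC32 hash is only a deterministic stand-in, not the hash a Kafka producer actually applies to the key.

```python
import zlib


def rekey(tsv_line, key_index):
    """Return (key, value), where key is the chosen column of the TSV value.

    key_index is supplied by the caller; this sketch does not assume where
    the user id sits in the enriched event.
    """
    fields = tsv_line.rstrip("\n").split("\t")
    if not 0 <= key_index < len(fields):
        raise IndexError(
            f"key_index {key_index} out of range for {len(fields)} columns"
        )
    # The value is passed through unchanged; only the key is new.
    return fields[key_index], tsv_line


def partition_for(key, num_partitions):
    """Map a key to a partition deterministically.

    CRC32 is a stand-in chosen for this illustration, so the partition
    numbers here will not match those of a real Kafka topic.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Because the partition depends only on the key, every record re-keyed with the same user id maps to the same partition for as long as the number of partitions stays the same.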

ashwinGokhale · July 12, 2017, 6:49pm

I think using Kafka Streams will be the best solution for my project. Thanks a lot!

alex · July 13, 2017, 9:26pm

Just to add that there is a ticket for this (not yet scheduled):

https://github.com/snowplow/snowplow/issues/1924

This ticket was itself inspired by another Discourse post:

http://discourse.snowplow.io/t/partition-key-for-kinesis/849
