Golang Kinesis Reader

I’ve been building a user-event-aggregation system for Entity-Centric Indexing of Snowplow events in Elasticsearch:

https://github.com/fingerco/go-user-tracking

Right now it’s reading a JSON event stream from Kinesis, generated by Snowplow’s Kinesis Tee project.

I’d like to change it to read directly from the raw Kinesis enriched stream. But enriched events pushed to Kinesis use a special, compressed format, right?

I know that there’s a Scala library to convert the compressed event data into JSON format, but I was wondering whether there’s any documentation for doing it in Golang, or any documentation I could use to program that conversion myself.

I think you may be thinking of events in the raw collector stream, which are encoded as Thrift records.

Events in the enriched stream are in the Snowplow TSV format, which consists of 129 fields (if I remember correctly), as described here: https://github.com/snowplow/snowplow/wiki/Stream-Enrich

We have a set of Java libraries for serializing/deserializing this format, but sadly they’re not quite ready to be made public yet… we’re working on releasing them soon though :slight_smile:
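In the meantime, a minimal Go sketch of deserializing that TSV format might look like the following. The field names below are an illustrative subset (based on the order documented on the Stream Enrich wiki page) — the real format has far more fields, so treat this as a starting point, not a complete mapping:

```go
package main

import (
	"fmt"
	"strings"
)

// A few of the leading fields of the Snowplow enriched TSV format,
// in the order documented on the Stream Enrich wiki page.
// Illustrative subset only -- the full format has ~130 fields.
var enrichedFields = []string{
	"app_id", "platform", "etl_tstamp", "collector_tstamp",
	"dvce_created_tstamp", "event", "event_id",
}

// parseEnriched splits one tab-separated enriched event line and
// maps each value onto its field name. Empty fields are omitted.
func parseEnriched(line string) map[string]string {
	out := make(map[string]string)
	for i, v := range strings.Split(line, "\t") {
		if i >= len(enrichedFields) {
			break // ignore fields beyond our illustrative subset
		}
		if v != "" {
			out[enrichedFields[i]] = v
		}
	}
	return out
}

func main() {
	sample := "my-app\tweb\t2018-01-01 00:00:00\t2018-01-01 00:00:01\t\tpage_view\tf81d4fae-7dec-11d0-a765-00a0c91e6bf6"
	event := parseEnriched(sample)
	fmt.Println(event["app_id"], event["event"]) // my-app page_view
}
```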


Perfect!

Just what I was looking for! Thanks!

Hi @fingerco - sounds great. A Snowplow Golang Analytics SDK would be a really useful addition to the family of analytics SDKs. This page should be useful too:
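To sketch what the core of such a Golang Analytics SDK might do — turning one enriched TSV line into a JSON document suitable for indexing in Elasticsearch — something like this could work. Again, the field names are an illustrative subset I’ve chosen for the example, not the full documented format:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toJSON converts one enriched TSV line into a JSON document keyed by
// field name -- the core transformation an analytics SDK performs.
// fields is the ordered list of column names to map values onto.
func toJSON(line string, fields []string) ([]byte, error) {
	doc := make(map[string]string)
	for i, v := range strings.Split(line, "\t") {
		if i >= len(fields) {
			break // ignore columns we have no name for
		}
		if v != "" {
			doc[fields[i]] = v
		}
	}
	return json.Marshal(doc)
}

func main() {
	fields := []string{"app_id", "platform", "event"}
	b, err := toJSON("my-app\tweb\tpage_view", fields)
	if err != nil {
		panic(err)
	}
	// json.Marshal emits map keys in sorted order:
	// {"app_id":"my-app","event":"page_view","platform":"web"}
	fmt.Println(string(b))
}
```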

@acgray - looking forward to seeing your Java library for working with Snowplow enriched events!

(Actually @alex - perhaps this belongs in another thread, but I’m interested in your view on languages for the Snowplow ecosystem. We have an Apache Beam rewrite (in Java) of the RDB shredder working in production, and would consider contributing it back as open source - the benefit being that the unified batch/streaming programming model brings us a step closer to a streaming shredder/loader component. But I’m aware you guys are very Scala-based. Is this something you would consider incorporating as an official project?)


Hey @acgray - ah, it’s a pity we didn’t join up our efforts on this sooner.

Work will start on moving Stream Enrich to Apache Beam before this summer (as part of our GCP work), and we plan to move RDB Shredder to Beam as well. However, we have no plans to switch to using Java for either of these applications.

I’d encourage anybody thinking about a major piece of work on a core Snowplow component to get in touch with us so we can brainstorm it together - there may be ways for us to collaborate, get things into the official components sooner, and avoid duplicated effort. I promise we don’t bite :sun_with_face:

It would be great to schedule a call with you @acgray to find out more about your experiences with Beam when we get closer to our ports!
