Hi @grzegorzewald. It looks like your code works with base64-encoded records, and the Thrift handling in recovery.py is unused. I'm working on decoding successful records, not failed ones, so the format is different. Thank you for the code anyway.
Thank you for the previous reply. Snowplow has rich functionality for event enrichment and storage, and that is why we chose it. At this stage of the project we don't have time to put effort into event enrichment and storage, which is why we decided to go with raw events and the Kinesis-S3 sink.
Could you please take a look at the format of the file I downloaded from S3?
It doesn't look like it contains Thrift records; it looks like UTF-8 strings with some byte delimiters.
I've tried to read it with the Elephant Bird code in this repo, without success. Could you please take a look at that code too?
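In case it helps, here is a quick sketch (not from the repo, just a sanity check I put together) for classifying the leading bytes of a downloaded file: valid UTF-8 with no control bytes suggests plain text, control bytes mixed into otherwise-decodable text suggest delimited records, and undecodable bytes suggest a binary format such as Thrift. The function name and thresholds are my own, purely illustrative:

```python
def classify_payload(data: bytes, sample_size: int = 512) -> str:
    """Guess a payload's type from its leading bytes (heuristic only)."""
    sample = data[:sample_size]
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid UTF-8 up front: likely a binary format (e.g. Thrift).
        return "binary (possibly Thrift)"
    # Count control bytes other than tab/newline/carriage return.
    nonprintable = sum(
        1 for b in sample if b < 0x20 and b not in (0x09, 0x0A, 0x0D)
    )
    if nonprintable == 0:
        return "plain UTF-8 text"
    return "UTF-8 text with byte delimiters"


if __name__ == "__main__":
    # Point this at your own S3 download; the path is hypothetical.
    with open("downloaded-from-s3.bin", "rb") as f:
        print(classify_payload(f.read()))
```

Running this against my file reports UTF-8 text with byte delimiters rather than binary, which is what made me doubt the Thrift assumption.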
Hi @vshulga - parsing a Snowplow raw event is a solved problem - this is precisely what our Scala Common Enrich library does, and that library is embedded in both our Hadoop Enrich and Stream Enrich applications. Given that your raw events are already in S3, I would recommend going with Hadoop Enrich.