Thrift Parsing Format

Hi all,

I’m still pretty new to Snowplow. I’m using the Scala stream collector/enricher on AWS and outputting to Elasticsearch and Indicative at the moment. So far, so good.

Now I’m trying to write a custom Lambda consumer to read off the Kinesis Enricher stream and parse the Thrift events in Python. However, I’m getting odd results. For example:

web 2019-09-07 14:43:50.349 2019-09-07 14:43:48.267 2019-09-07 14:43:47.614 page_view ca1e27a6-576d-447c-9bff-f19484398afc cf js-2.5.1 ssc-0.15.0-kinesis stream-enrich-0.21.0-common-0.37.0 2266434556 43250503056033d5 1 54473bf8-e8af-43b2-932d-8683b47bfe3d AAAA BBBBB CCCC DDDD https 443 /submit https 443 /contact q=submit Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763 en-US 1 0 1 0 0 0 0 0 0 1 24 1368 799 America/Chicago 1368 912 utf-8 1352 2880 b3fed276-0e08-4c8a-81b7-b68fdb367df6 2019-09-07 14:43:48.267 com.snowplowanalytics.snowplow page_view jsonschema 1-0-0

I don’t see anything indicating which field each value belongs to, and I don’t see separators between the fields.

Am I doing something wrong? Some trick I’m missing?

Thanks much!

@pcb, we have Snowplow Analytics SDKs for working with enriched data. In particular, you would want to use the Python Analytics SDK. As for the order of the values in the enriched events, you can refer to this code.
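For anyone who wants to see what is going on before reaching for the SDK: as far as I know, records on the enriched stream are tab-separated lines rather than Thrift, with no header row, so the field names come purely from column position. Below is a rough, stdlib-only sketch of a Lambda handler that decodes each Kinesis record and labels the first few columns. `LEADING_FIELDS`, `parse_enriched` and `handler` are names made up for this illustration, and the field list is deliberately partial; check it against the canonical field order before relying on it.

```python
import base64

# Assumption: the first twelve enriched-event columns, in order.
# This is a partial list for illustration only; the full event has many more.
LEADING_FIELDS = [
    "app_id", "platform", "etl_tstamp", "collector_tstamp",
    "dvce_created_tstamp", "event", "event_id", "txn_id",
    "name_tracker", "v_tracker", "v_collector", "v_etl",
]


def parse_enriched(line):
    """Split one enriched line on tabs and label the leading columns."""
    # Strip only the trailing newline; empty columns must be preserved.
    values = line.rstrip("\n").split("\t")
    return dict(zip(LEADING_FIELDS, values))


def handler(event, context):
    """Hypothetical Lambda entry point for a Kinesis event source."""
    parsed = []
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded in the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        parsed.append(parse_enriched(payload))
    return parsed
```

Empty columns come through as empty strings, which is why the values look like they run together when the line is printed or pasted somewhere that collapses tabs. The SDK does the full mapping (including types and JSON fields), so this is only meant to show where the structure lives.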


Thanks so much ihor, that will work perfectly!