How to determine well known fields from enriched data

Jesse_Redl · March 6, 2020, 10:29pm

Hey All,

We’re new to snowplow and are very excited to dig into this project! Over the course of week we have:

enabled a test website to fire off events with the javascript tracker
are running the scala stream collector within kubernetes and its publishing to a google cloud pub sub topic raw topic
are running the big query mutator within kubernetes in listen mode against the types subscription
the beam enrich dataflow is running in streaming mode and is processing the raw events and publishing on an enriched topic
the big query loader dataflow is running and processing the enriched events and inserting into bigquery

Great! We have our event data in big query with relatively little pain. We would like to contribute back how to configure this to run within kubernetes but thats a different post.

Our next step within our snowplow adopiton is to tap into the enriched events stream within our applications but I feel we are missing a core concept of the etl process in that how do we determine what the well known fields are of an event?

The current sdks (scala, python) just seem to have a hard coded list of events that are loaded from the order of the tab delimited enriched event?

Can anyone recommend how to inflate the enriched event back into a structured message or point us in the right direction of where to start?

Thanks all!

ihor · March 6, 2020, 10:46pm

@Jesse_Redl, yes, the enriched data is in predefined TSV format and you can find the order of the properties in this code.

mike · March 6, 2020, 10:49pm

The Scala, Python or NodeJS analytics SDKs would be the best place to start. These SDKs will take the TSV and hydrate with the field names, perform a small amount of “shredding” and then yield a JSON from each event that makes it much easier to work with downstream.

Jesse_Redl · March 6, 2020, 10:56pm

Thanks!

This is exactly what I was looking for in terms of the output of the enriched event.

stevecs · March 9, 2020, 9:59am

Welcome to the community @Jesse_Redl! Looking forward to seeing what you’re able to do with the platform and to your contributions.

Topic		Replies	Views
Need enriched raw data in JSON Enrichment	7	3305	May 26, 2017
How to get structured event data without a separate enrich step? For engineers	2	720	September 16, 2020
Enriched file in enriched/good schema? For data modelers & consumers	17	2748	December 17, 2018
The result of my Stream Enrich has more columns than atomic events	3	1888	September 26, 2019
Data modelling for real time kafka pipeline Enrichment	2	1067	September 2, 2020

How to determine well known fields from enriched data

Related topics