How to determine well known fields from enriched data

Hey All,

We’re new to Snowplow and are very excited to dig into this project! Over the course of a week we have:

  • enabled a test website to fire off events with the JavaScript tracker
  • deployed the Scala Stream Collector in Kubernetes, publishing to a Google Cloud Pub/Sub raw topic
  • deployed the BigQuery Mutator in Kubernetes, in listen mode against the types subscription
  • started the Beam Enrich Dataflow job in streaming mode, which processes the raw events and publishes them to an enriched topic
  • started the BigQuery Loader Dataflow job, which processes the enriched events and inserts them into BigQuery

Great! We have our event data in BigQuery with relatively little pain. We would like to contribute back how to configure this to run within Kubernetes, but that’s a different post.

The next step in our Snowplow adoption is to tap into the enriched events stream from within our applications, but I feel we are missing a core concept of the ETL process: how do we determine what the well-known fields of an event are?

The current SDKs (Scala, Python) just seem to have a hard-coded list of fields that are matched to values by their position in the tab-delimited enriched event?

Can anyone recommend how to inflate the enriched event back into a structured message or point us in the right direction of where to start?

Thanks all!

@Jesse_Redl, yes, the enriched data is in a predefined TSV format, and you can find the order of the properties in this code.
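
To make the "predefined order" idea concrete, here is a minimal sketch of pairing TSV values with field names by position. The seven names below are only the leading columns of the enriched event as I understand them, and the sample line is made up; the authoritative (and much longer) list is the one in the code referenced above. `hydrate` is a hypothetical helper, not part of any SDK.

```python
# Illustrative only: the leading column names of the enriched TSV, in order.
# Assumption: treat this as a truncated stand-in for the full canonical list.
FIRST_FIELDS = [
    "app_id", "platform", "etl_tstamp", "collector_tstamp",
    "dvce_created_tstamp", "event", "event_id",
]

def hydrate(line, field_names=FIRST_FIELDS):
    """Pair each tab-separated value with its field name, by position."""
    values = line.rstrip("\n").split("\t")
    # An empty string in the TSV means "no value", so it is dropped here.
    return {name: value for name, value in zip(field_names, values) if value != ""}

# Made-up sample covering only the seven columns above.
sample = "shop\tweb\t2019-01-01 00:00:01.000\t2019-01-01 00:00:00.000\t\tpage_view\tabc-123"
event = hydrate(sample)
```

Because `zip` stops at the shorter sequence, this only works as intended when the name list is complete and in exactly the same order as the columns, which is why the SDKs carry the full list.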

The Scala, Python or NodeJS analytics SDKs would be the best place to start. These SDKs take the TSV, hydrate it with the field names, perform a small amount of “shredding”, and then yield a JSON object for each event, which makes it much easier to work with downstream.
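
For a feel of what that “shredding” step amounts to, here is a rough standard-library sketch, not the SDK's actual code. It assumes the self-describing event field holds an outer JSON envelope whose `data` is itself a `{schema, data}` pair, and it flattens the inner schema URI into a key. The `com.acme` vendor, the `link_click` schema, and both helper names are hypothetical, and the exact key format the SDKs produce may differ.

```python
import json
import re

def schema_to_key(prefix, schema_uri):
    """Flatten 'iglu:vendor/name/format/1-0-0' into 'prefix_vendor_name_1'."""
    if not schema_uri.startswith("iglu:"):
        raise ValueError("not an Iglu schema URI: " + schema_uri)
    vendor, name, _fmt, version = schema_uri[len("iglu:"):].split("/")
    model = version.split("-")[0]  # keep only the major (model) version

    def snake(text):
        # Replace any run of non-alphanumeric characters with one underscore.
        return re.sub(r"[^A-Za-z0-9]+", "_", text).lower()

    return "{}_{}_{}_{}".format(prefix, snake(vendor), snake(name), model)

def shred_unstruct(raw_json):
    """Unwrap the envelope and key the payload by its flattened schema."""
    envelope = json.loads(raw_json)
    inner = envelope["data"]  # the self-describing event itself
    return {schema_to_key("unstruct_event", inner["schema"]): inner["data"]}

# Made-up payload using a hypothetical com.acme schema.
raw = json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
    "data": {
        "schema": "iglu:com.acme/link_click/jsonschema/1-0-0",
        "data": {"targetUrl": "https://example.com"},
    },
})
shredded = shred_unstruct(raw)
```

In practice you would let the SDK do this for you; the sketch is only meant to show why the resulting JSON is easier to query than the raw nested string.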



This is exactly what I was looking for in terms of the output of the enriched event.

Welcome to the community @Jesse_Redl! Looking forward to seeing what you’re able to do with the platform and to your contributions.