The result of my Stream Enrich has more columns than atomic events

Alexandre_Rayes · September 26, 2019, 6:07pm

Hello, guys
I am trying to use Snowplow with the Python and React Native trackers, a Scala Stream Collector and Stream Enrich. The enriched output is loaded via Firehose into S3, where it is queried with Athena.
The problem is that the data coming out of my enrichment process does not fit my atomic.events table. It has 2 or 3 more columns than it should.
I got the atomic.events create table statement from https://github.com/snowplow/snowplow/blob/master/4-storage/redshift-storage/sql/atomic-def.sql

Is there something wrong with the version of the enricher or collector that I’m using? Or am I using an old version for atomic.events?

Thanks in advance

ihor · September 26, 2019, 6:29pm

@Alexandre_Rayes, indeed, the enriched event has 3 extra properties (compared with the shredded event), as per the diagram here (see 2nd image): https://github.com/snowplow/snowplow/wiki/StorageLoader. You can see the structure of a Snowplow enriched event here: https://github.com/snowplow/snowplow/blob/master/3-enrich/scala-common-enrich/src/main/scala/com.snowplowanalytics.snowplow.enrich/common/outputs/EnrichedEvent.scala#L41-L249. I'm not sure your output will have exactly the same structure, though, as the data is expected to be loaded to S3 with the S3 Loader.

To query good data in S3 with Athena you can follow this guide: Using AWS Athena to query the 'good' bucket on S3

Alexandre_Rayes · September 26, 2019, 7:41pm

So ATOMIC.EVENTS is supposed to receive shredded events instead of enriched events?
How do I do that if I’m using Stream Enrich instead of EmrEtlRunner?

ihor · September 26, 2019, 9:00pm

@Alexandre_Rayes, it depends on which ATOMIC.EVENTS you are talking about. For Redshift, yes: events should be stripped of those 3 fields, as the data in them is loaded into separate tables.

To work with enriched data directly, we offer the Snowplow Analytics SDK. You should also take the enriched event's structure into account, as per the link I provided earlier.
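
As a rough illustration of the difference discussed above, here is a minimal Python sketch that drops the three extra fields from one enriched TSV line. It assumes the three fields are the JSON columns named `contexts`, `unstruct_event` and `derived_contexts`, and that you supply the full ordered list of field names yourself (for example, derived from the EnrichedEvent.scala file linked earlier). The function name, the shortened field list and the sample values are all hypothetical; this is not part of any Snowplow library.

```python
# Assumption: these are the three JSON fields present in the enriched event
# but absent from the Redshift atomic.events table.
JSON_FIELDS = {"contexts", "unstruct_event", "derived_contexts"}


def strip_json_fields(enriched_line, field_names):
    """Pair each TSV value with its field name and drop the JSON fields.

    field_names must list every enriched-event field in order; it is
    supplied by the caller and is not defined here.
    """
    values = enriched_line.rstrip("\n").split("\t")
    if len(values) != len(field_names):
        raise ValueError(
            "expected %d fields, got %d" % (len(field_names), len(values))
        )
    return [
        (name, value)
        for name, value in zip(field_names, values)
        if name not in JSON_FIELDS
    ]


# Hypothetical, heavily shortened field list and values, for illustration only.
# A real enriched event has far more fields, in a different order.
DEMO_FIELDS = [
    "app_id", "event", "contexts", "unstruct_event",
    "derived_contexts", "event_name",
]
demo_line = "\t".join(
    ["shop", "unstruct", '{"data":[]}', '{"data":{}}', '{"data":[]}', "link_click"]
)

kept = strip_json_fields(demo_line, DEMO_FIELDS)
print([name for name, _ in kept])
```

The alternative for Athena is to leave the data untouched and declare the three extra columns as strings in the table definition, then parse the JSON at query time.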
