Hello,
We are currently attempting to implement Snowplow as a replacement for an older, non-scalable analytics process.
I’ve set up the Scala collector, enrichment and sink (taken from https://github.com/jramos/snowplow-kinesis-redshift-sink, which seemed to be the only way to get Kinesis stream data into Redshift), but have run into a couple of problems:
- Shredding appears to be impossible with this setup: derived contexts end up as JSON in the events table rather than in their own tables, e.g. com_snowplowanalytics_snowplow_ua_parser_context_1. How can we build a pipeline on the real-time Scala & Kinesis combination that loads data into Redshift and supports shredding?
- The current analytical process involves quite a lot of custom enrichment with internal data; a single event might, for example, need to call several API endpoints and combine their results (this data changes only sporadically, say minor changes once a month). What is the recommended way to do this? I tried solving it with the API enrichment and the JavaScript enrichment, but the API enrichment does not allow custom logic, and I haven’t been able to construct a valid JavaScript enrichment that allows for a delay (or returns callbacks/promises); a minimal sketch of what I mean is below. The alternative would be loading the complete database into Redshift and doing the merges in the analytical process, which feels like overkill and adds complexity, since it would also need to track changes.
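For reference, this is the shape I understand the JavaScript enrichment expects: a synchronous process(event) function returning an array of self-describing contexts. The schema name is just a hypothetical placeholder; the point is that I don’t see a way to wait on an external HTTP call from inside it:

```javascript
// Minimal sketch of a JavaScript enrichment as I understand it,
// not a working solution. The schema below is a hypothetical
// placeholder for illustration only.
function process(event) {
  var appId = event.getApp_id(); // getters on the enriched event are synchronous

  // There is no way here to await an HTTP call, return a promise,
  // or register a callback, which is exactly the limitation I hit
  // when trying to pull in data from our internal APIs.
  return [{
    schema: "iglu:com.example/internal_context/jsonschema/1-0-0",
    data: { app_id: appId }
  }];
}
```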
Any recommendations for the above 2 problems?
Thanks in advance (and for Snowplow in general),
Simon