How to shred events into Redshift from the real-time pipeline?


We are currently attempting to implement Snowplow as an alternative to an older, non-scalable analytical process.

I’ve set up the Scala collector, enrichment and sink (gathered from what seemed to be the only way to get Kinesis stream data into Redshift), but have run into a couple of problems:

  • Shredding appears to be impossible with this setup: derived contexts now end up as JSON in the events table, rather than in their own tables as intended, e.g. com_snowplowanalytics_snowplow_ua_parser_context_1. How can we create a pipeline with the real-time Scala & Kinesis combination that allows data to end up in Redshift and supports shredding?
  • The current analytical process involves quite a lot of custom enrichment with internal data; a single event, for example, might need to call several API endpoints and combine their results (this data changes only sporadically, say minor changes once a month). What is the recommended way to do this? I tried solving this with the API enrichment and the JavaScript enrichment, but the API enrichment does not allow custom logic, and I’m not able to construct a valid JavaScript enrichment that allows for a delay (or that returns callbacks/promises). The alternative would be loading the complete database into Redshift and doing the merges in the analytical process, which might be a little overkill and adds complexity, as it then needs to manage changes as well.
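Since the internal data only changes about once a month, one possible workaround (a sketch, not an official Snowplow recipe) is to bake a periodically regenerated lookup table directly into the JavaScript enrichment, so no network call, delay, or callback is needed at enrichment time. The `process(event)` entry point and the array-of-self-describing-contexts return shape follow the JavaScript enrichment contract; the `lookupTable` contents, the `iglu:com.example/...` schema URI, and the `getUser_id` accessor are illustrative assumptions here:

```javascript
// Hypothetical lookup table, regenerated monthly from the internal APIs
// and templated into the enrichment script before each deployment.
var lookupTable = {
  "user-42": { segment: "enterprise", region: "eu-west-1" }
};

// Snowplow's JavaScript enrichment calls process(event) synchronously and
// expects an array of self-describing JSON contexts (or null) in return.
function process(event) {
  var userId = event.getUser_id(); // accessor name is an assumption
  var extra = lookupTable[userId];
  if (!extra) {
    return null; // attach no derived context for unknown users
  }
  return [{
    schema: "iglu:com.example/internal_data/jsonschema/1-0-0", // hypothetical schema
    data: extra
  }];
}
```

Because the table lives inside the script, the enrichment stays synchronous (which is all the JavaScript enrichment supports), and the monthly data changes become a redeploy of the enrichment config rather than runtime API calls.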

Any recommendations for the above two problems? :slight_smile:

Thanks in advance (and for snowplow in general),


Hey @esquire900 - would you mind re-posting the second question as a separate thread? Multiple distinct questions per thread aren’t great for answers or for future searching.

Hi @esquire900,

To answer your first question, did you take a look at this topic?