Hello,
We are currently attempting to implement Snowplow as a replacement for an older, non-scalable analytics process.
I’ve set up the Scala collector, enrichment and sink (taken from https://github.com/jramos/snowplow-kinesis-redshift-sink, which seemed to be the only way to get Kinesis stream data into Redshift), but have run into a couple of problems:
- Shredding appears to be impossible with this setup: derived contexts end up as JSON in the events table rather than in their own tables, e.g. com_snowplowanalytics_snowplow_ua_parser_context_1. How can we build a pipeline on the real-time Scala & Kinesis combination that loads data into Redshift and supports shredding?
- The current analytical process involves quite a lot of custom enrichment with internal data; a single event might, for example, need to call several API endpoints and combine their results (this data changes only sporadically, say minor changes once a month). What is the recommended way to do this? I tried solving it with the API enrichment and the JavaScript enrichment, but the API enrichment does not allow custom logic, and I haven’t been able to construct a valid JavaScript enrichment that allows for a delay (or returns callbacks/promises); a minimal sketch of what I mean is below. The alternative would be loading the complete database into Redshift and doing the merges in the analytical process, which feels like overkill and adds complexity, since it would also need to track changes.
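For reference, this is the shape I understand the JavaScript enrichment expects: a synchronous process(event) function returning an array of self-describing contexts. The schema name is just a hypothetical placeholder; the point is that I don’t see a way to wait on an external HTTP call from inside it:

```javascript
// Minimal sketch of a JavaScript enrichment as I understand it,
// not a working solution. The schema below is a hypothetical
// placeholder for illustration only.
function process(event) {
  var appId = event.getApp_id(); // getters on the enriched event are synchronous

  // There is no way here to await an HTTP call, return a promise,
  // or register a callback, which is exactly the limitation I hit
  // when trying to pull in data from our internal APIs.
  return [{
    schema: "iglu:com.example/internal_context/jsonschema/1-0-0",
    data: { app_id: appId }
  }];
}
```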
Any recommendations for the above 2 problems?
Thanks in advance (and for Snowplow in general),
Simon