I am completely new to the Snowplow world, but I have successfully set up the Snowplow environment end to end and have data flowing into Redshift.
My issue is that I cannot find out how to pull the enriched data from Kinesis into EmrEtlRunner so it can be loaded into Redshift.
So I end up enriching twice: once in Kinesis (which we need), and then again by running the equivalent of the batch process over the raw S3 logs so that EmrEtlRunner can enrich and load the data.
There must be a less resource-intensive way, but I’ve been through the documentation and the forum and can’t find an answer anywhere - please help!
In short, you need to run S3 Loader on the stream-enriched data and run EmrEtlRunner in Stream Enrich mode. This way the data is enriched only once (in the Stream Enrich component), and EmrEtlRunner is used only to shred the data and load it into Redshift.
@ihor Thank you for your response and the details, it’s really appreciated. I have pored over this area during the past couple of days and I can’t find how to invoke Stream Enrich mode, which is why I switched to the batch process. Is it a change I need to make in the config somewhere, or is there a specific command I need to use? Many thanks in advance for your help.
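As far as I can tell from the EmrEtlRunner release notes, Stream Enrich mode is switched on through configuration rather than through a separate command: adding an `enriched.stream` bucket under `aws.s3.buckets` in `config.yml` tells EmrEtlRunner to pick up data that is already enriched and skip its own enrich step. The fragment below is a minimal sketch under that assumption. The bucket names are hypothetical placeholders, and the `stream` path is assumed to be the location S3 Loader writes the Stream Enrich output to; check the exact keys against the documentation for your EmrEtlRunner version.

```yaml
# Fragment of EmrEtlRunner's config.yml (sketch, not a complete file).
# All s3:// paths are hypothetical placeholders.
aws:
  s3:
    buckets:
      enriched:
        good: s3://my-snowplow-bucket/enriched/good        # staging location used by the shred step
        archive: s3://my-snowplow-bucket/enriched/archive  # where processed runs are archived
        # Assumption: the presence of this key enables Stream Enrich mode.
        # Point it at the bucket/prefix that S3 Loader sinks enriched events into.
        stream: s3://my-snowplow-bucket/enriched/stream
```

With a setting like this in place, EmrEtlRunner would be started with the same run command as before; the raw collector log settings should no longer be needed for this pipeline, since enrichment has already happened in the stream.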