I would like to understand a bit more about EmrEtlRunner. We have two Snowplow environments: one for production and one for development. We use the development environment to try out new events, mock up upgrades, and so on.
We don’t run EmrEtlRunner on the development environment every day; we run it only when we have new events to try out, which could be once a week or once a fortnight. The number of events captured is very small, no more than 100 or so.
Even so, EmrEtlRunner takes much longer to complete on the development environment, around 9-10 hours. In contrast, EmrEtlRunner for the production environment takes 3-4 hours while processing millions of events per day.
My guess is that the time EmrEtlRunner takes depends on the number of files to be processed, not only on the number of events. Is my guess correct, and what can be done to reduce the run time?
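To make the guess concrete, here is a rough back-of-the-envelope sketch of what I mean. It is only a toy cost model, not a description of how EmrEtlRunner actually works: the function name, the per-file and per-event costs, and the file and event counts are all hypothetical numbers I made up for illustration, not measurements from either environment.

```python
# Toy cost model: run time = fixed job overhead + per-file cost + per-event cost.
# Every number here is a hypothetical assumption, not a measured value.

def estimated_runtime_seconds(n_files, n_events,
                              fixed_overhead_s=600.0,  # assumed cluster start-up and shutdown
                              per_file_s=2.0,          # assumed cost to list/copy/open one file
                              per_event_s=0.0001):     # assumed cost to process one event
    """Return the estimated run time in seconds under the toy model."""
    return fixed_overhead_s + n_files * per_file_s + n_events * per_event_s


# Hypothetical development run: many small files, almost no events.
dev = estimated_runtime_seconds(n_files=15_000, n_events=100)

# Hypothetical production run: fewer files, millions of events.
prod = estimated_runtime_seconds(n_files=500, n_events=5_000_000)

print(f"dev:  {dev / 3600:.1f} h")
print(f"prod: {prod / 3600:.1f} h")
```

If something like this holds, a run with very few events could still be the slower one simply because many small files have piled up between runs, and the per-file term would dominate.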
Thanks