Help to solve EmrEtlRunner HDFS > S3

Hi @paulorod7, @Jeferson - it sounds like you have implemented a slightly altered version of the Lambda architecture that we propose:

http://discourse.snowplow.io/t/how-to-setup-a-lambda-architecture-for-snowplow/249

Instead of archiving the raw events to S3, you are archiving the enriched events to S3. Your EMR job is then failing because you are attempting to enrich the already-enriched events.

Try running EmrEtlRunner with --skip staging,enrich so that the job skips the staging and enrichment steps and does not try to enrich your events a second time.
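
As a rough sketch, the invocation might look something like the following. The binary location and the config and resolver paths are placeholders I have assumed, not taken from your setup, and the --config and --resolver options are written as I recall them from the EmrEtlRunner usage, so please check them against your version. Only --skip staging,enrich is the part suggested above. The snippet just prints the command so you can review it before running it.

```shell
#!/bin/sh
# Assumed, hypothetical paths: replace with the ones from your own installation.
RUNNER=./snowplow-emr-etl-runner
CONFIG=config/config.yml
RESOLVER=config/iglu_resolver.json

# Steps to skip, as suggested above: the events in S3 are already enriched.
SKIP_STEPS=staging,enrich

# Assemble the command line and print it for review (dry run, nothing is executed).
CMD="$RUNNER --config $CONFIG --resolver $RESOLVER --skip $SKIP_STEPS"
echo "$CMD"
```

Once the printed command looks right for your environment, run it directly from the directory that contains the runner.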