EmrEtlRunner running for days at Step "Shred Enriched Events"

frankcash · April 8, 2018, 6:41pm

Hello,

I am running Snowplow EmrEtlRunner 91 on a cluster of 10 m4.4xlarge nodes. The job runs normally up until the “Elasticity Spark Step: Shred Enriched Events” step, where it runs for days and never finishes. This doesn’t happen on every run, and there appears to be little to no pattern to when it does. While it is stuck, a backlog of files builds up waiting to be processed, which is a big hassle.

When I click through on the EMR dashboard to the stderr logs, I just see days’ worth of:

18/04/08 18:05:45 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)
18/04/08 18:05:46 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)
18/04/08 18:05:47 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)
18/04/08 18:05:48 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)

Comparing the stderr logs of a stuck run and a normal run, the only difference I can find is that the normal run eventually reaches a finished state.
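Because a stuck run and a healthy run differ only in whether a terminal state is ever reported, a small script can tell from a stderr dump which state the client last saw and how long it has been polling in that state. A minimal sketch, assuming the line format shown in the excerpt above; the function name `summarize_reports` is hypothetical and not part of EmrEtlRunner or Spark.

```python
import re
from datetime import datetime

# Matches the spark-submit client lines from the stderr excerpt, e.g.
# "18/04/08 18:05:45 INFO Client: Application report for application_..._0005 (state: RUNNING)"
REPORT_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO Client: "
    r"Application report for (?P<app>application_\d+_\d+) \(state: (?P<state>\w+)\)"
)


def summarize_reports(lines):
    """Return (app_id, last_state, seconds spent in last_state), or None if no report lines."""
    app_id = last_state = None
    state_since = last_seen = None
    for line in lines:
        m = REPORT_RE.match(line.strip())
        if not m:
            continue  # skip unrelated log output
        # Spark's default log timestamp is yy/MM/dd HH:mm:ss
        ts = datetime.strptime(m.group("ts"), "%y/%m/%d %H:%M:%S")
        if m.group("state") != last_state:
            # State changed: restart the clock for the new state
            last_state = m.group("state")
            state_since = ts
        app_id = m.group("app")
        last_seen = ts
    if app_id is None:
        return None
    return app_id, last_state, (last_seen - state_since).total_seconds()
```

Fed the four lines above, this returns the application ID, `RUNNING` and `3.0` seconds; on a run that is stuck, the last figure keeps growing while the state never changes.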

I don’t believe that my nodes are running out of disk space: [screenshot of node disk usage]

If anyone has run into this issue before and successfully solved it or has any ideas about how to solve it I would greatly appreciate the knowledge!

anton · April 9, 2018, 3:55am

Hey @frankcash, I think you will find the cause of the failure in the YARN container logs, specifically somewhere under containers/application_1522951903812_0005/.
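Once the container logs for an application have been copied locally, the real error is often buried in one executor's stderr among many files. A minimal sketch that walks such a directory and prints lines containing common failure keywords; it assumes the logs sit under one local directory and may be gzip-compressed (as EMR archives them), and both the function name `find_suspicious_lines` and the keyword list are illustrative choices, not a Snowplow or YARN tool.

```python
import gzip
from pathlib import Path

# Illustrative keywords that often mark the real failure in driver/executor logs.
KEYWORDS = ("ERROR", "Exception", "OutOfMemoryError", "killed")


def open_log(path):
    # Logs archived to S3 by EMR are gzip-compressed; live logs are plain text.
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_suspicious_lines(app_dir, keywords=KEYWORDS):
    """Yield (relative path, line number, line) for every line containing a keyword."""
    root = Path(app_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        with open_log(path) as fh:
            for lineno, line in enumerate(fh, start=1):
                if any(k in line for k in keywords):
                    yield path.relative_to(root).as_posix(), lineno, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    for rel, lineno, line in find_suspicious_lines(sys.argv[1]):
        print(f"{rel}:{lineno}: {line}")
```

Matching on plain substrings keeps the sketch simple; expect some noise (for example, benign INFO lines that mention an exception class) and narrow the keyword list as needed.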

frankcash · May 9, 2018, 1:40pm

Anton, sorry for the delayed response. Where in AWS would I go to find the logs created by a specific “container”? Thanks!

mike · May 9, 2018, 10:28pm

Historical logs end up on S3 (the bucket depends on your emr-etl-runner configuration), but they can also be browsed in the AWS console: open the cluster’s ‘Summary’ tab and, under ‘Configuration Details’, click the folder icon next to ‘Log URI’. From there you can navigate to the containers directory, which should contain the logs for each application.
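To go straight to one application's logs instead of clicking through the console, the S3 location can be assembled from the Log URI. A minimal sketch, assuming EMR's usual layout of <Log URI>/<cluster ID>/containers/<application ID>/; the function name `container_log_location`, the bucket `my-bucket` and the cluster ID `j-ABC123` are hypothetical placeholders.

```python
from urllib.parse import urlparse


def container_log_location(log_uri, cluster_id, application_id):
    """Return (bucket, key prefix) for one YARN application's archived container logs.

    Assumes the layout <log_uri>/<cluster_id>/containers/<application_id>/.
    """
    parsed = urlparse(log_uri)
    if parsed.scheme not in ("s3", "s3n"):
        raise ValueError(f"expected an s3:// or s3n:// Log URI, got {log_uri!r}")
    base = parsed.path.strip("/")
    # Drop the empty base when the Log URI is just the bucket root
    parts = [p for p in (base, cluster_id, "containers", application_id) if p]
    return parsed.netloc, "/".join(parts) + "/"
```

For example, `container_log_location("s3://my-bucket/emr-logs/", "j-ABC123", "application_1522951903812_0005")` gives the bucket `my-bucket` and the prefix `emr-logs/j-ABC123/containers/application_1522951903812_0005/`, which can then be downloaded with `aws s3 cp --recursive`.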

If you’re looking for logs while a run is in progress, there’s typically a latency of a few minutes between logs being generated by an application on the cluster and the equivalent logs showing up in S3. Instead, you can SSH into the EMR master node and run yarn logs -applicationId application_1522951903812_0005, which prints the logs to stdout.
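If you run this repeatedly while waiting on a stuck step, a small wrapper on the master node can validate the application ID (catching typos such as an extra digit) and save the output to a file. A minimal sketch around the `yarn logs -applicationId` command mentioned above; the function names `yarn_logs_command` and `fetch_yarn_logs` are hypothetical, and it assumes the `yarn` CLI is on the PATH.

```python
import re
import subprocess

# YARN application IDs look like application_<cluster timestamp>_<sequence number>
APP_ID_RE = re.compile(r"application_\d+_\d+")


def yarn_logs_command(application_id):
    """Build the argument list for `yarn logs`, rejecting malformed IDs."""
    if not APP_ID_RE.fullmatch(application_id):
        raise ValueError(f"not a YARN application ID: {application_id!r}")
    return ["yarn", "logs", "-applicationId", application_id]


def fetch_yarn_logs(application_id, out_path):
    """Run `yarn logs` and write its stdout to out_path (raises if the command fails)."""
    with open(out_path, "w") as out:
        subprocess.run(yarn_logs_command(application_id), stdout=out, check=True)
```

Passing an argument list rather than a shell string avoids shell quoting issues; the saved file can then be searched for errors like any other log.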
