Hi there,
I’ve been trying to run EmrEtlRunner for a few days now, and it consistently fails at the “Elasticity Spark Step: Shred Enriched Events” step.
The step has millions of enriched events to shred, and it fails every time.
All runs after the first failure have used --skip staging,enrich.
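For clarity, the resumed runs are invoked roughly like this (config.yml and resolver.json are placeholder names, not my actual files):

```shell
# Resume the pipeline, skipping the already-completed staging and enrich steps
# so the run goes straight to shredding the enriched events.
./snowplow-emr-etl-runner run \
  --config config.yml \
  --resolver resolver.json \
  --skip staging,enrich
```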
Every stderr log file ends as below:
17/10/25 01:04:38 INFO Client: Application report for application_1508886109246_0002 (state: RUNNING)
17/10/25 01:04:39 INFO Client: Application report for application_1508886109246_0002 (state: RUNNING)
17/10/25 01:04:40 INFO Client: Application report for application_1508886109246_0002 (state: RUNNING)
17/10/25 01:04:41 INFO Client: Application report for application_1508886109246_0002 (state: RUNNING)
17/10/25 01:04:42 INFO Client: Application report for application_1508886109246_0002 (state: RUNNING)
17/10/25 01:04:43 INFO Client: Application report for application_1508886109246_0002 (state: FINISHED)
17/10/25 01:04:43 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.31.45.129
ApplicationMaster RPC port: 0
queue: default
start time: 1508886685765
final status: FAILED
tracking URL: http://ip-172-31-42-20.us-west-2.compute.internal:20888/proxy/application_1508886109246_0002/
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1508886109246_0002 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/10/25 01:04:43 INFO ShutdownHookManager: Shutdown hook called
17/10/25 01:04:43 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-75e08e3c-9ff2-42c0-b30d-bad9df2abaf8
Command exiting with ret '1'
Any idea why?
This runs on a cluster of 3x m3.xlarge core instances and 0 task instances; each attempt runs for a little shy of 2 hours before failing.
version: snowplow-rdb-shredder-0.12.0
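For reference, a cluster of this shape would be described in config.yml roughly as below (a sketch following the EmrEtlRunner config format; the master instance type and the spark configuration block are illustrative, not taken from my actual file):

```yaml
aws:
  emr:
    jobflow:
      master_instance_type: m3.xlarge   # illustrative; not stated above
      core_instance_count: 3
      core_instance_type: m3.xlarge
      task_instance_count: 0
    configuration:                      # optional Spark tuning section
      spark:
        maximizeResourceAllocation: "true"
```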