My EMR job failed this morning (it had been running successfully since last week).
This is what I found in the controller log:
2017-03-29T19:48:32.155Z WARN Step failed with exitCode -1 and took 2586 seconds
The stderr and stdout logs are empty, and syslog has nothing interesting in it either.
Could you please help me understand the meaning of exitCode -1?
OK, after I moved all the files back into the “in” folder, the EMR job processed them successfully at the next scheduled run.
But it would still be good to know what exitCode -1 means. Maybe there is a way to prevent these issues in the future.
@tyomo4ka I’m curious why that worked… Did you do anything differently on the second run? For example, did you bump your instances or upgrade EmrEtlRunner between jobs?
I’ve been re-running enrichment from the Shred step but no cigar. I’m currently re-running from the Enrich step to see if that works.
Kind of like you said, @tyomo4ka, all I had to do was:
- Remove the files in enriched from the failed run
- Re-run enrichment from step “Enrich”
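For anyone who wants to script the first step, here is a minimal sketch. It assumes the enriched output of the failed run lives under an S3 prefix and that you pass in a boto3 S3 client; the function name `delete_prefix`, the bucket and the prefix are all hypothetical placeholders, not something taken from the pipeline config. It only removes the leftover files, the re-run itself is still started the usual way.

```python
def delete_prefix(s3_client, bucket, prefix):
    """Delete every object under `prefix` in `bucket` and return how many were removed.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching keys.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        # A page holds at most 1000 keys, which is also the per-call
        # limit of delete_objects, so one call per page is enough.
        s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
        deleted += len(keys)
    return deleted


# Usage (bucket and prefix are made-up examples, adjust to your setup):
#   import boto3
#   n = delete_prefix(boto3.client("s3"), "my-snowplow-bucket",
#                     "enriched/good/run=2018-07-16-03-00-00/")
#   print(f"removed {n} objects")
```

Double-check the prefix before running it, since the delete is not reversible unless bucket versioning is enabled.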
No changes to the pipeline or anything. Figured I’d share back here in case others have issues like this showing up in stderr:
18/07/16 03:42:17 INFO Client:
client token: N/A
ApplicationMaster host: 172.31.19.185
ApplicationMaster RPC port: 0
start time: 1531708082200
final status: FAILED
tracking URL: http://ip-172-31-27-154.ap-southeast-2.compute.internal:20888/proxy/application_1531707705580_0002/
Exception in thread "main" org.apache.spark.SparkException: Application application_1531707705580_0002 finished with failed status
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
18/07/16 03:42:17 INFO ShutdownHookManager: Shutdown hook called
18/07/16 03:42:17 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-6393393a-b249-4816-82a1-56e842e65821
Command exiting with ret '1'
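Since neither the controller log nor this stderr says why the application failed, the most useful thing in it is the YARN application id. Below is a rough sketch that pulls the application id, the final status and the exit codes out of a pasted log so you know which application to dig into. The function name `summarize_step_log` and the regular expressions are my own and are only matched against the lines shown in this thread, so treat them as assumptions rather than a stable log format.

```python
import re

# Patterns based only on the log lines quoted in this thread.
APP_ID = re.compile(r"application_\d+_\d+")
FINAL_STATUS = re.compile(r"final status:\s*(\w+)")
CONTROLLER_EXIT = re.compile(r"exitCode (-?\d+)")
COMMAND_RET = re.compile(r"Command exiting with ret '(-?\d+)'")


def summarize_step_log(text):
    """Return the YARN application ids, final status and exit codes found in `text`."""
    status = FINAL_STATUS.search(text)
    codes = CONTROLLER_EXIT.findall(text) + COMMAND_RET.findall(text)
    return {
        # De-duplicate: the same id shows up in the tracking URL and the exception.
        "application_ids": sorted(set(APP_ID.findall(text))),
        "final_status": status.group(1) if status else None,
        "exit_codes": [int(code) for code in codes],
    }
```

With the application id in hand, running `yarn logs -applicationId <id>` on the master node should show the container logs with the actual exception, assuming YARN log aggregation is enabled on the cluster.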