Error when running snowplow spark event recovery 0.1.0 on EMR

Hello,

We have some bad rows in the older (0.1.0) format, and we’re trying to run snowplow-event-recovery-spark-0.1.0.jar on an EMR cluster, but we’re running into some issues. As a first test, just to see if it’s working at all, we’re trying to remove the “submitted” property from the body.
This is the config JSON:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/recoveries/jsonschema/1-0-0",
  "data": [
    {
      "name": "RemoveFromBody",
      "error": "error: object instance has properties which are not allowed by the schema: [\"submitted\"]\n    level: \"error\"\n    schema: {\"loadingURI\":\"#\",\"pointer\":\"\"}\n    instance: {\"pointer\":\"\"}\n    domain: \"validation\"\n    keyword: \"additionalProperties\"\n    unwanted: [\"submitted\"]\n",
      "toRemove": "\"submitted\":\".*\",?"
    }
  ]
}
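For what it’s worth, our understanding (not confirmed) is that the 0.1.0 recovery job applies `toRemove` as a plain regular-expression replace on the collector payload body. Under that assumption, a quick local check of the pattern against a made-up body looks like this:

```python
import re

# The toRemove pattern from the config above.
pattern = r'"submitted":".*",?'

# Hypothetical payload body containing the unwanted property.
body = '{"schema":"x","submitted":"2021-04-19"}'

cleaned = re.sub(pattern, "", body)
print(cleaned)  # → {"schema":"x",}
```

Two things we noticed while checking it: when “submitted” is the last property, the optional `,?` can’t remove the comma *before* it, so the result keeps a dangling comma; and because `.*` is greedy, a body with properties *after* “submitted” would lose those too. We don’t know yet whether either matters for our run, but the pattern may be worth tightening.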

These are the steps we submit to the EMR cluster:

[
  {
    "Type": "CUSTOM_JAR",
    "Name": "Move to HDFS",
    "ActionOnFailure": "CANCEL_AND_WAIT",
    "Jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar",
    "Args": [
      "--src",
      "s3n://snowplow-recovery-poc-sample-bad-events/",
      "--dest",
      "hdfs:///local/to-recover/",
      "--outputCodec",
      "none"
    ]
  },
  {
    "Type": "CUSTOM_JAR",
    "Name": "snowplow-event-recovery-1_0_0",
    "ActionOnFailure": "CANCEL_AND_WAIT",
    "Jar": "command-runner.jar",
    "Args": [
      "spark-submit",
      "--class", "com.snowplowanalytics.snowplow.event.recovery.Main",
      "--master", "yarn",
      "--deploy-mode", "cluster",
      "s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar",
      "--input", "hdfs:///local/to-recover/",
      "--output", "hdfs:///local/recovered/",
      "--config", "eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy9yZWNvdmVyaWVzL2pzb25zY2hlbWEvMS0wLTAiLCJkYXRhIjpbeyJuYW1lIjoiUmVtb3ZlRnJvbUJvZHkiLCJlcnJvciI6ImVycm9yOiBvYmplY3QgaW5zdGFuY2UgaGFzIHByb3BlcnRpZXMgd2hpY2ggYXJlIG5vdCBhbGxvd2VkIGJ5IHRoZSBzY2hlbWE6IFtcInN1Ym1pdHRlZFwiXVxuICAgIGxldmVsOiBcImVycm9yXCJcbiAgICBzY2hlbWE6IHtcImxvYWRpbmdVUklcIjpcIiNcIixcInBvaW50ZXJcIjpcIlwifVxuICAgIGluc3RhbmNlOiB7XCJwb2ludGVyXCI6XCJcIn1cbiAgICBkb21haW46IFwidmFsaWRhdGlvblwiXG4gICAga2V5d29yZDogXCJhZGRpdGlvbmFsUHJvcGVydGllc1wiXG4gICAgdW53YW50ZWQ6IFtcInN1Ym1pdHRlZFwiXVxuIiwidG9SZW1vdmUiOiJcInN1Ym1pdHRlZFwiOlwiLipcIiw/In1dfQ=="
    ]
  },
  {
    "Type": "CUSTOM_JAR",
    "Name": "Back to S3",
    "ActionOnFailure": "CANCEL_AND_WAIT",
    "Jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar",
    "Args": [
      "--src",
      "hdfs:///local/recovered/",
      "--dest",
      "s3n://snowplow-recovery-poc-sample-bad-events/recovered/"
    ]
  }
]
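For reference, the `--config` argument in the second step is just the recovery JSON base64-encoded. A minimal sketch of producing (and sanity-checking) that value, assuming standard base64 with no line wrapping:

```python
import base64
import json

# The recovery config (the "error" field is shortened here for readability;
# the full one from the config above would be encoded the same way).
config = {
    "schema": "iglu:com.snowplowanalytics.snowplow/recoveries/jsonschema/1-0-0",
    "data": [
        {
            "name": "RemoveFromBody",
            "error": "error: object instance has properties which are not allowed by the schema: [\"submitted\"]",
            "toRemove": "\"submitted\":\".*\",?",
        }
    ],
}

# Compact JSON, then base64 -- this is what goes after --config in the step.
encoded = base64.b64encode(
    json.dumps(config, separators=(",", ":")).encode("utf-8")
).decode("ascii")

# Sanity check: decoding must round-trip to the original config.
assert json.loads(base64.b64decode(encoded)) == config
print(encoded)
```

Piping the base64 value from the step through `base64 -d` and diffing it against the original JSON is a quick way to rule out an encoding mistake as the cause.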

When we run it on EMR release emr-5.19.0 (Spark 2.3.2), we get this error:

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /stderr (Permission denied)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
	at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
	at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
	at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223)
	at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
	at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
	at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
	at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
	at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
	at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
	at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:120)
	at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:108)
	at org.apache.spark.deploy.SparkSubmit$.initializeLogIfNecessary(SparkSubmit.scala:71)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:128)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA-stderr].
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /stdout (Permission denied)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
	at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
	at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
	at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223)
	at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
	at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
	at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
	at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
	at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
	at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
	at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:120)
	at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:108)
	at org.apache.spark.deploy.SparkSubmit$.initializeLogIfNecessary(SparkSubmit.scala:71)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:128)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA-stdout].
Warning: Skip remote jar s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar.
21/04/19 14:05:26 INFO RMProxy: Connecting to ResourceManager at ip-10-111-242-182.ec2.internal/10.111.242.182:8032
21/04/19 14:05:26 INFO Client: Requesting a new application from cluster with 1 NodeManagers
21/04/19 14:05:27 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
21/04/19 14:05:27 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
21/04/19 14:05:27 INFO Client: Setting up container launch context for our AM
21/04/19 14:05:27 INFO Client: Setting up the launch environment for our AM container
21/04/19 14:05:27 INFO Client: Preparing resources for our AM container
21/04/19 14:05:28 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/04/19 14:05:30 INFO Client: Uploading resource file:/mnt/tmp/spark-7887e372-d819-4b2a-8d6b-cd9352791ee7/__spark_libs__1092377533137670526.zip -> hdfs://ip-10-111-242-182.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618572004498_0007/__spark_libs__1092377533137670526.zip
21/04/19 14:05:34 INFO Client: Uploading resource s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar -> hdfs://ip-10-111-242-182.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618572004498_0007/snowplow-event-recovery-spark-0.1.0.jar
21/04/19 14:05:34 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar' for reading
21/04/19 14:05:37 INFO Client: Uploading resource file:/mnt/tmp/spark-7887e372-d819-4b2a-8d6b-cd9352791ee7/__spark_conf__8170958988937854101.zip -> hdfs://ip-10-111-242-182.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618572004498_0007/__spark_conf__.zip
21/04/19 14:05:37 INFO SecurityManager: Changing view acls to: hadoop
21/04/19 14:05:37 INFO SecurityManager: Changing modify acls to: hadoop
21/04/19 14:05:37 INFO SecurityManager: Changing view acls groups to: 
21/04/19 14:05:37 INFO SecurityManager: Changing modify acls groups to: 
21/04/19 14:05:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
21/04/19 14:05:37 INFO Client: Submitting application application_1618572004498_0007 to ResourceManager
21/04/19 14:05:37 INFO YarnClientImpl: Submitted application application_1618572004498_0007
21/04/19 14:05:38 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:38 INFO Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1618841137594
	 final status: UNDEFINED
	 tracking URL: http://ip-10-111-242-182.ec2.internal:20888/proxy/application_1618572004498_0007/
	 user: hadoop
21/04/19 14:05:39 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:40 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:41 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:42 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:43 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:44 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:45 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:46 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:47 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:48 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:49 INFO Client: Application report for application_1618572004498_0007 (state: FAILED)
21/04/19 14:05:49 INFO Client: 
	 client token: N/A
	 diagnostics: Application application_1618572004498_0007 failed 2 times due to AM Container for appattempt_1618572004498_0007_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1618572004498_0007_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
	at org.apache.hadoop.util.Shell.run(Shell.java:869)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-111-242-182.ec2.internal:8088/cluster/app/application_1618572004498_0007 Then click on links to logs of each attempt.
. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1618841137594
	 final status: FAILED
	 tracking URL: http://ip-10-111-242-182.ec2.internal:8088/cluster/app/application_1618572004498_0007
	 user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1618572004498_0007 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/04/19 14:05:49 INFO ShutdownHookManager: Shutdown hook called
21/04/19 14:05:49 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-764daf1b-a3e3-4f2e-adc2-ea7043df6395
21/04/19 14:05:49 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7887e372-d819-4b2a-8d6b-cd9352791ee7
Command exiting with ret '1'

After some googling, we suspected it might be an issue with the EMR release.

So we tried again with EMR release emr-5.17.0 (Spark 2.3.1). This time we got a different error, and we couldn’t figure out what the actual issue was. Here are the error logs:

21/04/19 15:26:49 INFO RMProxy: Connecting to ResourceManager at ip-10-111-242-175.ec2.internal/10.111.242.175:8032
21/04/19 15:26:49 INFO Client: Requesting a new application from cluster with 1 NodeManagers
21/04/19 15:26:49 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
21/04/19 15:26:49 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
21/04/19 15:26:49 INFO Client: Setting up container launch context for our AM
21/04/19 15:26:49 INFO Client: Setting up the launch environment for our AM container
21/04/19 15:26:49 INFO Client: Preparing resources for our AM container
21/04/19 15:26:51 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/04/19 15:26:53 INFO Client: Uploading resource file:/mnt/tmp/spark-eaf4158e-151f-459b-8046-d1d5d06c8c1e/__spark_libs__1562645369368955787.zip -> hdfs://ip-10-111-242-175.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618845744731_0002/__spark_libs__1562645369368955787.zip
21/04/19 15:26:57 INFO Client: Uploading resource s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar -> hdfs://ip-10-111-242-175.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618845744731_0002/snowplow-event-recovery-spark-0.1.0.jar
21/04/19 15:26:57 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar' for reading
21/04/19 15:27:05 INFO Client: Uploading resource file:/mnt/tmp/spark-eaf4158e-151f-459b-8046-d1d5d06c8c1e/__spark_conf__6487263774957674599.zip -> hdfs://ip-10-111-242-175.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618845744731_0002/__spark_conf__.zip
21/04/19 15:27:06 INFO SecurityManager: Changing view acls to: hadoop
21/04/19 15:27:06 INFO SecurityManager: Changing modify acls to: hadoop
21/04/19 15:27:06 INFO SecurityManager: Changing view acls groups to: 
21/04/19 15:27:06 INFO SecurityManager: Changing modify acls groups to: 
21/04/19 15:27:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
21/04/19 15:27:06 INFO Client: Submitting application application_1618845744731_0002 to ResourceManager
21/04/19 15:27:06 INFO YarnClientImpl: Submitted application application_1618845744731_0002
21/04/19 15:27:07 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:07 INFO Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1618846026048
	 final status: UNDEFINED
	 tracking URL: http://ip-10-111-242-175.ec2.internal:20888/proxy/application_1618845744731_0002/
	 user: hadoop
21/04/19 15:27:08 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:09 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:10 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:11 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:12 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:13 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:14 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:15 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:16 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:17 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:18 INFO Client: Application report for application_1618845744731_0002 (state: FAILED)
21/04/19 15:27:18 INFO Client: 
	 client token: N/A
	 diagnostics: Application application_1618845744731_0002 failed 2 times due to AM Container for appattempt_1618845744731_0002_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1618845744731_0002_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
	at org.apache.hadoop.util.Shell.run(Shell.java:869)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-111-242-175.ec2.internal:8088/cluster/app/application_1618845744731_0002 Then click on links to logs of each attempt.
. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1618846026048
	 final status: FAILED
	 tracking URL: http://ip-10-111-242-175.ec2.internal:8088/cluster/app/application_1618845744731_0002
	 user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1618845744731_0002 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/04/19 15:27:18 INFO ShutdownHookManager: Shutdown hook called
21/04/19 15:27:18 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-eaf4158e-151f-459b-8046-d1d5d06c8c1e
21/04/19 15:27:18 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-4823ff3e-ba33-471a-98cf-851cd1ff4c30
Command exiting with ret '1'

So, can someone please help us with this? We’re not sure where we’re going wrong. Thanks!

This thread suggests that it might be down to a bug in that EMR version, which was resolved in emr-5.21.0.

Thanks for the reply! We just tried EMR 5.21.0 and we’re still getting errors, though they’re different from the ones on emr-5.19.0. Here are the error logs:

21/04/20 04:12:54 WARN DependencyUtils: Skip remote jar s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar.
21/04/20 04:12:55 INFO RMProxy: Connecting to ResourceManager at ip-10-111-242-185.ec2.internal/10.111.242.185:8032
21/04/20 04:12:55 INFO Client: Requesting a new application from cluster with 1 NodeManagers
21/04/20 04:12:55 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
21/04/20 04:12:55 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
21/04/20 04:12:55 INFO Client: Setting up container launch context for our AM
21/04/20 04:12:55 INFO Client: Setting up the launch environment for our AM container
21/04/20 04:12:55 INFO Client: Preparing resources for our AM container
21/04/20 04:12:55 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/04/20 04:12:58 INFO Client: Uploading resource file:/mnt/tmp/spark-f21b7886-6b2d-4fa4-a292-0898b53fa8b9/__spark_libs__6404691759432709612.zip -> hdfs://ip-10-111-242-185.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618891474743_0002/__spark_libs__6404691759432709612.zip
21/04/20 04:13:01 INFO Client: Uploading resource s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar -> hdfs://ip-10-111-242-185.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618891474743_0002/snowplow-event-recovery-spark-0.1.0.jar
21/04/20 04:13:02 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar' for reading
21/04/20 04:13:05 INFO Client: Uploading resource file:/mnt/tmp/spark-f21b7886-6b2d-4fa4-a292-0898b53fa8b9/__spark_conf__566878594413093008.zip -> hdfs://ip-10-111-242-185.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618891474743_0002/__spark_conf__.zip
21/04/20 04:13:05 INFO SecurityManager: Changing view acls to: hadoop
21/04/20 04:13:05 INFO SecurityManager: Changing modify acls to: hadoop
21/04/20 04:13:05 INFO SecurityManager: Changing view acls groups to: 
21/04/20 04:13:05 INFO SecurityManager: Changing modify acls groups to: 
21/04/20 04:13:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
21/04/20 04:13:07 INFO Client: Submitting application application_1618891474743_0002 to ResourceManager
21/04/20 04:13:07 INFO YarnClientImpl: Submitted application application_1618891474743_0002
21/04/20 04:13:08 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:08 INFO Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1618891987545
	 final status: UNDEFINED
	 tracking URL: http://ip-10-111-242-185.ec2.internal:20888/proxy/application_1618891474743_0002/
	 user: hadoop
21/04/20 04:13:09 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:10 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:11 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:12 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:13 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:14 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:15 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:16 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:17 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:18 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:19 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:20 INFO Client: Application report for application_1618891474743_0002 (state: ACCEPTED)
21/04/20 04:13:21 INFO Client: Application report for application_1618891474743_0002 (state: FAILED)
21/04/20 04:13:21 INFO Client: 
	 client token: N/A
	 diagnostics: Application application_1618891474743_0002 failed 2 times due to AM Container for appattempt_1618891474743_0002_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1618891474743_0002_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
	at org.apache.hadoop.util.Shell.run(Shell.java:869)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-111-242-185.ec2.internal:8088/cluster/app/application_1618891474743_0002 Then click on links to logs of each attempt.
. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1618891987545
	 final status: FAILED
	 tracking URL: http://ip-10-111-242-185.ec2.internal:8088/cluster/app/application_1618891474743_0002
	 user: hadoop
21/04/20 04:13:21 ERROR Client: Application diagnostics message: Application application_1618891474743_0002 failed 2 times due to AM Container for appattempt_1618891474743_0002_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1618891474743_0002_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
	at org.apache.hadoop.util.Shell.run(Shell.java:869)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-111-242-185.ec2.internal:8088/cluster/app/application_1618891474743_0002 Then click on links to logs of each attempt.
. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1618891474743_0002 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/04/20 04:13:21 INFO ShutdownHookManager: Shutdown hook called
21/04/20 04:13:21 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-f21b7886-6b2d-4fa4-a292-0898b53fa8b9
21/04/20 04:13:21 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-1b840687-c825-4bd6-9f8a-9cb999855c50
Command exiting with ret '1'