Hello,
So, we have some bad rows in the older format (0.1.0), and we’re trying to run snowplow-event-recovery-spark-0.1.0.jar on an EMR cluster, but we’re running into some issues. As a first sanity check, we’re just trying to remove the “submitted” property from the body.
This is the config JSON:
{
"schema": "iglu:com.snowplowanalytics.snowplow/recoveries/jsonschema/1-0-0",
"data": [
{
"name": "RemoveFromBody",
"error": "error: object instance has properties which are not allowed by the schema: [\"submitted\"]\n level: \"error\"\n schema: {\"loadingURI\":\"#\",\"pointer\":\"\"}\n instance: {\"pointer\":\"\"}\n domain: \"validation\"\n keyword: \"additionalProperties\"\n unwanted: [\"submitted\"]\n",
"toRemove": "\"submitted\":\".*\",?"
}
]
}
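As a sanity check of the "toRemove" pattern, here is a minimal sketch of what we expect the substitution to do, run on a made-up body with plain Python `re` (this is just an illustration, not the recovery job itself; note the sketch uses a non-greedy `.*?`, since the greedy `.*` in the config above would match through to the last double quote in the body rather than stopping at the first):

```python
import re

# Hypothetical raw body containing the offending property (names made up).
body = '{"submitted":"2021-04-19","app_id":"test","platform":"web"}'

# Non-greedy variant of the "toRemove" pattern from the config above;
# the greedy '"submitted":".*",?' would over-match on this body.
to_remove = r'"submitted":".*?",?'

cleaned = re.sub(to_remove, "", body)
print(cleaned)  # the "submitted" property is gone, rest of the body intact
```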
This is the configuration of the steps we submit to the EMR cluster:
[
{
"Type": "CUSTOM_JAR",
"Name": "Move to HDFS",
"ActionOnFailure": "CANCEL_AND_WAIT",
"Jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar",
"Args": [
"--src",
"s3n://snowplow-recovery-poc-sample-bad-events/",
"--dest",
"hdfs:///local/to-recover/",
"--outputCodec",
"none"
]
},
{
"Type": "CUSTOM_JAR",
"Name": "snowplow-event-recovery-1_0_0",
"ActionOnFailure": "CANCEL_AND_WAIT",
"Jar": "command-runner.jar",
"Args": [
"spark-submit",
"--class", "com.snowplowanalytics.snowplow.event.recovery.Main",
"--master", "yarn",
"--deploy-mode", "cluster",
"s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar",
"--input", "hdfs:///local/to-recover/",
"--output", "hdfs:///local/recovered/",
"--config", "eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy9yZWNvdmVyaWVzL2pzb25zY2hlbWEvMS0wLTAiLCJkYXRhIjpbeyJuYW1lIjoiUmVtb3ZlRnJvbUJvZHkiLCJlcnJvciI6ImVycm9yOiBvYmplY3QgaW5zdGFuY2UgaGFzIHByb3BlcnRpZXMgd2hpY2ggYXJlIG5vdCBhbGxvd2VkIGJ5IHRoZSBzY2hlbWE6IFtcInN1Ym1pdHRlZFwiXVxuICAgIGxldmVsOiBcImVycm9yXCJcbiAgICBzY2hlbWE6IHtcImxvYWRpbmdVUklcIjpcIiNcIixcInBvaW50ZXJcIjpcIlwifVxuICAgIGluc3RhbmNlOiB7XCJwb2ludGVyXCI6XCJcIn1cbiAgICBkb21haW46IFwidmFsaWRhdGlvblwiXG4gICAga2V5d29yZDogXCJhZGRpdGlvbmFsUHJvcGVydGllc1wiXG4gICAgdW53YW50ZWQ6IFtcInN1Ym1pdHRlZFwiXVxuIiwidG9SZW1vdmUiOiJcInN1Ym1pdHRlZFwiOlwiLipcIiw/In1dfQ=="
]
},
{
"Type": "CUSTOM_JAR",
"Name": "Back to S3",
"ActionOnFailure": "CANCEL_AND_WAIT",
"Jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar",
"Args": [
"--src",
"hdfs:///local/recovered/",
"--dest",
"s3n://snowplow-recovery-poc-sample-bad-events/recovered/"
]
}
]
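The `--config` value in the Spark step is the recovery JSON from above, base64-encoded. For reference, this is roughly how we produce it (a sketch; the long "error" string is abridged here but is the same one shown earlier):

```python
import base64
import json

# The recovery configuration from above ("error" abridged for brevity).
config = {
    "schema": "iglu:com.snowplowanalytics.snowplow/recoveries/jsonschema/1-0-0",
    "data": [
        {
            "name": "RemoveFromBody",
            "error": "...",
            "toRemove": '"submitted":".*",?',
        }
    ],
}

# Serialize and base64-encode; `encoded` is what we paste into --config.
encoded = base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")

# Sanity check: decoding must give back the original configuration.
assert json.loads(base64.b64decode(encoded)) == config
```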
When we run it on EMR release emr-5.19.0, with Spark 2.3.2 installed, we run into this error:
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /stderr (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:120)
at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:108)
at org.apache.spark.deploy.SparkSubmit$.initializeLogIfNecessary(SparkSubmit.scala:71)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:128)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA-stderr].
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /stdout (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:120)
at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:108)
at org.apache.spark.deploy.SparkSubmit$.initializeLogIfNecessary(SparkSubmit.scala:71)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:128)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA-stdout].
Warning: Skip remote jar s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar.
21/04/19 14:05:26 INFO RMProxy: Connecting to ResourceManager at ip-10-111-242-182.ec2.internal/10.111.242.182:8032
21/04/19 14:05:26 INFO Client: Requesting a new application from cluster with 1 NodeManagers
21/04/19 14:05:27 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
21/04/19 14:05:27 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
21/04/19 14:05:27 INFO Client: Setting up container launch context for our AM
21/04/19 14:05:27 INFO Client: Setting up the launch environment for our AM container
21/04/19 14:05:27 INFO Client: Preparing resources for our AM container
21/04/19 14:05:28 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/04/19 14:05:30 INFO Client: Uploading resource file:/mnt/tmp/spark-7887e372-d819-4b2a-8d6b-cd9352791ee7/__spark_libs__1092377533137670526.zip -> hdfs://ip-10-111-242-182.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618572004498_0007/__spark_libs__1092377533137670526.zip
21/04/19 14:05:34 INFO Client: Uploading resource s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar -> hdfs://ip-10-111-242-182.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618572004498_0007/snowplow-event-recovery-spark-0.1.0.jar
21/04/19 14:05:34 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar' for reading
21/04/19 14:05:37 INFO Client: Uploading resource file:/mnt/tmp/spark-7887e372-d819-4b2a-8d6b-cd9352791ee7/__spark_conf__8170958988937854101.zip -> hdfs://ip-10-111-242-182.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618572004498_0007/__spark_conf__.zip
21/04/19 14:05:37 INFO SecurityManager: Changing view acls to: hadoop
21/04/19 14:05:37 INFO SecurityManager: Changing modify acls to: hadoop
21/04/19 14:05:37 INFO SecurityManager: Changing view acls groups to:
21/04/19 14:05:37 INFO SecurityManager: Changing modify acls groups to:
21/04/19 14:05:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
21/04/19 14:05:37 INFO Client: Submitting application application_1618572004498_0007 to ResourceManager
21/04/19 14:05:37 INFO YarnClientImpl: Submitted application application_1618572004498_0007
21/04/19 14:05:38 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:38 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1618841137594
final status: UNDEFINED
tracking URL: http://ip-10-111-242-182.ec2.internal:20888/proxy/application_1618572004498_0007/
user: hadoop
21/04/19 14:05:39 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:40 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:41 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:42 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:43 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:44 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:45 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:46 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:47 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:48 INFO Client: Application report for application_1618572004498_0007 (state: ACCEPTED)
21/04/19 14:05:49 INFO Client: Application report for application_1618572004498_0007 (state: FAILED)
21/04/19 14:05:49 INFO Client:
client token: N/A
diagnostics: Application application_1618572004498_0007 failed 2 times due to AM Container for appattempt_1618572004498_0007_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1618572004498_0007_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-111-242-182.ec2.internal:8088/cluster/app/application_1618572004498_0007 Then click on links to logs of each attempt.
. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1618841137594
final status: FAILED
tracking URL: http://ip-10-111-242-182.ec2.internal:8088/cluster/app/application_1618572004498_0007
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1618572004498_0007 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/04/19 14:05:49 INFO ShutdownHookManager: Shutdown hook called
21/04/19 14:05:49 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-764daf1b-a3e3-4f2e-adc2-ea7043df6395
21/04/19 14:05:49 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7887e372-d819-4b2a-8d6b-cd9352791ee7
Command exiting with ret '1'
After some googling, we thought it might be an issue with the EMR release, so we tried again with release label emr-5.17.0 and Spark 2.3.1. This time the log4j errors were gone, but the application still failed and we couldn’t figure out what the actual issue was. Here are the error logs we’re getting:
21/04/19 15:26:49 INFO RMProxy: Connecting to ResourceManager at ip-10-111-242-175.ec2.internal/10.111.242.175:8032
21/04/19 15:26:49 INFO Client: Requesting a new application from cluster with 1 NodeManagers
21/04/19 15:26:49 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
21/04/19 15:26:49 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
21/04/19 15:26:49 INFO Client: Setting up container launch context for our AM
21/04/19 15:26:49 INFO Client: Setting up the launch environment for our AM container
21/04/19 15:26:49 INFO Client: Preparing resources for our AM container
21/04/19 15:26:51 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/04/19 15:26:53 INFO Client: Uploading resource file:/mnt/tmp/spark-eaf4158e-151f-459b-8046-d1d5d06c8c1e/__spark_libs__1562645369368955787.zip -> hdfs://ip-10-111-242-175.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618845744731_0002/__spark_libs__1562645369368955787.zip
21/04/19 15:26:57 INFO Client: Uploading resource s3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar -> hdfs://ip-10-111-242-175.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618845744731_0002/snowplow-event-recovery-spark-0.1.0.jar
21/04/19 15:26:57 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets/3-enrich/snowplow-event-recovery/snowplow-event-recovery-spark-0.1.0.jar' for reading
21/04/19 15:27:05 INFO Client: Uploading resource file:/mnt/tmp/spark-eaf4158e-151f-459b-8046-d1d5d06c8c1e/__spark_conf__6487263774957674599.zip -> hdfs://ip-10-111-242-175.ec2.internal:8020/user/hadoop/.sparkStaging/application_1618845744731_0002/__spark_conf__.zip
21/04/19 15:27:06 INFO SecurityManager: Changing view acls to: hadoop
21/04/19 15:27:06 INFO SecurityManager: Changing modify acls to: hadoop
21/04/19 15:27:06 INFO SecurityManager: Changing view acls groups to:
21/04/19 15:27:06 INFO SecurityManager: Changing modify acls groups to:
21/04/19 15:27:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
21/04/19 15:27:06 INFO Client: Submitting application application_1618845744731_0002 to ResourceManager
21/04/19 15:27:06 INFO YarnClientImpl: Submitted application application_1618845744731_0002
21/04/19 15:27:07 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:07 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1618846026048
final status: UNDEFINED
tracking URL: http://ip-10-111-242-175.ec2.internal:20888/proxy/application_1618845744731_0002/
user: hadoop
21/04/19 15:27:08 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:09 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:10 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:11 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:12 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:13 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:14 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:15 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:16 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:17 INFO Client: Application report for application_1618845744731_0002 (state: ACCEPTED)
21/04/19 15:27:18 INFO Client: Application report for application_1618845744731_0002 (state: FAILED)
21/04/19 15:27:18 INFO Client:
client token: N/A
diagnostics: Application application_1618845744731_0002 failed 2 times due to AM Container for appattempt_1618845744731_0002_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1618845744731_0002_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-111-242-175.ec2.internal:8088/cluster/app/application_1618845744731_0002 Then click on links to logs of each attempt.
. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1618846026048
final status: FAILED
tracking URL: http://ip-10-111-242-175.ec2.internal:8088/cluster/app/application_1618845744731_0002
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1618845744731_0002 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/04/19 15:27:18 INFO ShutdownHookManager: Shutdown hook called
21/04/19 15:27:18 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-eaf4158e-151f-459b-8046-d1d5d06c8c1e
21/04/19 15:27:18 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-4823ff3e-ba33-471a-98cf-851cd1ff4c30
Command exiting with ret '1'
Can someone please help us with this? We’re not sure where we’re going wrong. Thanks!