Hi there,
I am trying to set up the EmrEtlRunner process, but it always fails at the S3DistCp "Enriched HDFS -> S3" step.
I've checked previous posts about the same error, but none of the solutions apply to my case.
The topology is the Lambda one:
Trackers (JS, PHP, Pixel) -> Collector (Scala Stream) ->
Kinesis (streams good, bad) -> Scala Stream Enrich -> Kinesis S3 (enriched) (THIS WORKS)
Kinesis (streams good, bad) -> Kinesis S3 (raw) -> EmrEtlRunner (THIS FAILS)
I've also tried running with --skip staging. I set the log level to DEBUG, but it didn't reveal anything new.
I've double-checked the names of all the Kinesis streams and S3 buckets; none of them are misspelled.
Any other ideas? The stack trace from the failing S3DistCp step:
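For context, the bucket section of my config.yml follows this general shape (reconstructed here with placeholder names, since the real buckets are anonymized as XXXXX in the logs below — please don't treat the exact keys as authoritative for every EmrEtlRunner release):

```yaml
aws:
  s3:
    region: us-west-2
    buckets:
      raw:
        in:
          - s3://XXXXX-events-raw            # written by the Kinesis S3 (raw) sink
        processing: s3://XXXXX-etl/processing
        archive: s3://XXXXX-archive/raw
      enriched:
        good: s3://XXXXX-events-enriched/good  # matches the --dest in the S3DistCp log
        bad: s3://XXXXX-events-enriched/bad
        errors: s3://XXXXX-events-enriched/errors
```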
Exception in thread "main" java.lang.RuntimeException: Error running job
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:927)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-39-89.us-west-2.compute.internal:8020/tmp/cdf15f73-76c7-40d5-a6cc-861d10048635/files
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:317)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:352)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:901)
... 10 more
2017-09-25 04:19:37,296 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Running with args: --src hdfs:///local/snowplow/enriched-events/ --dest s3://XXXXX-events-enriched/good/run=2017-09-25-04-06-52/ --srcPattern .*part-.* --s3Endpoint s3-us-west-2.amazonaws.com
2017-09-25 04:19:38,345 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): S3DistCp args: --src hdfs:///local/snowplow/enriched-events/ --dest s3://XXXXX-events-enriched/good/run=2017-09-25-04-06-52/ --srcPattern .*part-.* --s3Endpoint s3-us-west-2.amazonaws.com
2017-09-25 04:19:38,421 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Using output path 'hdfs:/tmp/cdf15f73-76c7-40d5-a6cc-861d10048635/output'
2017-09-25 04:19:41,233 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Created 0 files to copy 0 files
2017-09-25 04:19:50,810 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Reducer number: 3
2017-09-25 04:19:51,066 INFO org.apache.hadoop.yarn.client.RMProxy (main): Connecting to ResourceManager at ip-172-31-39-89.us-west-2.compute.internal/172.31.39.89:8032
2017-09-25 04:19:52,511 INFO org.apache.hadoop.mapreduce.JobSubmitter (main): Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1506312682959_0004
2017-09-25 04:19:52,519 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Try to recursively delete hdfs:/tmp/cdf15f73-76c7-40d5-a6cc-861d10048635/tempspace
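One thing that stands out above is "Created 0 files to copy 0 files": the Spark Enrich application finishes with SUCCEEDED (see the log below), yet nothing under hdfs:///local/snowplow/enriched-events/ matches --srcPattern .*part-.*, so S3DistCp's temp listing is empty and the step blows up. If I can keep the cluster alive, these are the checks I'd run from the EMR master node (stored in a variable and printed as a checklist, so the snippet is safe to paste anywhere; they assume an SSH session on the master):

```shell
# Checklist of diagnostics for the EMR master node (assumes SSH access).
# Stored in a variable and echoed rather than executed directly.
checks='
# Does the enrich output directory exist, and does it contain part-* files?
hdfs dfs -ls -R hdfs:///local/snowplow/enriched-events/

# How much data did the enrich step actually write?
hdfs dfs -du -h hdfs:///local/snowplow/enriched-events/
'
echo "$checks"
```

If the directory is empty, the problem is upstream of S3DistCp: enrich read no usable raw input (or produced only empty partitions), which would explain a "successful" Spark job with nothing to copy.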
The log from the Enrich task:
17/09/25 04:17:46 INFO RMProxy: Connecting to ResourceManager at ip-172-31-39-89.us-west-2.compute.internal/172.31.39.89:8032
17/09/25 04:17:46 INFO Client: Requesting a new application from cluster with 2 NodeManagers
17/09/25 04:17:47 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container)
17/09/25 04:17:47 INFO Client: Will allocate AM container, with 2048 MB memory including 384 MB overhead
17/09/25 04:17:47 INFO Client: Setting up container launch context for our AM
17/09/25 04:17:47 INFO Client: Setting up the launch environment for our AM container
17/09/25 04:17:47 INFO Client: Preparing resources for our AM container
17/09/25 04:17:51 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/09/25 04:17:57 INFO Client: Uploading resource file:/mnt/tmp/spark-6a804fb0-92e6-4b13-b1f5-8c422e1c6d25/__spark_libs__1150425498665415086.zip -> hdfs://ip-172-31-39-89.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1506312682959_0003/__spark_libs__1150425498665415086.zip
17/09/25 04:18:12 INFO Client: Uploading resource s3://snowplow-hosted-assets-us-west-2/3-enrich/spark-enrich/snowplow-spark-enrich-1.9.0.jar -> hdfs://ip-172-31-39-89.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1506312682959_0003/snowplow-spark-enrich-1.9.0.jar
17/09/25 04:18:12 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets-us-west-2/3-enrich/spark-enrich/snowplow-spark-enrich-1.9.0.jar' for reading
17/09/25 04:18:18 INFO Client: Uploading resource file:/mnt/tmp/spark-6a804fb0-92e6-4b13-b1f5-8c422e1c6d25/__spark_conf__9044663379866809420.zip -> hdfs://ip-172-31-39-89.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1506312682959_0003/__spark_conf__.zip
17/09/25 04:18:18 INFO SecurityManager: Changing view acls to: hadoop
17/09/25 04:18:18 INFO SecurityManager: Changing modify acls to: hadoop
17/09/25 04:18:18 INFO SecurityManager: Changing view acls groups to:
17/09/25 04:18:18 INFO SecurityManager: Changing modify acls groups to:
17/09/25 04:18:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
17/09/25 04:18:18 INFO Client: Submitting application application_1506312682959_0003 to ResourceManager
17/09/25 04:18:18 INFO YarnClientImpl: Submitted application application_1506312682959_0003
17/09/25 04:18:19 INFO Client: Application report for application_1506312682959_0003 (state: ACCEPTED)
17/09/25 04:18:19 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1506313098722
final status: UNDEFINED
tracking URL: http://ip-172-31-39-89.us-west-2.compute.internal:20888/proxy/application_1506312682959_0003/
user: hadoop
17/09/25 04:18:20 INFO Client: Application report for application_1506312682959_0003 (state: ACCEPTED)
[... the same "state: ACCEPTED" report repeated once per second through 04:18:36 ...]
17/09/25 04:18:37 INFO Client: Application report for application_1506312682959_0003 (state: RUNNING)
17/09/25 04:18:37 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.31.38.225
ApplicationMaster RPC port: 0
queue: default
start time: 1506313098722
final status: UNDEFINED
tracking URL: http://ip-172-31-39-89.us-west-2.compute.internal:20888/proxy/application_1506312682959_0003/
user: hadoop
17/09/25 04:18:38 INFO Client: Application report for application_1506312682959_0003 (state: RUNNING)
[... the same "state: RUNNING" report repeated once per second through 04:19:32 ...]
17/09/25 04:19:33 INFO Client: Application report for application_1506312682959_0003 (state: FINISHED)
17/09/25 04:19:33 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.31.38.225
ApplicationMaster RPC port: 0
queue: default
start time: 1506313098722
final status: SUCCEEDED
tracking URL: http://ip-172-31-39-89.us-west-2.compute.internal:20888/proxy/application_1506312682959_0003/
user: hadoop
17/09/25 04:19:33 INFO ShutdownHookManager: Shutdown hook called
17/09/25 04:19:33 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-6a804fb0-92e6-4b13-b1f5-8c422e1c6d25
Command exiting with ret '0'
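In case it matters, this is how I'd confirm that the Kinesis S3 (raw) sink actually delivered files for EmrEtlRunner to stage (same echoed-checklist style as above; bucket names are placeholders for the anonymized ones):

```shell
# Checklist for verifying the raw-side S3 input (placeholder bucket names).
checks='
# Are there raw files waiting in the in bucket?
aws s3 ls s3://XXXXX-events-raw/ --recursive | head -20

# Did anything land in processing for this run?
aws s3 ls s3://XXXXX-etl/processing/ --recursive | head -20
'
echo "$checks"
```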