I emptied the good bucket and re-ran EmrEtlRunner.
If I run the process without --skip staging, all of the EmrEtlRunner steps complete, but the run folder in enriched/good/ is 0 KB, while the run folder in shredded/good/ is created successfully.
If I run with --skip staging instead, I get the error below at step 3 of the process.
Below is the command used to run EmrEtlRunner:
./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/ --skip staging
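For reference, before re-running I check that the good paths are actually empty, since a leftover run folder from a failed run can make the "Enriched HDFS -> S3" S3DistCp step fail. A rough sketch with the AWS CLI (the bucket paths here are placeholders; the real values are the aws.s3.buckets.enriched.good and aws.s3.buckets.shredded.good settings in my emretlrunner.yml):

# placeholder bucket paths; substitute the values from emretlrunner.yml
aws s3 ls s3://my-snowplow-bucket/enriched/good/ --recursive
aws s3 ls s3://my-snowplow-bucket/shredded/good/ --recursive
# delete any leftover run folder from the failed run before retrying
aws s3 rm s3://my-snowplow-bucket/enriched/good/ --recursive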
I am getting the following error in the Ubuntu CLI:
D, [2017-10-16T15:07:45.058000 #2642] DEBUG -- : Initializing EMR jobflow
D, [2017-10-16T15:07:48.859000 #2642] DEBUG -- : EMR jobflow j-3I1KJNJKU8JV2 started, waiting for jobflow to complete...
F, [2017-10-16T15:17:54.079000 #2642] FATAL -- :
Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-3I1KJNJKU8JV2 failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-10-16 15:12:59 +0000 - ]
- 1. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: COMPLETED ~ 00:02:38 [2017-10-16 15:13:01 +0000 - 2017-10-16 15:15:39 +0000]
- 2. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:04 [2017-10-16 15:15:41 +0000 - 2017-10-16 15:16:45 +0000]
- 3. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [2017-10-16 15:16:47 +0000 - 2017-10-16 15:16:53 +0000]
- 4. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
- 5. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
- 6. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
- 7. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
- 8. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
- 9. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
- 10. Elasticity Custom Jar Step: Empty Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
- 11. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
org/jruby/RubyKernel.java:979:in `load'
uri:classloader:/META-INF/main.rb:1:in `<main>'
org/jruby/RubyKernel.java:961:in `require'
uri:classloader:/META-INF/main.rb:1:in `(root)'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'
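To get at the step logs I pulled the stderr for this jobflow from the EMR log bucket, along these lines (the log bucket and step id below are placeholders; the actual location is whatever log path the cluster is configured with):

# list the jobflow's steps to find the id of the failed step
aws emr list-steps --cluster-id j-3I1KJNJKU8JV2 --region us-east-1
# copy that step's stderr from the EMR log bucket (placeholder path)
aws s3 cp s3://my-emr-logs/j-3I1KJNJKU8JV2/steps/<step-id>/stderr.gz .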
When I checked the stderr.gz file for the step in the logs, this is its contents:
17/10/16 15:15:44 INFO RMProxy: Connecting to ResourceManager at ip-172-31-7-95.ec2.internal/172.31.7.95:8032
17/10/16 15:15:45 INFO Client: Requesting a new application from cluster with 2 NodeManagers
17/10/16 15:15:45 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (61440 MB per container)
17/10/16 15:15:45 INFO Client: Will allocate AM container, with 61440 MB memory including 5585 MB overhead
17/10/16 15:15:45 INFO Client: Setting up container launch context for our AM
17/10/16 15:15:45 INFO Client: Setting up the launch environment for our AM container
17/10/16 15:15:45 INFO Client: Preparing resources for our AM container
17/10/16 15:15:46 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/10/16 15:15:49 INFO Client: Uploading resource file:/mnt/tmp/spark-9dc8c996-c571-417e-9e58-aba2fb8b306a/__spark_libs__1073128864940540338.zip -> hdfs://ip-172-31-7-95.ec2.internal:8020/user/hadoop/.sparkStaging/application_1508166661063_0002/__spark_libs__1073128864940540338.zip
17/10/16 15:15:54 INFO Client: Uploading resource s3://snowplow-hosted-assets-us-east-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.9.0.jar -> hdfs://ip-172-31-7-95.ec2.internal:8020/user/hadoop/.sparkStaging/application_1508166661063_0002/snowplow-spark-enrich-1.9.0.jar
17/10/16 15:15:54 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets-us-east-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.9.0.jar' for reading
17/10/16 15:15:56 INFO Client: Uploading resource file:/mnt/tmp/spark-9dc8c996-c571-417e-9e58-aba2fb8b306a/__spark_conf__5417205837143737464.zip -> hdfs://ip-172-31-7-95.ec2.internal:8020/user/hadoop/.sparkStaging/application_1508166661063_0002/__spark_conf__.zip
17/10/16 15:15:56 INFO SecurityManager: Changing view acls to: hadoop
17/10/16 15:15:56 INFO SecurityManager: Changing modify acls to: hadoop
17/10/16 15:15:56 INFO SecurityManager: Changing view acls groups to:
17/10/16 15:15:56 INFO SecurityManager: Changing modify acls groups to:
17/10/16 15:15:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
17/10/16 15:15:56 INFO Client: Submitting application application_1508166661063_0002 to ResourceManager
17/10/16 15:15:57 INFO YarnClientImpl: Submitted application application_1508166661063_0002
17/10/16 15:15:58 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:15:58 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1508166956994
final status: UNDEFINED
tracking URL: http://ip-172-31-7-95.ec2.internal:20888/proxy/application_1508166661063_0002/
user: hadoop
17/10/16 15:15:59 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:00 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:01 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:02 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:03 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:04 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:05 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:06 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:06 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.31.12.106
ApplicationMaster RPC port: 0
queue: default
start time: 1508166956994
final status: UNDEFINED
tracking URL: http://ip-172-31-7-95.ec2.internal:20888/proxy/application_1508166661063_0002/
user: hadoop
17/10/16 15:16:07 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
[... identical "Application report for application_1508166661063_0002 (state: RUNNING)" lines, one per second, through 17/10/16 15:16:44 ...]
17/10/16 15:16:45 INFO Client: Application report for application_1508166661063_0002 (state: FINISHED)
17/10/16 15:16:45 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.31.12.106
ApplicationMaster RPC port: 0
queue: default
start time: 1508166956994
final status: SUCCEEDED
tracking URL: http://ip-172-31-7-95.ec2.internal:20888/proxy/application_1508166661063_0002/
user: hadoop
17/10/16 15:16:45 INFO ShutdownHookManager: Shutdown hook called
17/10/16 15:16:45 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-9dc8c996-c571-417e-9e58-aba2fb8b306a
Command exiting with ret '0'
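One thing I notice: this stderr ends with ret '0' and a SUCCEEDED final status, and its timestamps (15:15:44 to 15:16:45) line up with step 2 (Enrich Raw Events) rather than with the failed step 3 (15:16:47 to 15:16:53), so it is possible this is the wrong step's stderr.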
Please help me…