Hello friends,
I’m trying to set up Snowplow. The first steps went fine, but I’m now stuck on the step that moves the enriched:good data from HDFS to the S3 bucket.
I’ve already read many similar posts and tried many of the suggested config changes to get past this step, but none worked.
I’m using the Scala Stream Collector, and I can see in the console that events are arriving in my Kinesis streams. I have also set up enrichment (Scala Stream Enrich), which is running and moving datums.
I have a Kinesis S3 Sink running; here is the looping output on my console:
Apr 06, 2017 3:24:34 PM com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker initialize
INFO: Syncing Kinesis shard info
Apr 06, 2017 3:24:34 PM com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker initialize
INFO: Starting LeaseCoordinator
Apr 06, 2017 3:24:44 PM com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker run
INFO: Initialization complete. Starting worker loop.
Apr 06, 2017 3:24:44 PM com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable publishMetrics
INFO: Successfully published 16 datums.
Apr 06, 2017 3:24:54 PM com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable publishMetrics
INFO: Successfully published 4 datums.
Apr 06, 2017 3:25:04 PM com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable publishMetrics
… and so on for many more loops.
My S3 bucket for ‘enriched:good’ is empty. I followed the Batch Pipeline Steps guide for recovering from step failures, which says to clean the ‘enriched:good’ folder (Step 4), but that folder is already clean. When I run emr-etl-runner with --skip staging, the process crashes with this error:
Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-2985V5JBVCX83 failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
johnsnow: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-04-06 14:40:11 +0000 - ]
- 1. Elasticity Setup Hadoop Debugging: COMPLETED ~ 00:00:13 [2017-04-06 14:40:12 +0000 - 2017-04-06 14:40:25 +0000]
- 2. Elasticity S3DistCp Step: Raw S3 -> HDFS: COMPLETED ~ 00:04:48 [2017-04-06 14:40:27 +0000 - 2017-04-06 14:45:16 +0000]
- 3. Elasticity Scalding Step: Enrich Raw Events: COMPLETED ~ 00:02:00 [2017-04-06 14:45:25 +0000 - 2017-04-06 14:47:26 +0000]
**- 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:14 [2017-04-06 14:47:28 +0000 - 2017-04-06 14:47:42 +0000]**
- 5. Elasticity S3DistCp Step: Raw S3 Staging -> S3 Archive: CANCELLED ~ elapsed time n/a [ - ]
- 6. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
- 7. Elasticity Scalding Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
- 8. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:500:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:74:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `<main>'
org/jruby/RubyKernel.java:973:in `load'
uri:classloader:/META-INF/main.rb:1:in `<main>'
org/jruby/RubyKernel.java:955:in `require'
uri:classloader:/META-INF/main.rb:1:in `(root)'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'
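In case it helps anyone reading the step list above, here is a minimal Python sketch that pulls the step number, name, and status out of that kind of EmrEtlRunner output and reports the first failed step. It is not part of Snowplow: the names `Step`, `parse_steps`, `first_failed`, and `SAMPLE` are made up for illustration, and the regular expression only assumes the line layout shown in this post.

```python
import re
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    number: int
    name: str
    status: str


# Matches lines shaped like the ones in this post, e.g.
#   - 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:14 [...]
# The optional "**" allows for the bold markup around the failed step.
STEP_RE = re.compile(r"^\s*(?:\*\*)?\s*-\s*(\d+)\.\s+(.*?):\s+([A-Z_]+)\s+~")

# A few lines copied from the error output above.
SAMPLE = """\
johnsnow: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-04-06 14:40:11 +0000 - ]
- 3. Elasticity Scalding Step: Enrich Raw Events: COMPLETED ~ 00:02:00 [2017-04-06 14:45:25 +0000 - 2017-04-06 14:47:26 +0000]
- 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:14 [2017-04-06 14:47:28 +0000 - 2017-04-06 14:47:42 +0000]
- 8. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
"""


def parse_steps(report: str) -> List[Step]:
    """Return every step line found in the report, in order of appearance."""
    steps = []
    for line in report.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(Step(int(match.group(1)), match.group(2), match.group(3)))
    return steps


def first_failed(steps: List[Step]) -> Optional[Step]:
    """Return the first step whose status is FAILED, or None if there is none."""
    return next((step for step in steps if step.status == "FAILED"), None)


if __name__ == "__main__":
    failed = first_failed(parse_steps(SAMPLE))
    if failed is not None:
        print(f"Step {failed.number} failed: {failed.name}")
```

This only tells you which step to look up in the EMR console and Hadoop logs; it says nothing about why the step failed.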
My config for emr-etl-runner is: https://pastebin.com/8CSb6zaU
Has anyone solved this problem who could help me?