Strange Random Error on EmtEtlRunner

kjain · September 21, 2017, 1:16pm

OK…checking the logs of the FIRST time this happened, I see more detailed logs of the error with Excon:

Excon::Error::ServiceUnavailable (Expected(200) <=> Actual(503 Service Unavailable)
excon.error.response
 :body          => "<Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message><RequestId>3F7E4F0929844761</RequestId><HostId>r4/0CYAG09ZaxxiSyXZJhAAEWGygJzwMGY30TPFj+pH4+LeIbkfXZKDTtsfxwbcpbS4PHu7DaP8=</HostId></Error>"
 :cookies       => [
]
:headers       => {
"Connection"       => "close"
"Content-Type"     => "application/xml"
"Date"             => "Thu, 14 Sep 2017 19:05:43 GMT"
"Server"           => "AmazonS3"
"x-amz-id-2"       => "r4/0CYAG09ZaxxiSyXZJhAAEWGygJzwMGY30TPFj+pH4+LeIbkfXZKDTtsfxwbcpbS4PHu7DaP8="
"x-amz-request-id" => "3F7E4F0929844761"
}
:host          => "in-stats-prod-archive.s3.amazonaws.com"
:local_address => "<internal-ip-of-emr-runner>"
:local_port    => 44096
:path          => "/raw/2017-09-14-49573367394269636484214968494006254435221012695805329410-49573367394269636484214968517991342696375263768313069570.lzo.index"
:port          => 443
:reason_phrase => "Slow Down"
:remote_ip     => "52.216.230.139"
:status        => 503
:status_line   => "HTTP/1.1 503 Slow Down\r\n"
):
uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/expects.rb:7:in `response_call'
uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/response_parser.rb:9:in `response_call'
uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:388:in `response'
uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:252:in `request'
uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/sax_parser_connection.rb:35:in `request'
uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/connection.rb:7:in `request'
uri:classloader:/gems/fog-1.25.0/lib/fog/aws/storage.rb:521:in `_request'
uri:classloader:/gems/fog-1.25.0/lib/fog/aws/storage.rb:516:in `request'
uri:classloader:/gems/fog-1.25.0/lib/fog/aws/requests/storage/copy_object.rb:32:in `copy_object'
uri:classloader:/gems/fog-1.25.0/lib/fog/aws/models/storage/file.rb:92:in `copy'
uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:622:in `block in retry_x'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:621:in `retry_x'
uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:548:in `block in process_files'
org/jruby/RubyKernel.java:1295:in `loop'
uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:412:in `block in process_files'

Problem destroying processing/2017-09-14-49573367394269636484214968352265746714503765354478567426-49573367394269636484214968375920798226903222662291521538.lzo.index. Retrying.
Problem destroying processing/2017-09-14-49573367394269636484214968518667132229539841820571205634-49573367394269636484214968543292951175089846286953742338.lzo. Retrying.
Error running EmrEtlRunner, exiting with return code 1. StorageLoader not run

So it does seem related to the aws setup in some way? Hope this helps understand the issue…

The EMRETLRunner version I am currently using is: r83 (bald eagle)
http://dl.bintray.com/snowplow/snowplow-generic/snowplow_emr_r83_bald_eagle.zip

I had tried to update the version earlier but that did result in breaking or conflicts hence I kept this version. The platform has been stable for the past 5-6 months until this happened.

UPDATE: I found that this error ha been logged in the the forums earlier as well. That error stemmed from the buckets and the EMR runner being in different regions; but I can confirm that all my buckets, rdb and runners are in the same region. Again, it seems to happen mostly when moving large amounts of data in step 10 of the pipeline (moving from processing to raw:archive). Assuming I can temporarily resume manually, what would be the command I would need to run? As per my above message, I tried it manually but no data got loaded into the rdb!

UPDATE 2: I found that many users of S3 have been seeing this message during large transfers, especially on that day (9/14). Here’s one forum describing it. Maybe it was a temporarily glitch? If so, how can I resume from where the runner left off?

Thanks again!

Topic		Replies	Views
Emretlrunner executionerror data files not archived AWS batch pipeline (Legacy)	3	1459	September 27, 2017
Error while Running EmrEtlRunner AWS batch pipeline (Legacy)	19	2389	September 22, 2017
EmrEtlRunner::EmrExecutionError in the 3rd stage of the process AWS batch pipeline (Legacy)	4	2298	October 23, 2017
Snowplow ETL: TERMINATED_WITH_ERRORS [STEP_FAILURE AWS batch pipeline (Legacy)	5	1467	October 4, 2017
EMRETLrunner error: Excon::Error::MovedPermanently (Expected(200) <=> Actual(301 Moved Permanently) For engineers	5	1337	May 26, 2017

Strange Random Error on EmtEtlRunner

Related topics