I’m trying to figure out why the EMR cluster won’t start in my VPC in eu-central-1. I’ve tried AMI versions 4.5.0, 4.8.2, 4.1.0, 5.0.3 and a few others, and hadoop_enrich versions 1.8.0, 1.7.0 and 1.9.0, to no avail. Every time it fails on the same step, ‘Elasticity Setup Hadoop Debugging’, after 0 seconds. stdout, stderr and syslog are not logged to S3, and the controller log shows only this:
2016-10-28T11:16:26.161Z INFO Ensure step 1 jar file s3://elasticmapreduce/libs/script-runner/script-runner.jar
The node logs don’t seem to show anything of interest. Any ideas?
Here’s my config.yml:
job_name: snowplow ETL
Okay, I found out this is an issue with eu-central-1. Documentation is incredibly vague on this…
I got past the first step when running EMR in Ireland (eu-west-1), but my S3 bucket is in eu-central-1, so I guess the cluster can’t communicate with it: I got an exception on step 2 (Elasticity Scalding Step: Enrich Raw Events):
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 77075F391C52BF29), S3 Extended Request ID:
Hi @kazgurs - can you share the command-line arguments you are supplying to EmrEtlRunner?
./snowplow-emr-etl-runner --debug --config config/config.yml --resolver config/resolver.json --enrichments config/enrichments --skip staging
Hi @kazgurs - ah, there is a known issue with --debug in eu-central-1.
Please try again with:
$ ./snowplow-emr-etl-runner --config config/config.yml --resolver config/resolver.json \
--enrichments config/enrichments --skip staging
Thanks, it worked fine without the --debug option, although the output still contained debug lines.
--debug refers to enabling in-EMR debugging behavior. The debug lines printed by EmrEtlRunner itself are controlled by the monitoring.logging.level setting in config.yml.
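For reference, here is a minimal sketch of the relevant part of config.yml. It shows only the logging key mentioned above; the rest of the monitoring section is omitted, and INFO is an assumed example value for quieter output, so check the sample config shipped with your EmrEtlRunner release for the exact layout.

```yaml
monitoring:
  logging:
    # Controls how verbose EmrEtlRunner's own output is.
    # DEBUG prints the debug lines; INFO (assumed here) should suppress them.
    level: INFO
```

This setting is independent of the --debug command-line flag, which only affects debugging inside EMR.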