EMR cluster step fails


I’m trying to figure out why the EMR cluster won’t start in my VPC in eu-central-1. I’ve tried ami version 4.5.0, 4.8.2, 4.1.0, 5.0.3 and a few others. Tried hadoop_enrich: 1.8.0, 1.7.0, 1.9.0 to no avail. Every time it fails on the same step ‘Elasticity Setup Hadoop Debugging’ while taking 0 seconds. stdout,stderr and syslog are not logged to s3, controller log shows only this:

2016-10-28T11:16:26.161Z INFO Ensure step 1 jar file s3://elasticmapreduce/libs/script-runner/script-runner.jar

the node logs show nothing of interest it seems. Any ideas?

Here’s my config.yml:

  ami_version: 4.5.0
  region: eu-central-1
  jobflow_role: EMR_EC2_DefaultRole
  service_role: EMR_DefaultRole
  ec2_subnet_id: subnet-9h78lyf3
  ec2_key_name: name
  bootstrap: []
  master_instance_type: c4.large
  core_instance_count: 2
  core_instance_type: c4.xlarge
  task_instance_count: 0
  task_instance_type: c4.xlarge
bootstrap_failure_tries: 3
  format: clj-tomcat
  job_name: snowplow ETL
hadoop_enrich: 1.8.0
hadoop_shred: 0.9.0
hadoop_elasticsearch: 0.1.0
  continue_on_unexpected_error: true
  output_compression: GZIP

Okay, I found out this is an issue with eu-central-1. Documentation is incredibly vague on this…
I got past the first step when trying to run EMR in Ireland, but my s3 bucket region is eu-central-1, so I guess it can’t communicate with it since I got an exception on step2 (Elasticity Scalding Step: Enrich Raw Events):

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon    S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 77075F391C52BF29), S3 Extended Request ID:

Hi @kazgurs - can you share the command-line arguments you are supplying to EmrEtlRunner?

Hi @alex,

./snowplow-emr-etl-runner --debug --config config/config.yml --resolver config/resolver.json --enrichments config/enrichments --skip staging

Hi @kazgurs - ah, there is a known issue with --debug in eu-central-1.

Please try again with:

$ ./snowplow-emr-etl-runner --config config/config.yml --resolver config/resolver.json \
--enrichments config/enrichments --skip staging

Thanks, it worked fine without the debug option, although the output was still printing debug lines.

Yes - --debug refers to enabling in-EMR debugging behavior. Debug lines from EmrEtlRunner are controlled by the monitoring.logging.level in config.yml.