EmrEtlRunner ArgumentError (AWS EMR API Error (ValidationException))

I am receiving the following error while trying to run in AWS us-west-2:

```
./snowplow-emr-etl-runner --config snowplow-config.yml --resolver iglu_resolver.json
D, [2016-12-08T22:49:59.361000 #3628] DEBUG -- : Staging raw logs...
  moving files from s3://elasticbeanstalk-us-west-2-513226749779/resources/environments/logs/publish/e-ikjchvqq49/ to s3://production-snowplow/processing/
(t0)    MOVE elasticbeanstalk-us-west-2-513226749779/resources/environments/logs/publish/e-ikjchvqq49/i-5a79a3cf/var_log_tomcat8_rotated_localhost_access_log.2016-08-17-23.us-west-2.i-5a79a3cf.txt.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.gz -> production-snowplow/processing/var_log_tomcat8_rotated_localhost_access_log.2016-08-17-23.us-west-2.i-5a79a3cf.txt.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.gz
      +-> production-snowplow/processing/var_log_tomcat8_rotated_localhost_access_log.2016-08-17-23.us-west-2.i-5a79a3cf.txt.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.gz
      x elasticbeanstalk-us-west-2-513226749779/resources/environments/logs/publish/e-ikjchvqq49/i-5a79a3cf/var_log_tomcat8_rotated_localhost_access_log.2016-08-17-23.us-west-2.i-5a79a3cf.txt.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.gz
D, [2016-12-08T22:50:49.408000 #3628] DEBUG -- : Waiting a minute to allow S3 to settle (eventual consistency)
D, [2016-12-08T22:51:49.415000 #3628] DEBUG -- : Initializing EMR jobflow
F, [2016-12-08T22:51:51.192000 #3628] FATAL -- :

ArgumentError (AWS EMR API Error (ValidationException): The supplied bootstrap action(s): 'Elasticity Bootstrap Action' are not supported by release 'emr-4.5.0'.):
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/aws_session.rb:33:in `submit'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/emr.rb:302:in `run_job_flow'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/job_flow.rb:151:in `run'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:445:in `run'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
    file:/var/lib/jenkins/ops/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `(root)'
    org/jruby/RubyKernel.java:1091:in `load'
    file:/var/lib/jenkins/ops/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    org/jruby/RubyKernel.java:1072:in `require'
    file:/var/lib/jenkins/ops/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    /tmp/jruby6178060773063677939extract/jruby-stdlib-!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'
```
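
The release named in the error, 'emr-4.5.0', is the `ami_version: 4.5.0` from the config with an `emr-` prefix. A minimal sketch of that mapping, assuming (as the error message suggests) that versions 4.0.0 and later are submitted to EMR as a release label while older ones are sent as a legacy AMI version; the helper names are hypothetical and not part of EmrEtlRunner or Elasticity:

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "4.5.0" into (4, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def release_label(ami_version: str) -> Optional[str]:
    """Assumed mapping: 4.x and later -> "emr-<version>", older -> None."""
    if parse_version(ami_version) >= (4,):
        return "emr-" + ami_version.strip()
    return None  # pre-4.x: treated as a legacy AMI version, no release label


print(release_label("4.5.0"))  # emr-4.5.0
print(release_label("3.7.0"))  # None
```

Once a release label is in play, EMR validates the supplied bootstrap actions against that release, which is where this ValidationException is raised.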

I am using the following config:

      # Credentials can be hardcoded or set in environment variables
      access_key_id: <%= ENV['AWS_SNOWPLOW_ACCESS_KEY'] %>
      secret_access_key: <%= ENV['AWS_SNOWPLOW_SECRET_KEY'] %>
        region: "us-west-2"
          assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
          jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
          log: s3://production-snowplow/logs
            in:                  # Multiple in buckets are permitted
              - s3://elasticbeanstalk-us-west-2-513226749779/resources/environments/logs/publish/e-ikjchvqq49         # e.g. s3://my-in-bucket
            processing: s3://production-snowplow/processing
            archive: s3://production-snowplow/archive    # e.g. s3://my-archive-bucket/raw
            good: s3://production-snowplow/enriched/good       # e.g. s3://my-out-bucket/enriched/good
            bad: s3://production-snowplow/enriched/bad       # e.g. s3://my-out-bucket/enriched/bad
            errors:      # Leave blank unless :continue_on_unexpected_error: set to true below
            archive: s3://production-snowplow/enriched   # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
            good: s3://production-snowplow/shredded/good        # e.g. s3://my-out-bucket/shredded/good
            bad: s3://production-snowplow/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
            errors: ADD HERE     # Leave blank unless :continue_on_unexpected_error: set to true below
            archive: s3://production-snowplow/shredded   # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
        ami_version: 4.5.0
        region: "us-west-2"       # Always set this
        jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
        service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
        placement: "us-west-2a"     # Set this if not running in VPC. Leave blank otherwise
        ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
        ec2_key_name: opskey.pem
        bootstrap:         # Set this to specify custom bootstrap actions. Leave empty otherwise
          hbase: "0.92.0"            # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
          lingual: "1.1"             # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
        # Adjust your Hadoop cluster below
          master_instance_type: m1.medium
          core_instance_count: 2
          core_instance_type: m1.medium
          task_instance_count: 0 # Increase to use spot instances
          task_instance_type: m1.medium
          task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
        bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
        additional_info:        # Optional JSON string for selecting additional features
      format: clj-tomcat # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
      job_name: Snowplow ETL # Give your job a name
        hadoop_enrich: 1.8.0 # Version of the Hadoop Enrichment process
        hadoop_shred: 0.9.0 # Version of the Hadoop Shredding process
        hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
      continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
      output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
        folder: # Postgres-only config option. Where to store the downloaded files. Leave blank for Redshift
        - name: "Elasticsearch"
          type: elasticsearch
          host: ec2-54-189-166-116.us-west-2.compute.amazonaws.com # The Elasticsearch endpoint
          database: snowplow # Name of index
          port: 9200 # Default Elasticsearch port - change to 80 if using Amazon Elasticsearch Service
          sources: # Leave blank to write the bad rows created in this run to Elasticsearch, or explicitly provide an array of bad row buckets like ["s3://my-enriched-bucket/bad/run=2015-10-06-15-25-53"]
          ssl_mode: # Not required for Elasticsearch
          table: ADD HERE # Name of type
          username: # Not required for Elasticsearch
          password: # Not required for Elasticsearch
          es_nodes_wan_only: false # Set to true if using Amazon Elasticsearch Service
          maxerror: # Not required for Elasticsearch
          comprows: # Not required for Elasticsearch
      tags: {} # Name-value pairs describing this job
        level: DEBUG # You can optionally switch to INFO for production


Hi @jlmoody - what version of EmrEtlRunner are you using?

I am using R77, which is the version linked in the documentation. I have since found the link to other versions in the documentation, so I am going to try the latest release now.

Hi @jlmoody - could you share the documentation link you are referencing, so we can fix it? R77 is slightly old now.

3. Installation

We host EmrEtlRunner on the distribution platform JFrog Bintray. You can get a copy of it as shown below.

Note: please follow this link if you wish to get a different version of the runner. The distribution name follows the pattern snowplow_emr_{{RELEASE_VERSION}}.zip.

```
$ wget http://dl.bintray.com/snowplow/snowplow-generic/snowplow_emr_r77_great_auk.zip
```

The archive contains both EmrEtlRunner and StorageLoader. Unzip the archive:

```
$ unzip snowplow_emr_r77_great_auk.zip
```

You will see two files, snowplow-emr-etl-runner and snowplow-storage-loader; the first one is EmrEtlRunner itself.
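
To fetch a release other than R77, the download URL can be built from the documented pattern snowplow_emr_{{RELEASE_VERSION}}.zip. A small sketch, assuming the other releases live under the same Bintray path quoted above; the helper names are hypothetical, and the release identifier must be taken from the release list rather than guessed:

```python
BINTRAY_BASE = "http://dl.bintray.com/snowplow/snowplow-generic"


def bundle_name(release_version: str) -> str:
    """Apply the pattern snowplow_emr_{{RELEASE_VERSION}}.zip."""
    return f"snowplow_emr_{release_version}.zip"


def bundle_url(release_version: str) -> str:
    """Full download URL, assuming every release shares the same base path."""
    return f"{BINTRAY_BASE}/{bundle_name(release_version)}"


print(bundle_url("r77_great_auk"))
# http://dl.bintray.com/snowplow/snowplow-generic/snowplow_emr_r77_great_auk.zip
```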

Thanks, ticket added:


Hello, I’m trying to use EmrEtlRunner r87 and am experiencing this same error. Any idea why that would be?

Hi Bryce,
My issue turned out to be a config problem. My storage target is Elasticsearch, so I needed to remove the other options under storage targets. Hope that helps. If not, I would be happy to take a look at your config file.
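
Jason's fix amounts to keeping only the Elasticsearch entry in the storage targets list. A sketch, assuming the targets are loaded as a list of dicts with a "type" key, like the Elasticsearch entry in the config above; the Redshift entry and the helper name are hypothetical:

```python
from typing import Any, Dict, List


def keep_targets_of_type(
    targets: List[Dict[str, Any]], wanted: str = "elasticsearch"
) -> List[Dict[str, Any]]:
    """Return only the storage targets whose "type" equals `wanted`."""
    return [target for target in targets if target.get("type") == wanted]


targets = [
    {"name": "Elasticsearch", "type": "elasticsearch", "port": 9200},
    {"name": "My Redshift database", "type": "redshift"},  # hypothetical extra entry
]
print(keep_targets_of_type(targets))
# [{'name': 'Elasticsearch', 'type': 'elasticsearch', 'port': 9200}]
```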

Thanks, Jason! I made this error go away by removing the HBase and Lingual version numbers from the config; I probably don’t need them for now. Thank you for your reply and your offer to help.
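
Bryce's fix can be expressed as a check: when ami_version is 4.x or later (so the job runs under an emr-4.x release label), blank out the hbase and lingual versions. A sketch, assuming those two settings are what produced the rejected 'Elasticity Bootstrap Action' (the outcome of this thread suggests so) and that an empty YAML value loads as None; the helper names are hypothetical:

```python
from typing import Dict, List, Optional, Tuple

LEGACY_ONLY = ("hbase", "lingual")  # the options removed in this thread's fix


def major_version(ami_version: str) -> int:
    """Major component of a version string such as "4.5.0"."""
    return int(ami_version.strip().split(".")[0])


def clear_legacy_options(
    ami_version: str, options: Dict[str, Optional[str]]
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Blank out hbase/lingual on 4.x+; return the cleaned copy and cleared names."""
    cleaned = dict(options)  # never mutate the caller's config
    cleared: List[str] = []
    if major_version(ami_version) >= 4:
        for name in LEGACY_ONLY:
            if cleaned.get(name):  # a non-empty version string is set
                cleaned[name] = None  # equivalent to leaving the YAML value empty
                cleared.append(name)
    return cleaned, cleared


cleaned, cleared = clear_legacy_options("4.5.0", {"hbase": "0.92.0", "lingual": "1.1"})
print(cleaned)  # {'hbase': None, 'lingual': None}
print(cleared)  # ['hbase', 'lingual']
```

Returning the cleared names as well makes it easy to log a warning instead of silently dropping settings.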