Running emr-etl-runner throws the following error:
D, [2016-06-24T00:03:54.356000 #3250] DEBUG -- : Staging raw logs...
F, [2016-06-24T00:03:57.068000 #3250] FATAL -- :
Excon::Errors::Forbidden (Expected(200) <=> Actual(403 Forbidden)
excon.error.response
:body => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>XXXXXXX</AWSAccessKeyId><RequestId>4BA98CF755881B32</RequestId><HostId>hCyHyLOC1LpzLB7bDsk/X34ZDkI3lsGcuSTV5r/ZvrhWSuxqxkF5W6Kt0R3RTNJaXpaUD462+Wc=</HostId></Error>"
/home/ubuntu/downloads/snowplow-emr-etl-runner!/gems/excon-0.45.3/lib/excon/middlewares/expects.rb:6:in `response_call'
/home/ubuntu/downloads/snowplow-emr-etl-runner!/gems/excon-0.45.3/lib/excon/middlewares/response_parser.rb:8:in `response_call'
So this is clearly a permissions issue. I created all the S3 buckets with the same IAM role that I used to launch the EC2 instance on which I'm testing emr-etl-runner. What am I missing?
Here is my config file:
aws:
  access_key_id:xxxxxx # Credentials can be hardcoded or set in environment variables
  secret_access_key: xxxxxx
  s3:
    region: us-east-1
    buckets:
      assets: s3://snowplow-hosted-assets
      jsonpath_assets:
      log: s3://a1-snowplow-jars/emretl-runner/logs/
      raw:
        in:
          - s3://a1-snowplow-jars/emretl-runner/in/
        processing: s3://a1-snowplow-jars/emretl-runner/processing/
        archive: s3://a1-snowplow-jars/emretl-runner/raw
      enriched:
        good: s3://a1-snowplow-jars/emretl-runner/enriched/good/
        bad: s3://a1-snowplow-jars/emretl-runner/enriched/bad/
        errors: s3://a1-snowplow-jars/emretl-runner/enriched/errors/
        archive: s3://a1-snowplow-jars/emretl-runner/enriched/archive/
      shredded:
        good: s3://a1-snowplow-jars/emretl-runner/shredded/good/
        bad: s3://a1-snowplow-jars/emretl-runner/shredded/bad/
        errors: s3://a1-snowplow-jars/emretl-runner/shredded/errors/
        archive: s3://a1-snowplow-jars/emretl-runner/shredded/archive/
  emr:
    ami_version: 4.5.0
    region: us-east-1
    jobflow_role: ecsInstanceRoleSnowPlow
    service_role: ecsInstanceRoleSnowPlow
    placement:
    ec2_subnet_id: subnet-90a1c2bb
    ec2_key_name: sai-kats-box
    bootstrap: []
    software:
      hbase:
      lingual:
    jobflow:
      master_instance_type: m1.medium
      core_instance_count: 2
      core_instance_type: m1.medium
      task_instance_count: 0
      task_instance_type: m1.medium
      task_instance_bid: 0.015
    additional_info:
    bootstrap_failure_tries: 3
format: clj-tomcat
enrich:
  job_name: potomac_snowplow_etl
  hadoop_enrich: 1.7.0
  hadoop_shred: 0.9.0
  #hadoop_elasticsearch: 0.1.0
  continue_on_unexpected_error: true
  output_compression: NONE
storage:
  download:
    folder:
  targets: []
monitoring:
  tags: {}
  logging:
    level: DEBUG
  snowplow:
    method: get
    app_id: snowplow
    collector: d2bpvzh93js6np.cloudfront.net
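
The 403 body above carries the code InvalidAccessKeyId, which means S3 did not recognize the access key ID it was sent. One thing that can be ruled out locally is that the `aws` section parses with both credential fields present and non-empty. A minimal sketch in Ruby, assuming the config is plain YAML: the helper name `credential_problems` and the sample values are hypothetical and not part of emr-etl-runner, and this check cannot tell whether AWS actually knows the key.

```ruby
require 'yaml'

# Hypothetical helper (not part of emr-etl-runner): lists problems found
# in the `aws` credential fields of an already-parsed config hash.
def credential_problems(config)
  aws = config.is_a?(Hash) ? config['aws'] : nil
  return ['missing top-level aws section'] unless aws.is_a?(Hash)

  problems = []
  %w[access_key_id secret_access_key].each do |key|
    value = aws[key]
    # An absent key and a key with no value both parse to nil.
    if value.nil? || value.to_s.strip.empty?
      problems << "#{key} is missing or empty"
    end
  end
  problems
end

# Placeholder values for illustration only; use your own config text here.
sample = YAML.safe_load(<<~CONFIG)
  aws:
    access_key_id: EXAMPLEKEYID
    secret_access_key: examplesecret
CONFIG

p credential_problems(sample)
```

An empty list only shows that both fields survive YAML parsing. Note that YAML needs a space after the colon for `key: value` to be read as a mapping entry, so a line written as `access_key_id:xxxxxx` is worth checking first.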