Didn’t want to hijack Phil’s thread, but I believe I’m having the same issue from ap-southeast-2. Would anyone be able to confirm?
Unexpected error: Expected(200) <=> Actual(403 Forbidden)
excon.error.response
:body => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>78D05CD27585AF4B</RequestId><HostId>xASPF1IGxUzvOTmNF/YNJlztuI57my2/WHnNuZM+jpWBQKrUsbmDeRl19ssp8rDL79nosEfyKrc=</HostId></Error>"
:cookies => [
]
:headers => {
"Content-Type" => "application/xml"
"Date" => "Wed, 13 Sep 2017 03:31:55 GMT"
"Server" => "AmazonS3"
"x-amz-bucket-region" => "ap-southeast-2"
"x-amz-id-2" => "xASPF1IGxUzvOTmNF/YNJlztuI57my2/WHnNuZM+jpWBQKrUsbmDeRl19ssp8rDL79nosEfyKrc="
"x-amz-request-id" => "78D05CD27585AF4B"
}
:host => "snowplow-hosted-assets-ap-southeast-2.s3-ap-southeast-2.amazonaws.com"
:local_address => "10.10.22.45"
:local_port => 55796
:path => "/"
:port => 443
:reason_phrase => "Forbidden"
:remote_ip => "52.95.131.42"
:status => 403
:status_line => "HTTP/1.1 403 Forbidden\r\n"
acgray
September 13, 2017, 10:34am
2
This can happen as a result of a weird quirk with IAM users inside a VPC. In that case you need to add a policy explicitly allowing the user to access snowplow-hosted-assets, like you would for your own (private) bucket.
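As a rough illustration of such a policy, here is a minimal Python sketch that builds a read-only policy document for the bucket named in the error above. The helper name `hosted_assets_read_policy` is hypothetical, and limiting the actions to `s3:ListBucket` and `s3:GetObject` is an assumption about what "access" needs to cover here; adjust the bucket name for your own region.

```python
import json


def hosted_assets_read_policy(bucket: str) -> dict:
    """Build an IAM policy document granting read-only access to one bucket.

    Hypothetical helper for illustration; not part of any Snowplow tooling.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # s3:GetObject applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Bucket name taken from the :host value in the error output above.
policy = hosted_assets_read_policy("snowplow-hosted-assets-ap-southeast-2")
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the affected IAM user or role as an inline policy through whichever tooling you already use to manage IAM.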
That totally fixed it! Thank you very much!
I am still experiencing this issue in us-east-2. I tried giving my role permissions to read/list/get from the public hosted assets bucket:
{
  "Action": [
    "s3:ListBucket"
  ],
  "Resource": [
    "arn:aws:s3:::snowplow-hosted-assets-us-east-2"
  ],
  "Effect": "Allow"
},
{
  "Action": [
    "s3:GetObject",
    "s3:PutObject"
  ],
  "Resource": [
    "arn:aws:s3:::snowplow-hosted-assets-us-east-2/*"
  ],
  "Effect": "Allow"
}
I should specify that this is the role for the EC2 instance that holds my EMR runner jar. Do I need to add any permissions to the default EMR EC2 roles (jobflow_role or service_role)?
josh
February 6, 2019, 2:52pm
5
Hey @Steve_Conrad, the role that needs explicit access to this bucket is generally the IAM Role attached to your Redshift Cluster - the roleArn in your Redshift Target configuration file.
Would you mind also sharing your full error message here to help in debugging?
@josh before I got your response, I figured out that the Redshift role was the one failing to read from the bucket, and I have not had a single job fail since I gave that role permissions to list/get from the hosted assets bucket. So if anyone has the same problem I (we) had, try adding bucket read permissions to your Redshift role.
The full error message I received was this:
Listing s3://<<MY SHREDDED BUCKET>>/good/
Sleeping 16000 milliseconds
Listing s3://<<MY SHREDDED BUCKET>>/good/
Consistency check passed after 1 attempt. Following run ids found:
+ run=2019-02-04-18-00-22 with 16 atomic files (3 Mb) and with following shredded types:
* iglu:com.snowplowanalytics.snowplow/duplicate/jsonschema/1-*-* (s3://snowplow-hosted-assets-us-east-2/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/duplicate_1.json)
Loading s3://<<MY SHREDDED BUCKET>>/good/run=2019-02-04-18-00-22/
RDB Loader [2019-02-04T18:09:27.826Z]: COPY atomic.events
RDB Loader [2019-02-04T18:09:28.558Z]: COPY atomic.com_snowplowanalytics_snowplow_duplicate_1
Data loading error [Amazon](500310) Invalid operation: Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 70D446EEC552B60A,ExtRid fYVhcZjoMVhzBsr7UKQSBGmHtwqxToGeaRbvjsORNqJsgSB1agjDmE8jobyJS0c1Ysm1+KoK05A=,CanRetry 1
Details:
-----------------------------------------------
error: Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 70D446EEC552B60A,ExtRid fYVhcZjoMVhzBsr7UKQSBGmHtwqxToGeaRbvjsORNqJsgSB1agjDmE8jobyJS0c1Ysm1+KoK05A=,CanRetry 1
code: 8001
context: s3://snowplow-hosted-assets-us-east-2/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/duplicate_1.json
query: 564467
location: s3_utility.cpp:292
process: padbmaster [pid=12685]
-----------------------------------------------;
Dumping s3://<<MY ETL LOG BUCKET>>/logs/rdb-loader/2019-02-04-18-00-22/14fa48b8-9775-436a-8f4f-4b08f88d6549
INFO: Logs successfully dumped to S3 [s3://<<MY ETL LOG BUCKET>>/logs/rdb-loader/2019-02-04-18-00-22/14fa48b8-9775-436a-8f4f-4b08f88d6549]
The permissions I added to my Redshift role are the following (Ansible/YAML format):
# redshift needs to be able to read hosted assets
- Action: ['s3:ListBucket']
  Effect: Allow
  Resource:
    - "arn:aws:s3:::snowplow-hosted-assets-us-east-2"
- Action: ['s3:GetObject', 's3:PutObject']
  Effect: Allow
  Resource:
    - "arn:aws:s3:::snowplow-hosted-assets-us-east-2/*"
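For anyone debugging the same failure, the `context:` line in the Redshift error details names the S3 object that could not be read, which points at the bucket the role needs access to. A small Python sketch of pulling the bucket name out of that text follows; the function name `denied_bucket` is hypothetical, and it assumes the error details keep the `context: s3://...` layout shown in the log above.

```python
import re
from typing import Optional


def denied_bucket(error_text: str) -> Optional[str]:
    """Return the bucket named on the 'context: s3://...' line, if any.

    Hypothetical helper; assumes the error layout shown in this thread.
    """
    match = re.search(r"^\s*context:\s*s3://([^/\s]+)/", error_text, re.MULTILINE)
    return match.group(1) if match else None


# Two lines excerpted from the error details quoted above.
sample = """code: 8001
context: s3://snowplow-hosted-assets-us-east-2/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/duplicate_1.json
"""

bucket = denied_bucket(sample)
print(bucket)
```

If the bucket that comes back is the hosted assets bucket rather than your own shredded bucket, that suggests the role used by the COPY statement is the one missing read access.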
josh
February 6, 2019, 3:05pm
7
Ahh excellent - glad you got it sorted!