403 Forbidden on s3://snowplow-hosted-assets

Didn’t want to hijack Phil’s thread, but I believe I’m having the same issue from ap-southeast-2. Would anyone be able to confirm?

Unexpected error: Expected(200) <=> Actual(403 Forbidden)
excon.error.response
  :body          => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>78D05CD27585AF4B</RequestId><HostId>xASPF1IGxUzvOTmNF/YNJlztuI57my2/WHnNuZM+jpWBQKrUsbmDeRl19ssp8rDL79nosEfyKrc=</HostId></Error>"
  :cookies       => [
  ]
  :headers       => {
    "Content-Type"        => "application/xml"
    "Date"                => "Wed, 13 Sep 2017 03:31:55 GMT"
    "Server"              => "AmazonS3"
    "x-amz-bucket-region" => "ap-southeast-2"
    "x-amz-id-2"          => "xASPF1IGxUzvOTmNF/YNJlztuI57my2/WHnNuZM+jpWBQKrUsbmDeRl19ssp8rDL79nosEfyKrc="
    "x-amz-request-id"    => "78D05CD27585AF4B"
  }
  :host          => "snowplow-hosted-assets-ap-southeast-2.s3-ap-southeast-2.amazonaws.com"
  :local_address => "10.10.22.45"
  :local_port    => 55796
  :path          => "/"
  :port          => 443
  :reason_phrase => "Forbidden"
  :remote_ip     => "52.95.131.42"
  :status        => 403
  :status_line   => "HTTP/1.1 403 Forbidden\r\n"

This can happen as a result of a quirk with IAM users inside a VPC. In that case you need to add a policy explicitly allowing the user to access snowplow-hosted-assets, just as you would for one of your own (private) buckets.
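If it helps, a minimal policy sketch for that case looks like the following - the bucket name comes from the error above, so adjust the region suffix to match yours (s3:ListBucket applies to the bucket itself, s3:GetObject to the objects in it):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::snowplow-hosted-assets-ap-southeast-2"]
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::snowplow-hosted-assets-ap-southeast-2/*"]
        }
    ]
}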


That totally fixed it! Thank you very much!

I am still experiencing this issue in us-east-2. I tried giving my role permissions to read/list/get from the public hosted assets bucket:

{
    "Action": [
        "s3:ListBucket"
    ],
    "Resource": [
        "arn:aws:s3:::snowplow-hosted-assets-us-east-2"
    ],
    "Effect": "Allow"
},
{
    "Action": [
        "s3:GetObject",
        "s3:PutObject"
    ],
    "Resource": [
        "arn:aws:s3:::snowplow-hosted-assets-us-east-2/*"
    ],
    "Effect": "Allow"
}

I should specify that this is the role for the EC2 instance that runs my EMR runner JAR. Do I need to add any permissions to the default EMR EC2 roles (jobflow_role or service_role)?

Hey @Steve_Conrad, the role that needs explicit access to this bucket is generally the IAM role attached to your Redshift cluster - the roleArn in your Redshift target configuration file.
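For illustration, that's the roleArn field inside the data section of your storage target JSON - the values below are placeholders and the other connection fields are omitted:

{
    "name": "AWS Redshift enriched events storage",
    "host": "<<REDSHIFT ENDPOINT>>",
    "roleArn": "arn:aws:iam::<<ACCOUNT ID>>:role/<<REDSHIFT LOAD ROLE>>"
}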

Would you mind also sharing your full error message here to help in debugging?

@josh before I got your response, I figured out that the Redshift role was the one failing to read from the bucket, and I have not had a single job fail since I gave that role permission to list/get from the hosted assets bucket. So if anyone has the same problem I (we) had, try adding bucket read permissions to your Redshift role.

The full error message I received was this:

Listing s3://<<MY SHREDDED BUCKET>>/good/
Sleeping 16000 milliseconds
Listing s3://<<MY SHREDDED BUCKET>>/good/
Consistency check passed after 1 attempt. Following run ids found:
+ run=2019-02-04-18-00-22 with 16 atomic files (3 Mb) and with following shredded types:
  * iglu:com.snowplowanalytics.snowplow/duplicate/jsonschema/1-*-* (s3://snowplow-hosted-assets-us-east-2/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/duplicate_1.json)
Loading s3://<<MY SHREDDED BUCKET>>/good/run=2019-02-04-18-00-22/
RDB Loader [2019-02-04T18:09:27.826Z]: COPY atomic.events
RDB Loader [2019-02-04T18:09:28.558Z]: COPY atomic.com_snowplowanalytics_snowplow_duplicate_1
Data loading error [Amazon](500310) Invalid operation: Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 70D446EEC552B60A,ExtRid fYVhcZjoMVhzBsr7UKQSBGmHtwqxToGeaRbvjsORNqJsgSB1agjDmE8jobyJS0c1Ysm1+KoK05A=,CanRetry 1
Details: 
 -----------------------------------------------
  error:  Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 70D446EEC552B60A,ExtRid fYVhcZjoMVhzBsr7UKQSBGmHtwqxToGeaRbvjsORNqJsgSB1agjDmE8jobyJS0c1Ysm1+KoK05A=,CanRetry 1
  code:      8001
  context:   s3://snowplow-hosted-assets-us-east-2/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/duplicate_1.json
  query:     564467
  location:  s3_utility.cpp:292
  process:   padbmaster [pid=12685]
  -----------------------------------------------;
Dumping s3://<<MY ETL LOG BUCKET>>/logs/rdb-loader/2019-02-04-18-00-22/14fa48b8-9775-436a-8f4f-4b08f88d6549
INFO: Logs successfully dumped to S3 [s3://<<MY ETL LOG BUCKET>>/logs/rdb-loader/2019-02-04-18-00-22/14fa48b8-9775-436a-8f4f-4b08f88d6549]

The permissions I added to my Redshift user are the following (Ansible/YAML format):

# redshift needs to be able to read hosted assets
- Action: ['s3:ListBucket']
  Effect: Allow
  Resource:
    - "arn:aws:s3:::snowplow-hosted-assets-us-east-2"
- Action: ['s3:GetObject', 's3:PutObject']
  Effect: Allow
  Resource:
    - "arn:aws:s3:::snowplow-hosted-assets-us-east-2/*"

Ahh excellent - glad you got it sorted!