We are currently running our Snowplow ETL runner at version 106.
In the first EMR step, it runs S3DistCp to copy the source files in S3 to the etl-processing S3 folder across different accounts. The command looks like this:
/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar --src [source_bucket] --dest [dest_bucket] --s3Endpoint s3-eu-west-1.amazonaws.com --srcPattern .*localhost\_access\_log.*\.txt.* --deleteOnSuccess --groupBy .*/_*(.+)
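To rule out the patterns themselves as the cause, here is a minimal Python sketch that checks which keys the --srcPattern selects and what the --groupBy captures. It assumes S3DistCp applies both as regular expressions that must match the whole path, and that Python's `re` behaves closely enough to the regex engine S3DistCp uses for these simple patterns. The sample key is hypothetical.

```python
import re

# Patterns copied verbatim from the s3-dist-cp command above.
SRC_PATTERN = r".*localhost\_access\_log.*\.txt.*"
GROUP_BY = r".*/_*(.+)"


def is_selected(key: str) -> bool:
    """Return True if the whole key matches --srcPattern."""
    return re.fullmatch(SRC_PATTERN, key) is not None


def group_name(key: str):
    """Return the text captured by --groupBy, or None if the key does not match."""
    match = re.fullmatch(GROUP_BY, key)
    return match.group(1) if match else None


# Hypothetical object key, for illustration only.
sample = "logs/localhost_access_log.2017-01-01.txt.gz"
print(is_selected(sample), group_name(sample))
```

If the sample keys are selected here, the patterns are probably not what prevents the delete, since the same files are being copied successfully.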
The command includes the --deleteOnSuccess parameter, but when I copy files from an S3 source bucket that is in a different AWS account, the files are copied but not deleted.
When I test the same process with an S3 source bucket in the same account as the EMR job, it works fine and the files are deleted after the copy.
The EC2 role has permission to read and delete files in the source bucket, and the bucket policy also grants the EC2 role permission to read and delete files.
I reviewed the bucket permissions, and I can manually delete the files using the AWS CLI.
The permissions on the bucket are:
{
  "Sid": "",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      [AWS EC2 role]
    ]
  },
  "Action": [
    "s3:ListBucketVersions",
    "s3:ListBucket",
    "s3:GetObjectVersion",
    "s3:GetObject",
    "s3:DeleteObject"
  ],
  "Resource": [
    [AWS Bucket],
    [AWS Bucket folders]
  ]
},
The role permissions are:
{
  "Sid": "",
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:ListBucket",
    "s3:DeleteObject"
  ],
  "Resource": [
    [AWS bucket],
    [AWS bucket folder]
  ]
}
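For cross-account access, an action generally has to be allowed by both the bucket policy and the role policy. The minimal sketch below compares only the action lists from the two statements above. It is a simplification, not an IAM evaluator: it ignores resources, conditions, explicit denies, and object ownership, and the function name is hypothetical.

```python
# Actions copied from the bucket policy statement above.
BUCKET_POLICY_ACTIONS = {
    "s3:ListBucketVersions",
    "s3:ListBucket",
    "s3:GetObjectVersion",
    "s3:GetObject",
    "s3:DeleteObject",
}

# Actions copied from the role policy statement above.
ROLE_POLICY_ACTIONS = {
    "s3:GetObject",
    "s3:ListBucket",
    "s3:DeleteObject",
}


def cross_account_allowed(action: str) -> bool:
    """Simplified check: the action must appear in both policies."""
    return action in BUCKET_POLICY_ACTIONS and action in ROLE_POLICY_ACTIONS


# Actions that both sides allow, ignoring resources and conditions.
effective = BUCKET_POLICY_ACTIONS & ROLE_POLICY_ACTIONS
print(sorted(effective))
print("delete allowed on both sides:", cross_account_allowed("s3:DeleteObject"))
```

By this simplified check, s3:DeleteObject is allowed on both sides, which is consistent with the manual delete through the AWS CLI working.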
Does anyone know why this is happening?
Do I need to add different permissions, or is S3DistCp unable to delete files stored in a different AWS account?
Thank you in advance
Rafael Bottega