RDB transformer

Hey!
I have a problem with the RDB transformer.
I got: WARN org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport - AccessDenied: error for path
Is it OK that the transformer tries to delete something?
My settings are:
"formats": {
  "transformationType": "widerow",
  "fileFormat": "parquet"
}

Permissions are:
statement {
  effect = "Allow"
  actions = [
    "s3:PutObject",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucket",
    "s3:GetObject",
    "s3:GetBucketLocation",
    "s3:AbortMultipartUpload",
    "s3:GetObjectVersion"
  ]
  resources = [
    "arn:aws:s3:::transformed",
    "arn:aws:s3:::transformed/snowplow/widerow_events/*"
  ]
}

Before, I used shredding, and everything worked without any delete attempts showing up in the logs.

Are you using batch or stream version of the transformer?

I’m using stream transformer snowplow/transformer-kinesis:5.2.0

We noticed this behaviour too. The transformer itself only ever puts objects; it does not do any deletions.

The current suspect is hadoop-aws, which is used by the underlying library (parquet4s). Here is a blog with permissions experiments; it talks about Spark, but under the hood it is the same library.

So the solution is just to let it do its thing and loosen the permissions.
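
For illustration, here is a minimal sketch of what "loosening" could look like. It is the statement posted above with one extra action. That s3:DeleteObject is the missing permission is an assumption, based on the MultiObjectDeleteSupport class in the warning (S3 authorizes bulk deletes through that action). Nothing in this thread confirms it, so check it against your own logs first.

```hcl
statement {
  effect = "Allow"
  actions = [
    "s3:PutObject",
    # Assumption: lets hadoop-aws (S3A) complete the delete calls
    # that currently fail with AccessDenied.
    "s3:DeleteObject",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucket",
    "s3:GetObject",
    "s3:GetBucketLocation",
    "s3:AbortMultipartUpload",
    "s3:GetObjectVersion"
  ]
  resources = [
    "arn:aws:s3:::transformed",
    "arn:aws:s3:::transformed/snowplow/widerow_events/*"
  ]
}
```

Because the object-level resource stays limited to the snowplow/widerow_events/ prefix, the delete permission would only apply to objects under that prefix.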

Thanks for your response!