Snowflake Transformer Not a file error

Hi all,
We have been implementing the following Snowplow pipeline to load some data into Snowflake.
Collector → Enricher → S3 Loader → (EMR FROM HERE) s3DistCP → Snowflake Transformer → Snowflake Loader → s3DistCP for archive

Up until the first s3DistCP, everything works fine, but when running the jobs on EMR, the transformer outputs the following error:

Caused by: java.io.IOException: Not a file: s3a://snowplow-events/enriched/archive/run=2021-05-26-13-21-12/2021/05

I'm guessing the error appears because that path is in fact not a file, it's a folder. After s3DistCP, the folder structure is as follows:
snowplow-events/enriched/archive/run=2021-05-26-13-21-12/YEAR/MONTH/DAY/HOUR

Is there some configuration I need to change to make it run correctly? This is the configuration for the s3DistCP step:

"arguments": [
  "s3-dist-cp",
  "--src",
  "s3://snowplow-events/enriched/good/",
  "--dest",
  "s3://snowplow-events/enriched/archive/run={{nowWithFormat "2006-01-02-15-04-05"}}/",
  "--srcPattern",
  ".*\.gz",
  "--s3Endpoint",
  "s3.eu-west-1.amazonaws.com",
  "--s3ServerSideEncryption"
]

Thank you very much for all the support! Let me know if I need to provide more information.

Best regards,
Martin

Hi @Martin_Cristobal ,

Indeed the reason is that Snowflake loader expects to find only files in the run=.../ folder, whereas your data is partitioned by date.

To solve this, you need to comment out this line in the S3 loader's config.
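If you want to confirm that a run folder is flat before the transformer picks it up, a quick check along these lines might help. This is only a rough sketch: the `nested_keys` helper and the `part-0001.gz` / `part-0002.gz` file names are made up for illustration, and it works on a plain list of object keys (however you choose to list them) rather than calling S3 itself.

```python
from pathlib import PurePosixPath


def nested_keys(keys, run_prefix):
    """Return the keys that sit in sub-folders below run_prefix
    instead of directly inside it."""
    base = PurePosixPath(run_prefix)
    nested = []
    for key in keys:
        try:
            relative = PurePosixPath(key).relative_to(base)
        except ValueError:
            continue  # key is outside the run folder, ignore it
        # More than one path component means at least one sub-folder.
        if len(relative.parts) > 1:
            nested.append(key)
    return nested


# Hypothetical keys: one date-partitioned, one directly in the run folder.
run_prefix = "enriched/archive/run=2021-05-26-13-21-12/"
keys = [
    run_prefix + "2021/05/26/13/part-0001.gz",
    run_prefix + "part-0002.gz",
]
print(nested_keys(keys, run_prefix))
```

Any key this returns sits under a date partition like the one in the error message, so an empty result suggests the run folder contains only files.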