In my workflow I use neither Postgres nor Redshift as a storage target. I just download the enriched events from S3 and use the Python SDK to work with them.
So it’s perfectly OK if the EmrEtlRunner ended right after enriching and skipped shredding and everything after it altogether. Is this possible? Running it with the `--skip shred` option results in an error:
`No run folders in [s3n://splw-company-out/shredded/good/] found`
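As a sketch of that workflow's last step: once the enriched run folders are downloaded from S3, each file is tab-separated enriched events, one per line. The real Snowplow Python Analytics SDK provides a transformer that maps all of the enriched-event fields to a dict; the field subset, positions, and helper below are hypothetical, just to illustrate the shape of the data:

```python
# Illustrative sketch only: the actual Snowplow Python Analytics SDK maps the
# full enriched-event TSV schema. Here we map a small, assumed subset of
# field positions to names.

ASSUMED_FIELDS = {0: "app_id", 1: "platform", 2: "etl_tstamp"}  # hypothetical subset


def transform_line(tsv_line):
    """Turn one tab-separated enriched event into a dict (illustrative only)."""
    values = tsv_line.rstrip("\n").split("\t")
    return {name: values[idx] for idx, name in ASSUMED_FIELDS.items()}


# Hypothetical enriched-event fragment:
sample = "my-app\tweb\t2017-01-01 00:00:00"
event = transform_line(sample)
print(event["app_id"])  # my-app
```

With something like this (or the SDK's own transformer), the enriched output is usable directly, without ever loading a database.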
Interesting… I’ve never tried skipping shredding since I started using it.
At which point are you receiving the error?
Perhaps the shred folders are just required as placeholders. Have you got placeholders set?
@pocin, skipping just `shred` is not sufficient. Rather, you need to skip
@robkingston the error occurred even before the EMR cluster started in AWS, so I guess it’s something like a pre-flight check.
@ihor Aha, that makes sense!
For future reference, this diagram would have helped past me: https://github.com/snowplow/snowplow/wiki/Batch-pipeline-steps
Thanks a lot, there is so much to wrap my head around!