Skip loading certain contexts into Redshift

Hi All,

I am currently looking for a way to prevent certain contexts from being loaded into Redshift. We process these events in our realtime pipeline, but they do not need to be loaded into permanent storage.

The least desirable solution is to load them anyway and delete them after loading.

What I am thinking of now is to skip the RDB loader and archive steps in the initial EMR run, then issue an S3 command to remove the related folders/data from S3, and finally start a new EMR cluster and continue from the RDB load step.

Is there a better solution? Is it possible to add a step to the EMR flow? If yes, which steps/config do I need to alter?

Thanks in advance!

Hi @marien,

You can use Dataflow Runner to create more flexible workflows. Just copy the steps from EmrEtlRunner into a Dataflow Runner playbook and add your own step that removes those contexts.
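
As a rough illustration only, the removal step could be a small script that runs between the shred step and the RDB load. The sketch below is in Python and rests on assumptions not confirmed in this thread: that shredded output keys contain `vendor=<vendor>/name=<name>/` path segments, and that `boto3` is available wherever the script runs. The function names, bucket, and prefix are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple

# ASSUMPTION: shredded keys look roughly like
#   <prefix>/run=<ts>/shredded-types/vendor=<vendor>/name=<name>/format=.../version=.../part-...
# Check this against the actual layout in your bucket before using it.
_TYPE_RE = re.compile(r"/vendor=(?P<vendor>[^/]+)/name=(?P<name>[^/]+)/")


def keys_to_delete(keys: Iterable[str], blocked: Set[Tuple[str, str]]) -> List[str]:
    """Return the keys whose (vendor, name) pair is in the blocked set."""
    selected = []
    for key in keys:
        match = _TYPE_RE.search(key)
        if match and (match.group("vendor"), match.group("name")) in blocked:
            selected.append(key)
    return selected


def delete_blocked_contexts(bucket: str, prefix: str, blocked: Set[Tuple[str, str]]) -> int:
    """Delete blocked shredded types under s3://bucket/prefix; return the count requested."""
    import boto3  # third-party; imported here so keys_to_delete works without it

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    requested = 0
    # Each page holds at most 1000 keys, which is also the delete_objects limit.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [obj["Key"] for obj in page.get("Contents", [])]
        doomed = keys_to_delete(keys, blocked)
        if doomed:
            s3.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": k} for k in doomed]},
            )
            requested += len(doomed)
    return requested
```

Calling something like `delete_blocked_contexts("my-bucket", "shredded/good/", {("com.acme", "debug_context")})` before the load step would then leave only the types you want in Redshift. It is worth dry-running `keys_to_delete` on a listing first to confirm that it selects only what you expect.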

We do have plans to implement a custom blocklist for shredded types, but I'm afraid that is still quite far away.
