Skip contexts from loading in Redshift

marien · December 11, 2020, 12:33pm

Hi All,

I am currently looking for a way to prevent certain contexts from being loaded into Redshift. We process these events in our real-time pipeline, but they do not need to be loaded into permanent storage.

The least preferred solution is to load them anyway and delete them after loading.

My current idea is to initially skip the RDB load and archive steps in the EMR run, then issue an S3 command to remove the related folders/data from S3, and finally start a new EMR cluster and continue from the RDB load step.

Is there a better solution? Is it possible to add a step to the EMR flow? If so, which steps or config do I need to alter?

Thanks in advance!

anton · December 11, 2020, 4:33pm

Hi @marien,

You can use Dataflow Runner to create more flexible workflows. Just copy the steps from EmrEtlRunner into a Dataflow Runner playbook and add your own step that removes those contexts.
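As a rough illustration of what such a removal step might do, here is a minimal Python sketch that picks out the S3 keys belonging to unwanted contexts so they can be deleted before the load step. It assumes the shredded output uses a `vendor=<vendor>/name=<name>/` folder layout, which is not confirmed in this thread, and the context `com.acme` / `realtime_only_context` as well as the function name `keys_to_delete` are made up for the example. Listing and deleting the objects is left to whatever S3 tooling you already use.

```python
import re

# Assumed key layout (please check against your own bucket):
#   <prefix>/run=<timestamp>/shredded-types/vendor=<vendor>/name=<name>/...
SHREDDED_KEY = re.compile(r"/vendor=(?P<vendor>[^/]+)/name=(?P<name>[^/]+)/")

# Hypothetical blocklist of (vendor, name) pairs that should not reach Redshift.
BLOCKED = {("com.acme", "realtime_only_context")}


def keys_to_delete(keys, blocked):
    """Return the S3 keys whose (vendor, name) pair is in the blocklist."""
    selected = []
    for key in keys:
        match = SHREDDED_KEY.search(key)
        if match and (match.group("vendor"), match.group("name")) in blocked:
            selected.append(key)
    return selected
```

The returned keys could then be passed to an S3 delete call in a custom playbook step that runs after shredding and before the RDB load step.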

We do have plans to implement a custom blocklist for shredded types, but it's quite far away, I'm afraid.
