Most up-to-date approach to running RDBLoader

caleb_bertsch · June 11, 2018, 5:28pm

I’m currently using the Snowplow Docker containers for the collector, enrich, and storage (to S3) steps. Every piece of documentation I can find on RDB Loader indicates that I need to run it with EmrEtlRunner. Since I used the Docker containers for the other steps, I never had to set that up and don’t quite know how it fits in. I’ve also seen talk that EmrEtlRunner is being replaced with Dataflow Runner, so I’m a little confused about the current best approach.

anton · June 12, 2018, 8:28am

Hi @caleb_bertsch,

EmrEtlRunner is indeed still the only recommended way to run RDB Loader today. We still plan to deprecate it in favor of Dataflow Runner, but unfortunately there is not even an approximate ETA for that.

It is possible to run RDB Loader in a non-EMR environment, but in the end you will still need to configure EmrEtlRunner to run RDB Shredder, which is a required step for using RDB Loader.

So, unless you want to dive very deep into custom solutions, I’d recommend sticking with EmrEtlRunner, especially since after R102 (I recommend using R104) it comes with a lot of Stream Enrich related goodness.

caleb_bertsch · June 12, 2018, 12:13pm

Thank you! This is exactly what I was looking for and couldn’t for the life of me find.
