DEPRECATION NOTICE: EmrEtlRunner

istreeter · October 5, 2021, 1:25pm

Following the announcement of our intentions back in January, Snowplow’s EmrEtlRunner is now finally and officially a deprecated application.

While we will continue to evaluate and address any reported security vulnerabilities for a further 6 months, we will no longer add new features or fix new bugs. If you encounter issues with EmrEtlRunner you should move to the new RDB Loader estate as detailed below.

What was EmrEtlRunner?

EmrEtlRunner has been around since the very early days of Snowplow. Older versions of the pipeline used it to coordinate an AWS EMR batch job that copied events around in S3, enriched them, shredded them, and loaded them into Redshift.

The enrichment functionality was deprecated long ago in favour of the streaming versions of Enrich. The shredder/loader functionality became redundant when we released RDB Loader R35.

How should I run the RDB shredder/loader?

The new RDB shredder runs in EMR as a simple two-step job: it copies data within S3 and then shreds it. We recommend using Dataflow Runner to coordinate the EMR job, and there is an example playbook on our docs site. The new RDB loader runs entirely outside of EMR as a standalone application.

We are now fully confident that the new shredder/loader architecture is production-ready and better than anything we had before. Shredding and loading now run in parallel, and shredding can continue even when the warehouse is unavailable. We have also added loads of helpful new features to the standalone loader, such as folder monitoring and runtime metrics.

What does this mean if I still run EmrEtlRunner?

All previous versions of EmrEtlRunner will remain available on the GitHub releases page, so your pipeline will continue to work.

We recommend following the upgrade guides on the Snowplow docs site to migrate to the newer architecture.

alex · October 8, 2021, 11:38am

Great news! Farewell EmrEtlRunner, thank you for your long and trusty service.
