We’re extremely happy to announce Snowplow RDB Loader R35, which marks the deprecation of EmrEtlRunner and brings a major simplification to the pipeline architecture.
Changes
- EmrEtlRunner is no longer required for Redshift loading (or for any other version of the pipeline), marking its deprecation
- We significantly reduced the number of EMR steps, making data processing much cheaper and more reliable: just one S3DistCp step remains, and RDB Loader is no longer an EMR step
- config.yml and redshift.json replaced with a single HOCON file
- S3 discovery replaced with communication via SQS
- Unification and simplification of shredded data partitioning
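To make the move from S3 discovery to SQS more concrete, here is a minimal sketch of a message-based handoff, assuming for illustration that the shredder announces each finished batch on a queue and the loader consumes those announcements instead of listing S3 folders. It is not RDB Loader code: a standard-library `queue.Queue` stands in for SQS, and the `ShreddingComplete` message, its `base` and `types` fields, the helper functions and the example S3 path are all hypothetical.

```python
import json
import queue
from dataclasses import dataclass, field
from typing import List


# Hypothetical message shape; NOT the actual RDB Loader message format.
@dataclass
class ShreddingComplete:
    base: str  # S3 folder holding the shredded batch (hypothetical field)
    types: List[str] = field(default_factory=list)  # shredded types (hypothetical field)

    def to_json(self) -> str:
        return json.dumps({"base": self.base, "types": self.types})

    @staticmethod
    def from_json(body: str) -> "ShreddingComplete":
        data = json.loads(body)
        return ShreddingComplete(base=data["base"], types=data["types"])


def shredder_finish_batch(q: "queue.Queue[str]", base: str, types: List[str]) -> None:
    """Hypothetical shredder side: after writing its output, announce the batch."""
    q.put(ShreddingComplete(base, types).to_json())


def loader_poll(q: "queue.Queue[str]", timeout: float = 0.1) -> List[ShreddingComplete]:
    """Hypothetical loader side: consume announcements until the queue stays empty."""
    loaded: List[ShreddingComplete] = []
    while True:
        try:
            body = q.get(timeout=timeout)
        except queue.Empty:
            break
        msg = ShreddingComplete.from_json(body)
        # Hypothetical: a real loader would load msg.types from msg.base into Redshift here.
        loaded.append(msg)
        q.task_done()
    return loaded


if __name__ == "__main__":
    q = queue.Queue()
    shredder_finish_batch(
        q, "s3://example-bucket/shredded/batch-1/", ["com.acme/checkout/jsonschema/1-0-0"]
    )
    for batch in loader_poll(q):
        print(batch.base, batch.types)
```

With a real SQS queue, the loader would also have to delete each message after a successful load so that it is not redelivered once its visibility timeout expires; that step has no equivalent in this in-memory sketch.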
For the full changelog, please see our release notes. An upgrade guide is available on our documentation website:
Roadmap
- R35 is considered a public beta - while it’s been carefully tested in sandbox environments, it hasn’t been used in production yet. In the next version (1.0.0), we’re planning to introduce more breaking changes
- Streaming Shredder - following the success of Enrich FS2 and BigQuery Stream Loader, we’re planning to add a new streaming shredder, deprecating EMR altogether
- Unification with Snowflake Loader, making our loading software portable and easy to maintain