Snowplow RDB Loader R35 released

We’re extremely happy to announce Snowplow RDB Loader R35, which marks the deprecation of EmrEtlRunner and brings a major simplification to the pipeline architecture.

Changes

  • EmrEtlRunner is no longer required for Redshift loading, nor for any other version of the pipeline, which marks its deprecation
  • We significantly reduced the number of EMR steps, making data processing much cheaper and more reliable: just one S3DistCp step remains, and RDB Loader is no longer an EMR step
  • config.yml and redshift.json replaced with a single HOCON file
  • S3 discovery replaced with communication via SQS
  • Unification and simplification of shredded data partitioning
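
To illustrate the move from S3 discovery to SQS communication listed above, here is a minimal sketch of the general idea: the shredding job announces each finished folder with a message, and the loader consumes those messages instead of listing the bucket. This is not the actual RDB Loader protocol. The function names and the message fields (`base`, `types`) are hypothetical, and a standard-library in-memory queue stands in for a real SQS queue.

```python
import json
import queue

# In-memory stand-in for an SQS queue (assumption: a real setup would use an SQS client).
messages = queue.Queue()


def shredder_finish(base, types):
    """Hypothetical: announce a completed shredded folder to the loader."""
    # The message shape is illustrative only, not the real RDB Loader format.
    body = json.dumps({"base": base, "types": types})
    messages.put(body)


def loader_poll():
    """Hypothetical: collect folders to load from messages, with no S3 listing."""
    folders = []
    while True:
        try:
            body = messages.get_nowait()
        except queue.Empty:
            break
        folders.append(json.loads(body))
    return folders
```

With this shape the loader learns about new data only when it is told about it, so it never has to scan S3 for folders that may still be partially written.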

For a full changelog, please see our release notes. An upgrade guide is available on our documentation website.

Roadmap

  • R35 is considered a public beta - while it’s been carefully tested in sandbox environments, it hasn’t been used in production yet. In the next (1.0.0) version we’re planning to introduce more breaking changes
  • Streaming Shredder - following the success of Enrich FS2 and BigQuery Stream Loader, we’re planning to add a new streaming shredder, deprecating EMR altogether
  • Unification with Snowflake Loader, making our loading software portable and easy to maintain