Snowplow RDB Loader 2.0.0 released

We are very pleased to announce version 2.0.0 of the Snowplow RDB Loader.

This release adds SNS as a new destination for Shredder's shredding complete message. We have also overhauled the config structure of both Loader and Shredder to make them less tightly coupled.

In combination, these two changes add a significant new capability to the RDB loader framework: the capability to run two loaders in parallel, each processing the same shredded data, and the same shredding complete messages.

Apart from that, this release doesn’t bring too many big changes even though it has a major version bump. We wanted to keep this release as minimal as possible due to our plans for a new kind of Loader framework. This release paves the way for this vision. Watch this space for more updates!

New shredding complete destination: SNS

Until this release, Shredder could send the shredding complete message only to SQS (Simple Queue Service). This release makes it possible to send it to an SNS (Simple Notification Service) topic as well. With this new feature, shredding complete messages can be fanned out to multiple SQS queues, which makes it possible to have one shredder and multiple loaders.

The SNS destination can be activated in the queue section of the Shredder config:

...
"queue": {
  "type": "sns",
  "topicArn": "arn:aws:sns:eu-central-1:123456789:test-sns-topic",
  "region": "eu-central-1"
}
...

Alternatively, you can keep the legacy SQS behaviour by setting queue.type=sqs.
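To make the fan-out idea concrete, here is a minimal in-memory sketch in Python. It does not call AWS or any part of RDB Loader: the Topic class, the drain helper, and the message shape are all hypothetical stand-ins, chosen only to show that one published message reaches every subscribed queue, so each loader sees its own copy.

```python
from collections import deque


class Topic:
    """In-memory stand-in for an SNS topic (hypothetical, not the AWS API)."""

    def __init__(self):
        self.queues = []

    def subscribe(self, queue):
        # Each subscribed queue plays the role of one SQS queue.
        self.queues.append(queue)

    def publish(self, message):
        # Fan-out: every subscribed queue receives its own copy of the message.
        for queue in self.queues:
            queue.append(dict(message))


def drain(queue):
    """Stand-in for a loader polling its own queue until it is empty."""
    seen = []
    while queue:
        seen.append(queue.popleft())
    return seen


topic = Topic()
loader_a_queue = deque()
loader_b_queue = deque()
topic.subscribe(loader_a_queue)
topic.subscribe(loader_b_queue)

# Assumed message shape, for illustration only; the real shredding
# complete message has a different structure.
topic.publish({"base": "s3://example-bucket/shredded/run=1/"})

loader_a_messages = drain(loader_a_queue)
loader_b_messages = drain(loader_b_queue)
```

In this sketch both loaders end up with the same message, and consuming it from one queue does not remove it from the other. That independence is what sending to a single SQS queue cannot provide on its own.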

Splitting configs of Shredder and Loader

Until this release, Shredder and Loader shared the same HOCON config. From this release on, they use two separate configs. This change decouples Shredder and Loader, in line with our vision for a new kind of Loader framework.

Reference docs for new configs can be found on the following pages:

RDB Loader configuration

RDB Shredder configuration

Upgrading to 2.0.0

Since Loader and Shredder are using different configs now, you need to split your existing config into these two new configs.

Example configs can be found here.

Why have multiple loaders?

We are making a big thing here of the ability to run multiple loaders from the same shredded data. But why would you ever want to do this?

Admittedly it has limited benefit now, but it is on our roadmap to add new alternative destinations to the RDB loader framework. When this is done, you might want to run a separate Redshift loader and, say, a Databricks loader from the same data.
