RDB Loader 5.1.0 released

We’re pleased to announce the release of RDB Loader version 5.1.0.

This release brings SSH tunnel connection recovery to Redshift Loader. It also makes it possible to disable in-batch natural deduplication in Batch Transformer.

Option to disable in-batch natural deduplication in Batch Transformer

RDB Loader tries to deal with duplicate events automatically. One of the related features is “in-batch natural deduplication”: the removal of identical events (same id and same payload) found in the same batch.
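
To make the idea concrete, here is a minimal Python sketch of what natural deduplication within a single batch could look like. It is an illustration only, not the Batch Transformer's actual implementation: the `event_id` and `payload` field names, the `natural_dedupe` helper and the SHA-256 fingerprint are all assumptions made for this example.

```python
import hashlib
import json


def natural_dedupe(batch):
    """Keep the first occurrence of each (event_id, payload) pair in a batch."""
    seen = set()
    unique = []
    for event in batch:
        # Fingerprint the payload so identical payloads compare equal
        # regardless of key order (hypothetical scheme for this sketch).
        payload = json.dumps(event["payload"], sort_keys=True)
        fingerprint = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        key = (event["event_id"], fingerprint)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


batch = [
    {"event_id": "a", "payload": {"page": "/home"}},
    {"event_id": "a", "payload": {"page": "/home"}},   # natural duplicate, dropped
    {"event_id": "a", "payload": {"page": "/about"}},  # same id, different payload, kept
    {"event_id": "b", "payload": {"page": "/home"}},
]
deduped = natural_dedupe(batch)
```

Only events that match on both id and payload are collapsed; an event that shares an id but carries a different payload is left alone in this sketch.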

Previously, it wasn’t possible to disable in-batch natural deduplication in Batch Transformer. We have found that it affects performance, so we have made it possible to turn it off. If duplicate events aren’t a problem for you, we suggest disabling deduplication.

It can be disabled by adding the following section to the config:

  "deduplication": {
    # When natural deduplication is disabled, 'synthetic' deduplication needs to be disabled too. 
    "synthetic": {
      "type": "NONE"
    }
    "natural": false
  }

SSH tunnel connection recovery in Redshift Loader

Redshift Loader can connect to a private Redshift cluster through an SSH tunnel. Previously, if the SSH tunnel session was disconnected, the loader had no way to detect it. We added retries around the SSH tunnel connection so that the loader can recover from this problem, which makes it more robust.
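
As a rough illustration of retrying around a connection, the Python sketch below wraps a connection attempt in a retry loop. It is not the loader's actual code: `with_retries`, `open_tunnel`, the attempt limit and the exponential backoff are all assumptions made for this example, and the "tunnel" is a stand-in that fails twice before succeeding.

```python
import time


def with_retries(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially between failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical stand-in for opening an SSH tunnel: fails twice, then succeeds.
attempts = {"count": 0}


def open_tunnel():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("tunnel session dropped")
    return "tunnel-open"


# Record the delays instead of actually sleeping, to keep the example fast.
delays = []
result = with_retries(open_tunnel, sleep=delays.append)
```

Passing the sleep function as a parameter is just a convenience for the example; it lets the retry schedule be inspected without waiting.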

Upgrading to 5.1.0

If you are already using a recent version of RDB Loader (3.0.0 or higher), then upgrading to 5.1.0 is as simple as pulling the newest Docker images. No changes are needed to your configuration files.

docker pull snowplow/transformer-kinesis:5.1.0
docker pull snowplow/rdb-loader-redshift:5.1.0
docker pull snowplow/rdb-loader-snowflake:5.1.0
docker pull snowplow/rdb-loader-databricks:5.1.0

The Snowplow docs site has a full guide to running the RDB Loader.
