RDB Loader 5.5.0 released

We’re pleased to announce the release of RDB Loader version 5.5.0.

This release improves the robustness of RDB Loader and adds metrics to the batch transformer.

Config parsing improvements

Before version 5.5.0, the only way to pass configuration to the applications was to provide BASE64-encoded HOCON (for the application config) and JSON (for the Iglu resolver config) as command line options.

Starting from version 5.5.0, it’s possible to provide a full path to the configuration files instead. Here is an example that mounts a config directory into the Docker container at run time:

docker run \
  -v /path/to/config:/myconfig \
  snowplow/rdb-loader-redshift:5.5.0 \
  --config /myconfig/loader.hocon \
  --iglu-config /myconfig/resolver.json

It’s no longer necessary to use BASE64-encoded strings on the command line, but to preserve compatibility, the old way of configuring is still supported.
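For comparison, here is a minimal sketch of what the older BASE64 style looks like. It assumes the same --config and --iglu-config flags accept the encoded strings, as in previous versions. The helper names encode_config and print_legacy_command are hypothetical, and the command is only printed here, not executed.

```shell
#!/bin/bash

# Encode a file as a single-line BASE64 string. Newlines are stripped so the
# result can be passed as one command line argument.
encode_config() {
  base64 < "$1" | tr -d '\n'
}

# Print the docker command for the older BASE64 style.
# Arguments: path to the loader HOCON file, path to the resolver JSON file.
print_legacy_command() {
  local loader_config="$1" resolver_config="$2"
  printf 'docker run snowplow/rdb-loader-redshift:5.5.0 --config %s --iglu-config %s\n' \
    "$(encode_config "$loader_config")" \
    "$(encode_config "$resolver_config")"
}
```

Calling print_legacy_command /path/to/config/loader.hocon /path/to/config/resolver.json prints a command equivalent to the path-based example above, with no volume mount needed.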

What is more, it’s now possible to provide a HOCON file for the Iglu resolver configuration, just as for the application configuration. This matters because it lets you use HOCON features such as environment variable resolution for the Iglu resolver as well. A plain JSON file is still supported.
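As an illustration, the following sketch writes a resolver file that reads a registry API key from the environment. The registry name, URI, vendor prefix and the IGLU_API_KEY variable name are hypothetical, and the overall shape follows the commonly documented Iglu resolver config, so adjust it to your own setup.

```shell
#!/bin/bash

# Write a HOCON resolver config. The heredoc delimiter is quoted so the shell
# leaves ${IGLU_API_KEY} untouched; HOCON substitutes it when the file is parsed.
# Registry name, URI and variable name are hypothetical placeholders.
cat > resolver.hocon <<'EOF'
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Example private registry",
        "priority": 0,
        "vendorPrefixes": ["com.example"],
        "connection": {
          "http": {
            "uri": "https://iglu.example.com/api",
            "apikey": ${IGLU_API_KEY}
          }
        }
      }
    ]
  }
}
EOF
```

With a file like this in the mounted config directory, the variable would need to be visible inside the container (for example through docker run’s -e option), and --iglu-config would point at /myconfig/resolver.hocon instead of the JSON file.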

These changes apply to all the loader (Redshift, Snowflake, Databricks) and transformer (batch, streaming) applications.

Improved robustness of the loader

We’ve made a number of small under-the-hood improvements that we hope will make the loader more resilient against transient failures. We identified some of the most common edge cases in which previous versions of the loader might hit an error, e.g. due to a stale connection or a network issue. The changes include better handling of old connections and retrying on transient failures.

Batch Transformer: transform_duration metric

The batch transformer can now send a new metric to CloudWatch, if configured: transform_duration, the time taken to transform an input folder.

Upgrading

If you are already using a recent version of RDB Loader (3.0.0 or higher), then upgrading to 5.5.0 is as simple as pulling the newest Docker images.
There are no changes needed to your configuration files.

docker pull snowplow/rdb-loader-redshift:5.5.0
docker pull snowplow/rdb-loader-snowflake:5.5.0
docker pull snowplow/rdb-loader-databricks:5.5.0
docker pull snowplow/transformer-pubsub:5.5.0
docker pull snowplow/transformer-kinesis:5.5.0

Starting from this version, the batch transformer requires Java 11 on EMR (the default is Java 8). One way to switch is to run the following script as a bootstrap action (the script needs to be stored on S3):

#!/bin/bash

set -e

sudo update-alternatives --set java /usr/lib/jvm/java-11-amazon-corretto.x86_64/bin/java

exit 0
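To confirm that the switch took effect, one option is to check the major version reported by java -version. The sketch below is not part of the release; java_major_version is a hypothetical helper that parses the version banner, handling both the older "1.8.0_382" numbering and the newer "11.0.20" numbering.

```shell
#!/bin/bash

# Extract the major Java version from a `java -version` banner.
# The banner is passed as an argument so the function is easy to test.
java_major_version() {
  local banner="$1" version
  # The version string is the first double-quoted field on the "version" line.
  version=$(printf '%s\n' "$banner" | awk -F'"' '/version/ {print $2; exit}')
  # Legacy scheme: "1.8.0_382" means Java 8, so drop the leading "1.".
  case "$version" in
    1.*) version="${version#1.}" ;;
  esac
  # Keep only the leading number, dropping everything from the first . _ or -
  printf '%s\n' "${version%%[._-]*}"
}
```

On a cluster node this could be called as java_major_version "$(java -version 2>&1)", since java -version writes its banner to standard error.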

The Snowplow docs website has a full guide for running the RDB Loader and the transformer.
