RDB Loader 6.0.0 released

Version 6.0.0 of Snowplow RDB Loader is released!

The highlight of this release is a new schema evolution mechanism for Redshift. It affects all transformers, and on the loader side only RDB Loader for Redshift.

[Redshift-only] New migration mechanism & recovery tables

Previously, Redshift loaders would migrate the shredded table to the latest available schema version. This could lead to a race condition between the transformer and the loader; see this issue for more details.

As of 6.0.0, the loader migrates the shredded table to the latest schema version discovered in the shredding_complete payload (rather than the latest existing version). In addition, thanks to the new file hierarchy described below, the loader can issue one COPY statement per schema version, which lets it decide on the exact set of columns for each statement.

We are also introducing a new mechanism that prevents the loader from failing when a schema has not been evolved correctly. You can find more information about it here. You can also check the upgrade guide to understand how this feature might affect your existing schemas and what to do about it.

To ensure that shredded tables are migrated consistently, we released Iglu Server 0.11.0 which disallows non-sequential version updates. As an example, if Iglu Server already holds versions 1-0-0 and 1-1-0 of a schema, then it will reject an attempt to publish schema version 1-0-1.
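The 1-0-1 example above can be sketched as a small check. This is only an illustration under a simplifying assumption (a new version is accepted only if it sorts after every version already published for the schema); it is not Iglu Server's actual implementation, and the function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a SchemaVer string such as '1-0-1' into (model, revision, addition)."""
    model, revision, addition = (int(part) for part in version.split("-"))
    return model, revision, addition


def is_sequential(existing: list[str], candidate: str) -> bool:
    """Assumed, simplified rule: accept the candidate only if it sorts
    after every version already published for this schema."""
    new = parse_version(candidate)
    return all(parse_version(v) < new for v in existing)


published = ["1-0-0", "1-1-0"]
print(is_sequential(published, "1-0-1"))  # False: 1-1-0 is already published
print(is_sequential(published, "1-1-1"))  # True
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as "1-10-0" sorting before "1-2-0".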

[Redshift-only] Monitoring recovery tables

Previous versions printed the table name to stdout. As of 6.0.0, if an event is loaded into a recovery table, the name of that recovery table is printed instead.

If a webhook is configured, recent previous versions used load_succeeded/3-0-0 to report information about a successful load.

As of 6.0.0, the loader uses the load_succeeded/3-0-1 schema, which adds a $.recoveryTableNames key that lists the names of the recovery tables loaded in the batch (breaking schema keys from the shredding_complete payload).

[Redshift-only] $.featureFlags.disableRecovery configuration

RDB Loader 6.0.0 introduces a new configuration option, $.featureFlags.disableRecovery: a list of schema criteria for which migration is disabled.

For schemas matching the provided criteria only, RDB Loader will neither migrate the corresponding shredded table nor create recovery tables for breaking schema versions. The loader will attempt to load into the corresponding shredded table without migrating it.

This is useful if you have older schemas with breaking changes and don’t want the loader to apply the new logic to them.
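As a rough illustration of how a list of schema criteria could select the schemas that skip the new logic, here is a minimal sketch. It assumes criteria are written in the usual Iglu form `iglu:vendor/name/format/model-revision-addition`, with `*` as a wildcard for a version part; the function names and example schemas are hypothetical and this is not the loader's own code.

```python
def matches(criterion: str, schema_key: str) -> bool:
    """Return True if schema_key satisfies criterion.

    Both are assumed to look like 'iglu:vendor/name/format/1-0-0';
    in the criterion, any version part may be the wildcard '*'.
    """
    crit_path, crit_version = criterion.removeprefix("iglu:").rsplit("/", 1)
    key_path, key_version = schema_key.removeprefix("iglu:").rsplit("/", 1)
    if crit_path != key_path:
        return False
    pairs = zip(crit_version.split("-"), key_version.split("-"))
    return all(c == "*" or c == k for c, k in pairs)


def recovery_disabled(disable_recovery: list[str], schema_key: str) -> bool:
    """True if any configured criterion matches the schema key."""
    return any(matches(criterion, schema_key) for criterion in disable_recovery)


# Hypothetical value for $.featureFlags.disableRecovery
disable_recovery = ["iglu:com.acme/button_click/jsonschema/1-*-*"]

print(recovery_disabled(disable_recovery, "iglu:com.acme/button_click/jsonschema/1-0-1"))
print(recovery_disabled(disable_recovery, "iglu:com.acme/button_click/jsonschema/2-0-0"))
```

With this criterion, every 1-x-y version of the hypothetical button_click schema would be loaded without migration or recovery tables, while model 2 would still go through the new mechanism.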

New file hierarchy for shredded events

Until now, both batch and stream transformers wrote shredded events using the following scheme:


vendor/name/model

As of 6.0.0, all transformers use the following scheme:


vendor/name/model/revision/addition

which increases granularity of the output, enabling higher precision in downstream usage.
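To make the difference concrete, the sketch below builds both forms of the path from a vendor, a name, and a version. The helper name is hypothetical, and the vendor and schema name are borrowed from the com.acme/button_click example used later in this post.

```python
def shredded_path(vendor: str, name: str, version: str, legacy: bool = False) -> str:
    """Build the output path for a shredded type.

    version is a SchemaVer string such as '1-0-2'.
    legacy=True gives the pre-6.0.0 layout (vendor/name/model).
    """
    model, revision, addition = version.split("-")
    if legacy:
        return f"{vendor}/{name}/{model}"
    return f"{vendor}/{name}/{model}/{revision}/{addition}"


for version in ["1-0-0", "1-1-0"]:
    print(shredded_path("com.acme", "button_click", version, legacy=True))
    print(shredded_path("com.acme", "button_click", version))
```

In the legacy layout both versions map to the same path, com.acme/button_click/1; in the new layout each version gets its own path.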

Removal of padding \N char

Transformers write events to S3 to be loaded by Redshift. For the loading command to work, all events at a given path (e.g. com.acme/button_click/1) must follow the same format. A batch, however, may contain events with different versions of a given schema. In particular, events with a newer schema might have new fields not present in the events with an older one.

Previously, transformers solved this problem by formatting all events according to the latest version of the schema and using the \N character in case of missing fields.

As of 6.0.0, there is no need to do that because, as explained above, events using different versions of a schema are written to different paths.
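The effect on the written rows can be sketched as follows. This is an illustration only: it assumes tab-separated rows, and the column names (`id`, `label`) and the event itself are made up for the example.

```python
NULL = "\\N"  # the two-character padding marker, backslash followed by N


def to_row(event: dict, columns: list[str]) -> str:
    """Format an event as a tab-separated row, using \\N for absent columns."""
    return "\t".join(str(event.get(column, NULL)) for column in columns)


# Hypothetical columns: 1-1-0 adds a 'label' field on top of 1-0-0.
columns_1_0_0 = ["id"]
columns_1_1_0 = ["id", "label"]

old_event = {"id": "a1"}  # an event sent with schema version 1-0-0

# Before 6.0.0: every event under com.acme/button_click/1 is formatted
# against the latest version, so the older event is padded.
print(repr(to_row(old_event, columns_1_1_0)))

# As of 6.0.0: the event goes to its own version's path and is formatted
# against that version's columns, so no padding is needed.
print(repr(to_row(old_event, columns_1_0_0)))
```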

New license

Following our recent licensing announcement, RDB Loader is now released under the Snowplow Limited Use License Agreement.

Upgrading to 6.0.0

The upgrade guide can be found on this page.
