Enrich 4.0.0 released

We are excited to release version 4.0.0 of Snowplow’s Enrich.

Configurable atomic field lengths

Several atomic fields, such as mkt_clickid, have defined length limits (in this case, 128 characters). Recent versions of Enrich enforce these limits so that oversized data does not break loading into warehouse columns. However, over time we’ve observed that valid data does not always fit within these limits. For example, TikTok click IDs can be up to 500 (or 1000, according to some sources) characters long.

In this release, we are adding a way to configure the limits, and we are increasing the default limits for several fields:

  • mkt_clickid limit increased from 128 to 1000
  • page_url limit increased from 4096 to 10000
  • page_referrer limit increased from 4096 to 10000
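To make the effect of the new defaults concrete, here is a minimal Python sketch of a per-field length check. It is not Enrich's implementation. The `DEFAULT_LIMITS` map contains only the three fields listed above, `oversized_fields` is a hypothetical helper, and measuring length with `len()` (characters) is an assumption.

```python
# Hypothetical illustration: only these three limits come from the release notes.
DEFAULT_LIMITS = {
    "mkt_clickid": 1000,     # previously 128
    "page_url": 10000,       # previously 4096
    "page_referrer": 10000,  # previously 4096
}


def oversized_fields(event: dict[str, str], limits: dict[str, int]) -> list[str]:
    """Return the names of fields whose value is longer than its configured limit."""
    return [
        field
        for field, value in event.items()
        if field in limits and len(value) > limits[field]
    ]


# A 500-character click id, similar to a long TikTok click id.
event = {"mkt_clickid": "x" * 500, "page_url": "https://example.com/"}
print(oversized_fields(event, {"mkt_clickid": 128}))  # ['mkt_clickid'] under the old limit
print(oversized_fields(event, DEFAULT_LIMITS))        # [] under the new default
```

The same 500-character value fails the old 128-character limit but passes the new 1000-character default.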

Depending on your configuration, this might be a breaking change:

  • If you have featureFlags.acceptInvalid set to true in Enrich, you probably don’t need to worry, because you had no validation in the first place (although we do recommend enabling it).
  • If you have featureFlags.acceptInvalid set to false (default), then previously invalid events might become valid (which is a good thing), and you need to prepare your warehouse for this eventuality:
    • For Redshift, you should resize the respective columns, e.g. to VARCHAR(1000) for mkt_clickid. If you don’t, Redshift will truncate the values.
    • For Snowflake and Databricks, we recommend removing the VARCHAR limit altogether. Otherwise, loading might break with longer values. Alternatively, you can change the Enrich configuration to restore the previous default limits.
    • For BigQuery, no steps are necessary.
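For Redshift, the resize step can be scripted. The Python sketch below only builds the statements as strings. It assumes the events table is `atomic.events` (adjust this to your schema), and `redshift_resize_statements` is a hypothetical helper. Redshift places some restrictions on `ALTER COLUMN ... TYPE` (for example, it cannot run inside a transaction block), so check the Redshift documentation before running the output.

```python
# Hypothetical helper: builds Redshift statements that widen VARCHAR columns.
# Assumption: the Snowplow events table is atomic.events; change it if yours differs.
NEW_LIMITS = {
    "mkt_clickid": 1000,
    "page_url": 10000,
    "page_referrer": 10000,
}


def redshift_resize_statements(limits: dict[str, int], table: str = "atomic.events") -> list[str]:
    """Return one ALTER TABLE statement per column that needs a larger VARCHAR."""
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} TYPE VARCHAR({size});"
        for column, size in limits.items()
    ]


for statement in redshift_resize_statements(NEW_LIMITS):
    print(statement)
```

Generating the statements from the same map you use for the Enrich limits helps keep warehouse columns and validation limits in sync.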

Below is an example of how to configure these limits:

{
  ...
  # Optional. Configuration section for various validation-oriented settings.
  "validation": {

    # Optional. Configuration for custom maximum atomic fields (strings) length.
    # Map-like structure with keys being field names and values being their max allowed length
    "atomicFieldsLimits": {
        "app_id": 5
        "mkt_clickid": 100000
        # ...and any other 'atomic' field with custom limit
    }
  }
}
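One way to read the atomicFieldsLimits map is as a set of overrides on top of the defaults. Fields listed in the map get the configured limit, and every other field keeps its default. The Python sketch below models that reading. It is an assumption about the semantics, not a description of Enrich's code, and `effective_limits` is a hypothetical helper.

```python
# Hypothetical model of how configured limits could combine with the defaults.
DEFAULT_LIMITS = {"mkt_clickid": 1000, "page_url": 10000, "page_referrer": 10000}

# Mirrors the "atomicFieldsLimits" example in the configuration above.
CONFIGURED_LIMITS = {"app_id": 5, "mkt_clickid": 100000}


def effective_limits(defaults: dict[str, int], overrides: dict[str, int]) -> dict[str, int]:
    """Configured values replace defaults; fields not listed keep their default."""
    return {**defaults, **overrides}


limits = effective_limits(DEFAULT_LIMITS, CONFIGURED_LIMITS)
print(limits["mkt_clickid"])             # 100000 (overridden)
print(limits["page_url"])                # 10000 (default kept)
print(len("my-app") > limits["app_id"])  # True: "my-app" has 6 characters
```

If you raise a limit above the size of the matching warehouse column, as the 100000 value for mkt_clickid would, the loading problems described earlier can come back.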

Azure Blob Storage support

enrich-kafka can now download enrichment assets (e.g. the MaxMind database) from Azure Blob Storage.
See the configuration reference for setup details.

New license

Following our recent licensing announcement, Enrich is now released under the Snowplow Limited Use License Agreement.

stream-enrich assets and enrich-rabbitmq deprecated

As announced a while ago, the stream-enrich assets and enrich-rabbitmq are now deprecated.
Only one asset now exists for each type of message queue.
The setup guide for each can be found on this page.

Upgrading to 4.0.0

The migration guide can be found on this page.
