We are happy to announce the release of Snowplow Normalize v0.2.0!
This release drops support for dbt versions below 1.3 in order to use the latest dbt-utils package, adds support for custom user_ids and multiple events per table, and ensures that all appropriate versions of BigQuery contexts are used. As a result, this version contains a number of breaking changes, so please read the Upgrading section carefully. Now that the majority of functionality is available, we expect fewer breaking changes and easier upgrades in future releases.
Breaking Changes
- Config file structure has changed to enable multiple event types per table
- Macro inputs have changed to enable multiple event types per table and custom user_id field
- The filtered events table has a new column to enable multiple event types per table
- Support for versions of dbt < 1.3 has been dropped
Features
- Allow for multiple event types (including self-describing) per normalized table
- Allow for custom user_id field within users table
- BigQuery optimized to use all self-describing events (SDEs) and contexts that share the same major version number (in line with other Snowplow packages)
- Enhanced testing and warnings under the hood
- Drop support for dbt versions below 1.3 (Close #17)
Upgrading
To upgrade the package, bump the version number in the packages.yml file in your project. This version of the package requires dbt v1.3 or later.
Upgrading your config file
To upgrade your config file:
- Change the `event_name` field to `event_names` and make the value a list
- Change the `self_describing_event_schema` field to `self_describing_event_schemas` and make the value a list
- If you wish to make use of the new features, see the example config or the docs for more information
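The two renames above can be sketched in Python. This is only an illustration, assuming each event entry in your config has been loaded as a plain dict. The helper name `upgrade_event_entry`, the example schema URI, and the `other_key` field are hypothetical and are not part of the package.

```python
import copy

# Old key -> new key, following the two renames listed above.
RENAMES = {
    "event_name": "event_names",
    "self_describing_event_schema": "self_describing_event_schemas",
}


def upgrade_event_entry(entry):
    """Return a copy of one event entry with the renamed, list-valued keys."""
    upgraded = copy.deepcopy(entry)
    for old_key, new_key in RENAMES.items():
        if old_key in upgraded:
            value = upgraded.pop(old_key)
            # Wrap a single value in a list; leave an existing list untouched.
            upgraded[new_key] = value if isinstance(value, list) else [value]
    return upgraded


# Hypothetical v0.1-style entry, for illustration only.
old_entry = {
    "event_name": "my_event",
    "self_describing_event_schema": "iglu:com.example/my_event/jsonschema/1-0-0",
    "other_key": "unchanged",
}
new_entry = upgrade_event_entry(old_entry)
print(new_entry)
```

Any keys other than the two renamed ones pass through untouched, so the sketch should be safe to run over entries that carry additional settings.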
Upgrading your models
Once you have upgraded your config file, the easiest way to ensure your models match the new macros is to re-run the Python script. If you would prefer not to do this, you can:
- For each normalized model:
  - Convert the `event_name` and `sde_cols` fields to lists, and pluralize the names in both the set and the macro call
  - Add a new field, `sde_aliases`, which is an empty list, and add this between `sde_types` and `context_cols` in the macro call
- For your filtered events table:
  - Change the `unique_key` in the config section to `unique_id`
  - Add a line between the `event_table_name` and `from` lines for each select statement: `, event_id||'-'||'<that_event_table_name>' as unique_id`, with the event table name for that select block
- For your users table:
  - Add 3 new values to the start of the macro call, `'user_id','',''`, before the `user_cols` argument
Upgrading your filtered events table
If you use the master filtered events table, you will need to add a new column for the latest version to work. If you have not processed much data yet, it may be easier to simply re-run the package from scratch using `dbt run --full-refresh --vars 'snowplow__allow_refresh: true'`. Alternatively, run the following in your warehouse, replacing the schema (or dataset) and table name with those of your table:
ALTER TABLE {schema}.{table} ADD COLUMN unique_id STRING;
UPDATE {schema}.{table} SET unique_id = event_id||'-'||event_table_name WHERE 1 = 1;
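
To see what the two statements above do before running them against real data, here is a minimal sketch using an in-memory SQLite database as a stand-in for your warehouse. The table name `filtered_events`, the sample rows, and the helper `backfill_unique_id` are hypothetical. The column type is `TEXT` because this runs on SQLite, so keep the type your own warehouse expects.

```python
import sqlite3


def backfill_unique_id(conn, table):
    """Add a unique_id column and fill it as event_id || '-' || event_table_name."""
    # Table names cannot be bound as query parameters, so accept only simple identifiers.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    conn.execute(f"ALTER TABLE {table} ADD COLUMN unique_id TEXT")
    conn.execute(
        f"UPDATE {table} SET unique_id = event_id || '-' || event_table_name WHERE 1 = 1"
    )
    conn.commit()


# Toy stand-in for a filtered events table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filtered_events (event_id TEXT, event_table_name TEXT)")
conn.executemany(
    "INSERT INTO filtered_events VALUES (?, ?)",
    [("e1", "page_views"), ("e1", "link_clicks")],
)

backfill_unique_id(conn, "filtered_events")
rows = conn.execute(
    "SELECT unique_id FROM filtered_events ORDER BY unique_id"
).fetchall()
print(rows)
```

The sample data uses the same `event_id` in two event tables to show why the table name is part of the key: the combined `unique_id` stays distinct for each row.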