Snowplow-normalize 0.2.0 dbt package released

We are happy to announce the release of Snowplow Normalize v0.2.0!

This release drops support for dbt versions <1.3 to use the latest dbt-utils package, adds functionality for custom user_ids and multiple events per table, as well as ensuring all appropriate versions of BigQuery contexts are used. Due to these changes this version contains a number of breaking changes, so please read the Upgrade section carefully. We expect to have less breaking changes in the future now the majority of functionality is available, which should lead to easier upgrades in the future.

:rotating_light: Breaking Changes :rotating_light:

  • Config file structure has changed to enable multiple event types per table
  • Macro inputs have changed to enable multiple event types per table and custom user_id field
  • The filtered events table has a new column to enable multiple event types per table
  • Support for versions of dbt < 1.3 has been dropped

Features

  • Allow for multiple event types (including self-describing) per normalized table
  • Allow for custom user_id field within users table
  • BigQuery optimized to use all same major version number sdes and contexts (in line with other Snowplow packages)
  • Enhanced testing and warnings under the hood
  • Drop support for dbt versions below 1.3 (Close #17)

Upgrading

To upgrade the package, bump the version number in the packages.yml file in your project. You will need dbt v1.3 at least to be able to use this version of the package.

Upgrading your config file

To upgrade your config file:

  • Change the event_name field to event_names and make the value a list
  • Change the self_describing_event_schema field to self_describing_event_schemas and make the value a list
  • If you wish to make use of the new features, see the example config or the docs for more information

Upgrading your models

Once you have upgraded your config file, the easiest way to ensure your models match the new macros is to re-run the Python script. If you would prefer not to do this, you can:

  • For each normalized model:
    • Convert the event_name and sde_cols fields to lists, and pluralize the names in both the set and the macro call
    • Add a new field, sde_aliases which is an empty list, add this between sde_types and context_cols in the macro call
  • For your filtered events table:
    • Change the unique_key in the config section to unique_id
    • Add a line between the event_table_name and from lines for each select statement; , event_id||'-'||'<that_event_table_name>' as unique_id, with the event table name for that select block.
  • For your users table:
    • Add 3 new values to the start of the macro call, 'user_id','','', before the user_cols argument.

Upgrade your filtered events table

If you use the master filtered events table, you will need to add a new column for the latest version to work. If you have not processed much data yet it may be easier to simply re-run the package from scratch using dbt run --full-refresh --vars 'snowplow__allow_refresh: true', alternatively run the following in your warehouse, replacing the schema/dataset/warehouse and table name for your table:

ALTER TABLE {schema}.{table} ADD COLUMN unique_id STRING; 
UPDATE {schema}.{table} SET unique_id = event_id||'-'||event_table_name WHERE 1 = 1;
2 Likes