Feedback wanted! dbt packages snowplow-utils and snowplow-web 0.14.0 release candidate 1 released!

We’ve just released the first release candidates for the upcoming 0.14.0 versions of both our web and utils packages. These versions contain one major change: we are moving from our custom snowplow_incremental materialization to an overwritten version of the standard incremental materialization.

To make sure your models are optimized, you need to add the following to your dbt_project.yml. This ensures that dbt uses our version of the merge SQL rather than the default; without it you will not see the optimized upsert that our materialization has previously provided.

# dbt_project.yml
...
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']

Outside of this, we do not expect users to notice any performance change when running our packages. If you do see a sizeable increase in run time, please revert to an earlier version, then comment on this post or raise an issue in the GitHub repo.

We’d love to get your feedback on these changes, so please highlight any issues you find. As this is a pre-release, some features may change in upcoming versions. If you have custom models you need to migrate, please see our temporary docs page for how to do so: Snowplow Materialization (Pre-Release) | Snowplow Documentation.

Installing the pre-release

To install a pre-release package you need to specify the version number of the package. For example, to test this version of the web package you should have the following in your packages.yml file:

packages:
  - package: snowplow/snowplow_web
    version: 0.14.0-rc1

Reason for the change

Years ago, when our snowplow_incremental materialization was built, the standard dbt incremental method was not well optimised and did not support injecting custom code in any way. Our materialization ensured that the destination table scan was limited to just the data that needed updating, saving our users money. However, maintaining a materialization for 4 warehouses was not easy, and the complexity meant that we did not add newer features such as the on_schema_change option. We also know that supporting a new warehouse (e.g. Azure) would be a large amount of work.

With the release of dbt-core 1.4, incremental_predicates were added to the incremental materialization, which allows us to more easily inject the date range filters we need to optimize the upserts. Unfortunately the feature isn’t perfect and, due to a complicated story between compile-time and run-time configs, the above dispatch code needs to be added. We believe this is a fair trade-off to gain access to the newer features and to simplify our package.
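
For readers who have not used incremental_predicates before, here is a generic sketch of how the option can be set on an ordinary dbt incremental model from dbt-core 1.4 onwards. It is not the configuration our packages use internally: the model name, column names and 7-day window are hypothetical, and the date function syntax varies by warehouse. DBT_INTERNAL_DEST is the alias dbt gives the destination table in its merge statement.

```yaml
# models/schema.yml (illustrative only; all names below are hypothetical)
version: 2

models:
  - name: my_incremental_model
    config:
      materialized: incremental
      unique_key: session_id
      incremental_strategy: merge
      # Each predicate is added to the merge condition, so the warehouse only
      # needs to scan the slice of the destination table that could be updated.
      incremental_predicates:
        - "DBT_INTERNAL_DEST.start_tstamp > dateadd(day, -7, current_date)"
```

Our packages aim to work out the date range for you, so you should not need to set predicates like this by hand on the package models.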

Roadmap

We expect to have a second release candidate in mid/late March. This will mostly contain internal changes to our integration tests, as well as some changes to macros such as snowplow_is_incremental that are no longer required.

A full release is expected in late March/early April, assuming things go smoothly.

We expect all our packages will be migrated to the new materialization approach by the start of May, and the old items officially removed at a later date.

Snowplow utils 0.14.0-rc1

Summary

This is a pre-release version of the package. We believe it to be in working condition, but you may encounter bugs and some features may change before the final release.

This version of the package begins the migration away from our snowplow_incremental materialization and instead provides an overwrite to the standard incremental materialization to provide the same performance improvements but in a simpler way. We expect users to see little to no performance change from the previous version; please let us know if you see performance degradation for large volumes of data.

Users will need to add the following to their dbt_project.yml to benefit from the enhancements:

# dbt_project.yml
...
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']

For custom models and more details, please see our temporary docs page: Snowplow Materialization (Pre-Release) | Snowplow Documentation

Features

Deprecate old materialization
Add get_merge_sql for materialization
Fix a broken GitHub Action for our GitHub Pages

Installing

To install this version, use the following in your packages.yml file:

packages:
  - package: snowplow/snowplow_utils
    version: 0.14.0-rc1

Snowplow web 0.14.0-rc1

Summary

This is a pre-release version of the package. We believe it to be in working condition, but you may encounter bugs and some features may change before the final release.

This version of the package begins the migration away from our snowplow_incremental materialization and uses an overwrite to the standard incremental materialization to provide the same performance improvements but in a simpler way. We expect users to see little to no performance change from the previous version; please let us know if you see performance degradation for large volumes of data.

Users will need to add the following to their dbt_project.yml to benefit from the enhancements:

# dbt_project.yml
...
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']

For custom models and more details, please see our temporary docs page: Snowplow Materialization (Pre-Release) | Snowplow Documentation

Features

Use new materialization

Installing

To install this version, use the following in your packages.yml file:

packages:
  - package: snowplow/snowplow_web
    version: 0.14.0-rc1