Snowplow Web and Utils dbt packages released: passthrough fields, custom session identifiers, and much more are now possible!


We have just released production-ready version 0.16.0 of our Web dbt package and version 0.15.0 of our Utils dbt package. Both contain plenty of new features focused on allowing more flexibility in the processing logic of our packages and making it even easier to build on top of our incremental sessionization logic! You may have seen our previous post about this release when these were release candidates, but they’re now fully released and on dbt’s package hub! The web package gains two huge and highly requested new features: passthrough columns to derived tables, and custom session and user identifiers. You can read more about both below, and both are powered by new base macro functionality made available in utils.

One last thing before we dig into the details: starting with these versions, both packages are being released under the Snowplow Community License instead of the Apache 2.0 license. For what this means for you, check out our FAQ and our announcement blog.

Passthrough fields

The web package now supports passing fields through from the base_events_this_run table to the downstream derived tables, including any custom entity or SDE (self-describing event) fields. For information about how to do this, see our Passthrough Fields page and the web configuration variables.

This feature has been highly requested and means you can now include custom dimension fields that you track in the derived tables without having to build a custom model, greatly reducing complexity and decreasing the time it takes to start getting insights from your data.

Custom sessionization and users

Arguably the best and most important reason people use our packages is the complicated incremental sessionization logic that we provide to ensure that all events within a session get processed in a single dbt run, meaning all aggregations are up to date and correct. Since the first release of the web package, this “session” has been based on the value of domain_sessionid and has not been changeable. With version 0.16.0 we are removing this restriction.
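To make the idea concrete, here is a simplified Python illustration of "process every event of any session that received new data". It is only a sketch of the concept, not the package's actual SQL: the function name `events_to_process` is hypothetical, and the real logic also relies on manifest tables and lookback limits that are left out here.

```python
def events_to_process(all_events, new_events, session_key="domain_sessionid"):
    """Return every event belonging to a session touched by new_events.

    Hypothetical helper for illustration only: reprocessing whole sessions
    (rather than only the new rows) is what keeps per-session aggregations
    complete and correct after each run.
    """
    # Sessions that received at least one new event in this run.
    touched_sessions = {event[session_key] for event in new_events}
    # Pull in older events from those sessions as well, preserving order.
    return [event for event in all_events if event[session_key] in touched_sessions]


all_events = [
    {"event_id": "e1", "domain_sessionid": "s1"},  # processed in an earlier run
    {"event_id": "e2", "domain_sessionid": "s2"},  # processed in an earlier run
    {"event_id": "e3", "domain_sessionid": "s1"},  # arrived since the last run
]
new_events = [all_events[2]]

selected = events_to_process(all_events, new_events)
selected_ids = [event["event_id"] for event in selected]
print(selected_ids)  # ['e1', 'e3']
```

Session s2 is left alone because nothing new arrived for it, while both events of s1 are picked up so its aggregates are rebuilt from the full session.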

If you wish, you can now use any field in your events table (from the base atomic table or from entities) as your “session” identifier. You can even provide custom SQL to generate it, be it the concatenation of two fields or the truncated timestamp of the event.
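As a rough illustration of those two kinds of identifier, the Python sketch below builds one from the concatenation of two fields and one from an hour-truncated timestamp. In the package itself this would be expressed through configuration or SQL; the helper names, the choice of fields, and the hourly granularity are all assumptions made for the example.

```python
from datetime import datetime


def concat_identifier(event, fields, sep="-"):
    """Hypothetical helper: build an identifier by joining several field values."""
    return sep.join(str(event[field]) for field in fields)


def hourly_identifier(event, tstamp_field="collector_tstamp"):
    """Hypothetical helper: use the event timestamp truncated to the hour."""
    tstamp = datetime.fromisoformat(event[tstamp_field])
    return tstamp.replace(minute=0, second=0, microsecond=0).isoformat()


event = {
    "app_id": "shop",
    "domain_userid": "u-123",
    "collector_tstamp": "2023-09-01T14:37:12",
}

by_fields = concat_identifier(event, ["app_id", "domain_userid"])
by_hour = hourly_identifier(event)
print(by_fields)  # shop-u-123
print(by_hour)    # 2023-09-01T14:00:00
```

Whichever definition is chosen, every event that yields the same identifier value is treated as part of the same “session” for processing purposes.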

While this is quite an advanced feature and use case, we believe that for those that need this it will open a whole new way to work with our web package. This is all powered by our new base macros which you can read more about below!

For information about how to do this, see our Custom Sessionization page and the web configuration variables page.

Enhancements to user stitching

User stitching in the web package now allows you, in addition to using the custom user identifier above, to select the field to use as your stitched user id in case you do not want to use the default user_id. For example, you may wish to use a field from a custom entity. For information about how to do this, see our User Stitching page and the web configuration variables.

New Base Macro

Within the majority of our packages we use the same incremental sessionization logic, with perhaps slight changes in each. This caused three problems: any changes to this logic, while rare, were difficult to roll out; we had a lot of duplication; and the approach was quite strict, which meant you weren’t able to alter things such as what the “session” identifier was or which timestamp was used to filter the events table based on the manifest table.

The culmination of months of work is a new set of macros that cleanly, yet flexibly, recreate the core session-based incremental logic of our packages. The macros enable a far greater choice when configuring the package, without having to make changes individually to each model.

While we are using these macros in the web package, which allows much of the new flexibility, they can also be used in your own projects to build a new “snowplow-type” package from scratch with the exact setup you need.

For information about how to do this, see our Advanced Usage of the Utils package page and the utils configuration variables.

We also plan to publish a repo of demo projects that you can use to understand how to use these macros and configure them to work as you need them to.

Custom Entity and SDE fields in Redshift/Postgres

Finally, thanks to the work done in those base macros, it is now possible (and easy!) to include custom entity or SDE fields in the snowplow_web_base_events_this_run table, which will make it MUCH easier to build custom models that use these fields, or to use them in the passthrough feature described above.

This is another feature that has been requested multiple times and that, without this work, would have been complex for both us and the user to implement. The new base macro allowed us to do this in a flexible yet controlled way.

For information about how to do this, see our Building Custom Models page and the web configuration variables.

The rest

There are a few other changes to check out in the utils CHANGELOG and the one for web, and make sure to check out the migration guide when you upgrade.