Summary
We have just released version 0.16.0-rc1 of our Web dbt package and version 0.15.0-rc1 of our Utils dbt package. Both contain plenty of new features focused on allowing more flexibility in the processing logic of our packages and making it even easier to build on top of our incremental sessionization logic! There are two huge and highly requested new features in the web package: passthrough columns to derived tables, and custom session and user identifiers, which you can read more about below. Both are powered by new base macro functionality made available in utils.
As both of these are pre-releases, we would love for you to try them out and let us know of any bugs you find. You can provide feedback by raising a GitHub issue or in this post on Discourse!
Finally, before we dig into the details: starting with this version, both packages are being released under the Snowplow Community License instead of Apache 2.0. For what this means for you, check out our FAQ and our announcement blog.
All docs for this RC are available at a preview site which can be found here, and will continue to be improved on over the course of the RC. Once we make a full release these will be available on our usual docs site.
Passthrough fields
The web package now supports passing fields through from the base_events_this_run table to the downstream derived tables, including any custom entity or SDE fields. For information about how to do this, see our Passthrough Fields page and the web configuration variables.
This feature has been highly requested and means you can now include custom dimension fields that you track in the derived tables without having to build a custom model, greatly reducing complexity and decreasing the time it takes to start getting insight from your data.
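To make the idea concrete, here is a minimal Python sketch of what passing a field through means conceptually. It is not how the package implements the feature (that happens in dbt and SQL, driven by the configuration variables). The function `build_derived_row`, its parameters, and the sample column `content_category` are hypothetical names chosen only for illustration.

```python
from typing import Any


def build_derived_row(
    event: dict[str, Any],
    standard_fields: list[str],
    passthrough_fields: list[str],
) -> dict[str, Any]:
    """Build one derived-table row from a base event (illustrative only).

    standard_fields: columns the derived model always outputs.
    passthrough_fields: extra columns copied through unchanged.
    """
    row = {name: event[name] for name in standard_fields}
    # Copy each requested passthrough column as-is; a field that is
    # missing from the event becomes None rather than raising an error.
    for name in passthrough_fields:
        row[name] = event.get(name)
    return row


# Hypothetical base event carrying one custom dimension.
sample_event = {
    "page_view_id": "pv-1",
    "domain_sessionid": "s-1",
    "content_category": "docs",
}

derived_row = build_derived_row(
    sample_event,
    standard_fields=["page_view_id", "domain_sessionid"],
    passthrough_fields=["content_category"],
)
```

The point of the sketch is that the custom dimension arrives in the derived row without a separate custom model having to join it back in.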
Custom sessionization and users
Arguably the best and most important reason people use our packages is the complicated incremental sessionization logic that we provide to ensure that all events within a session get processed in a single dbt run, meaning all aggregations are up to date and correct. Since the first release of the web package, this “session” has been based on the value of domain_sessionid and has not been changeable. With version 0.16.0 we are removing this restriction.
If you wish, you can now use any field in your events table (from the base atomic table or from entities) as your “session” id, and you can even provide custom SQL to generate it, be it the concatenation of two fields or even the truncated timestamp of the event.
While this is quite an advanced feature and use case, we believe that for those that need this it will open a whole new way to work with our web package. This is all powered by our new base macros which you can read more about below!
For information about how to do this, see our Custom Sessionization page and the web configuration variables.
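As a rough illustration of the identifier options described above, the following Python sketch groups the same events under three different “session” definitions: the default domain_sessionid, a concatenation of two fields, and a timestamp truncated to the hour. This is a conceptual sketch only; in the package the identifier is configured through variables and SQL, not Python. The helper names, and the choice of `app_id`, `domain_userid`, and `collector_tstamp` as inputs, are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Callable

Event = dict[str, Any]


def default_session_id(event: Event) -> str:
    # The identifier the web package has always used.
    return event["domain_sessionid"]


def concatenated_session_id(event: Event) -> str:
    # Hypothetical custom identifier: two fields joined together.
    return f'{event["app_id"]}-{event["domain_userid"]}'


def hourly_session_id(event: Event) -> str:
    # Hypothetical custom identifier: event timestamp truncated to the hour.
    truncated = event["collector_tstamp"].replace(minute=0, second=0, microsecond=0)
    return truncated.isoformat()


def group_by_session(
    events: list[Event], session_id_fn: Callable[[Event], str]
) -> dict[str, list[Event]]:
    """Group events under whichever "session" definition is supplied."""
    sessions: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        sessions[session_id_fn(event)].append(event)
    return dict(sessions)


events = [
    {"domain_sessionid": "s1", "app_id": "web", "domain_userid": "u1",
     "collector_tstamp": datetime(2023, 9, 1, 10, 15)},
    {"domain_sessionid": "s2", "app_id": "web", "domain_userid": "u1",
     "collector_tstamp": datetime(2023, 9, 1, 10, 45)},
    {"domain_sessionid": "s3", "app_id": "web", "domain_userid": "u2",
     "collector_tstamp": datetime(2023, 9, 1, 11, 5)},
]

by_default = group_by_session(events, default_session_id)
by_concat = group_by_session(events, concatenated_session_id)
by_hour = group_by_session(events, hourly_session_id)
```

Whichever definition is chosen, the incremental logic still has to process every event that shares an identifier in the same run, which is why the identifier is a single pluggable function here.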
Enhancements to user stitching
As well as using the custom user identifier above, user stitching in the web package now allows you to select the field to use as your stitched user id, in case you do not want to use the default user_id. For example, you may wish to use a field from a custom entity. For information about how to do this, see our User Stitching page and the web configuration variables.
New Base Macro
Within the majority of our packages we use the same incremental sessionization logic, with perhaps slight changes in each. This meant that changes to this logic, while rare, were difficult to roll out, that we had a lot of duplication, and that the approach was quite strict: you weren’t able to alter things such as what the “session” identifier was, or which timestamp the events table was filtered on based on the manifest table.
The culmination of months of work is a new set of macros that cleanly, yet flexibly, recreate the core session-based incremental logic of our packages. The macros enable a far greater choice when configuring the package, without having to make changes individually to each model.
While we are using these macros in the web package, which allows much of the new flexibility, they can also be used in your own projects to build a new “snowplow-type” package from scratch with the exact setup you need.
For information about how to do this, see our Advanced Usage of the Utils package page and the utils configuration variables.
For the full release, we also plan to produce a repo of demo projects that you can use to understand how to use these macros and configure them to work as you need them to.
Custom Entity and SDE fields in Redshift/Postgres
Finally, thanks to the work done in those base macros, it is now possible (and easy!) to include custom entity or SDE fields in the snowplow_web_base_events_this_run table, which will make it MUCH easier to build custom models that use these fields, or to use them in the passthrough feature described above.
This is another feature that has been requested multiple times, and without this work it would have been complex for both us and the user to implement. The new base macros allowed us to do this in a flexible, yet controlled, way.
For information about how to do this, see our Building Custom Models page and the web configuration variables.
The rest
There are a few other things to check out in the utils CHANGELOG and the one for web, and make sure to check out the migration guide when you upgrade.
Release Timeline
With the release of RC1 today, unless we receive a lot of bugs or feedback we cannot action in time, we expect to be releasing a full version of these packages before the end of September. Please keep an eye out for that release post where we will also talk about what is next for our packages!
Thanks
This was truly a huge team effort, with all our Analytics Engineers being involved from initial design discussions, to development, testing, reviewing, and releasing. Thanks!