We have just released version 0.16.0-rc1 of our Web dbt package and version 0.15.0-rc1 of our Utils dbt package. Both contain plenty of new features focused on allowing more flexibility in the processing logic of our packages and making it even easier to build on top of our incremental sessionization logic! There are two huge and highly requested new features in the web package: passthrough columns to derived tables, and custom session and user identifiers. You can read more about both below, and both are powered by new base macro functionality made available in utils.
As both of these are pre-releases, we would love for you to try them out and let us know of any bugs you find. You can provide feedback by raising a GitHub issue or in this post on Discourse!
Finally, before we dig into the details: starting with these versions, both packages are being released under the Snowplow Community License instead of Apache 2. For what this means for you, check out our FAQ and our announcement blog.
All docs for this RC are available at a preview site which can be found here, and will continue to be improved on over the course of the RC. Once we make a full release these will be available on our usual docs site.
The web package now supports passing fields through from the base_events_this_run table to the downstream derived tables, including any custom entity or SDE fields. For information about how to do this, see our Passthrough Fields page and the web configuration variables.
This feature has been highly requested and means you can now include custom dimension fields that you track in the derived tables without having to build a custom model, greatly reducing complexity and decreasing the time it takes to start getting insight from your data.
Arguably the most important reason people use our packages is the complicated incremental sessionization logic that we provide, which ensures that all events within a session are processed in a single dbt run so that all aggregations are up to date and correct. Since the first release of the web package, this “session” has been based on the value of domain_sessionid and has not been changeable. With version 0.16.0 we are removing this restriction.
If you wish, you can now use any field in your events table (from the base atomic table or from entities) as your “session” id, or even provide custom SQL to generate it, be it the concatenation of two fields or the truncated timestamp of the event.
While this is quite an advanced feature and use case, we believe that for those who need it, it will open a whole new way to work with our web package. This is all powered by our new base macros which you can read more about below!
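To make the idea of a custom “session” id more concrete, here is a small Python sketch of the three options mentioned above: a single field, the concatenation of two fields, and a truncated event timestamp. It is purely an illustration of the concept and not how the package implements it (the package works in SQL inside your warehouse). The function names and the sample event are hypothetical; the field names are standard Snowplow event columns.

```python
from typing import Callable, Dict

Event = Dict[str, str]


def from_field(name: str) -> Callable[[Event], str]:
    """Build a resolver that uses a single field as the session id."""
    def resolve(event: Event) -> str:
        return event[name]
    return resolve


def concat_fields(event: Event) -> str:
    """Custom logic: concatenate two fields into one session id."""
    return f"{event['app_id']}|{event['domain_sessionid']}"


def truncated_timestamp(event: Event) -> str:
    """Custom logic: use the event timestamp truncated to the day."""
    # Keeps the 'YYYY-MM-DD' prefix of an ISO 8601 timestamp string.
    return event["collector_tstamp"][:10]


# Hypothetical sample event, for illustration only.
event = {
    "app_id": "shop",
    "domain_sessionid": "s-1",
    "collector_tstamp": "2023-09-01T10:15:00Z",
}

default_id = from_field("domain_sessionid")(event)
combined_id = concat_fields(event)
daily_id = truncated_timestamp(event)
```

Whichever definition you choose, every event that resolves to the same value is treated as part of the same “session” for processing purposes.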
User stitching in the web package can now use the custom user identifier described above, and it also allows you to select the field to use as your stitched user id in case you do not want to use the default user_id. For example, you may wish to use a field from a custom entity. For information about how to do this, see our User Stitching page and the web configuration variables.
Within the majority of our packages we use the same incremental sessionization logic, with perhaps slight changes in each. This meant that changes to this logic, while rare, were difficult to roll out, that we had a lot of duplication, and that the approach was quite strict: you weren’t able to alter things such as what the “session” identifier was, or which timestamp the events table was filtered on based on the manifest table.
The culmination of months of work is a new set of macros that cleanly, yet flexibly, recreate the core session-based incremental logic of our packages. The macros enable a far greater choice when configuring the package, without having to make changes individually to each model.
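As a rough mental model of what “session-based incremental logic” means, the Python sketch below selects every event belonging to any session that has received new events since the last run, so that the whole session is reprocessed together. This is a heavily simplified illustration under our own assumptions, not the macros' actual implementation: it ignores the manifest table, timestamp filtering, and any limits on how far back a session can be reprocessed. The function name and sample data are hypothetical.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]


def events_this_run(
    all_events: List[Event],
    new_events: List[Event],
    session_key: Callable[[Event], object],
) -> List[Event]:
    """Select every event belonging to a session that received new events."""
    # Sessions touched by events that arrived since the last run.
    touched = {session_key(e) for e in new_events}
    # Pull in the full history of those sessions, old events included.
    return [e for e in all_events if session_key(e) in touched]


# Hypothetical sample data, for illustration only.
all_events = [
    {"event_id": 1, "domain_sessionid": "a"},  # processed in an earlier run
    {"event_id": 2, "domain_sessionid": "b"},  # processed in an earlier run
    {"event_id": 3, "domain_sessionid": "a"},  # arrived since the last run
]
new_events = [all_events[2]]

selected = events_this_run(
    all_events, new_events, lambda e: e["domain_sessionid"]
)
selected_ids = [e["event_id"] for e in selected]
```

Because the session key is passed in as a function, swapping domain_sessionid for a different identifier does not require changing the selection logic itself, which is the kind of flexibility the new macros aim to provide.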
While we are using these macros in the web package, which allows much of the new flexibility, they can also be used in your own projects to build a new “snowplow-type” package from scratch with the exact setup you need.
For the full release, we also plan to produce a repo of demo projects that you can use to understand how to use these macros and configure them to work as you need them to.
Finally, thanks to the work done in those base macros, it is now possible (and easy!) to include custom entity or SDE fields in the snowplow_web_base_events_this_run table, which will make it MUCH easier to build custom models that use these fields, or to use them in the passthrough feature described above.
This is another feature that has been requested multiple times and, without this work, it would have been complex for both us and the user to implement. The new base macros allowed us to do this in a flexible yet controlled way.
With the release of RC1 today, unless we receive a lot of bugs or feedback we cannot action in time, we expect to be releasing a full version of these packages before the end of September. Please keep an eye out for that release post where we will also talk about what is next for our packages!
This was truly a huge team effort, with all our Analytics Engineers being involved from initial design discussions, to development, testing, reviewing, and releasing. Thanks!