We are happy to announce the release of snowplow-web 0.15.0. This is a large release with plenty of new features and optimizations, so make sure to read this post in full. As usual, the summary and changelog are at the bottom.
New Fields
This release adds a load of new fields across our page views, sessions, and users tables to make it easier to derive insights from your modelled data. These fields include:
- Human readable language and location information (sessions and users)
- Marketing source platform and platform (sessions)
- Screen resolution and device category (page views and sessions)
- Last_X location and language information (sessions and users)
- Content group for page views (customisable by a macro)
- Total count and JSON-type counts per event type (sessions)
- Default channel group (sessions and users)
All of these fields (except the counts per event type, which are enabled by a variable) will appear on the next run of the package after upgrading.
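To make the new count fields concrete, here is a minimal Python sketch of what a per-session total event count and a JSON-type count per event type contain. It is an illustration only, not the package's SQL; the function name and the input shape (a list of dicts with an event_name key) are hypothetical.

```python
import json
from collections import Counter


def session_event_counts(events):
    """Return (total_events, json_counts) for one session's events.

    Hypothetical illustration: each event is a dict with an 'event_name' key.
    """
    counts = Counter(e["event_name"] for e in events)
    # Sort the keys so the JSON string is identical on every run.
    return len(events), json.dumps(dict(sorted(counts.items())))


session = [
    {"event_name": "page_view"},
    {"event_name": "page_ping"},
    {"event_name": "page_ping"},
    {"event_name": "link_click"},
]
total, by_type = session_event_counts(session)
print(total)    # 4
print(by_type)  # {"link_click": 1, "page_ping": 2, "page_view": 1}
```

Sorting the keys is a design choice for the sketch: it keeps the serialized value stable, which makes it easier to compare across runs.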
Core Web Vitals
A new optional module, Core Web Vitals, now exists in the package. It allows you to model raw events sent from the Snowplow Web Vitals plugin (@snowplow/browser-plugin-web-vitals) to measure and evaluate the speed, responsiveness, and visual stability of websites.
This module produces two output tables: one to store the cleaned events, and another for reporting and alerting based on percentile thresholds. You can read more about it in our docs, and a new Accelerator with more details and visualisations is coming very soon!
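As a rough illustration of percentile-based reporting, the Python sketch below takes the 75th percentile of a metric and classifies it against Google's published Core Web Vitals thresholds. The 75th percentile is the one Google recommends; we assume it here, and the module's own percentile and thresholds may be configured differently. The sketch uses a nearest-rank percentile, which can differ slightly from the percentile function in your warehouse. The function names are hypothetical, and this is not the module's implementation.

```python
import math

# Google's published thresholds: (upper bound for "good", upper bound for "needs improvement").
# Assumed here for illustration; the module may use its own configurable values.
THRESHOLDS = {
    "lcp": (2500.0, 4000.0),  # Largest Contentful Paint, milliseconds
    "inp": (200.0, 500.0),    # Interaction to Next Paint, milliseconds
    "cls": (0.1, 0.25),       # Cumulative Layout Shift, unitless score
}


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def assess(metric, values, pct=75):
    """Classify the pct-th percentile of a metric as good / needs improvement / poor."""
    good, poor = THRESHOLDS[metric]
    p = percentile(values, pct)
    if p <= good:
        return p, "good"
    if p <= poor:
        return p, "needs improvement"
    return p, "poor"


lcp_ms = [1200, 1800, 2100, 2600, 3100, 3900, 4200, 5000]
print(assess("lcp", lcp_ms))  # (3900, 'needs improvement')
```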
Conversion Modeling
We have added support for modelling your conversion-type events into the sessions table as part of this release. What defines a conversion? You decide! Any single-event condition can be provided, and you can model as many conversions as you want. For each conversion we aggregate the count, the first timestamp, a list of event_ids, a value (based on a field you specify), and a boolean flag indicating whether the conversion occurred. We also generate columns that total the volume and value across all conversions. You can read more about it, including how to set it up, in our docs.
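To illustrate which aggregates a conversion produces, here is a Python sketch over a single session's events. The Conversion type, the output column names, and the event dict shape are all hypothetical. In the package itself, conversions are configured through variables and computed in SQL, so refer to the docs for the actual setup.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Conversion:
    """Hypothetical conversion definition: a name, a single-event condition, an optional value field."""
    name: str
    condition: Callable[[dict], bool]
    value_field: Optional[str] = None


def aggregate_conversions(events, conversions):
    """Aggregate one session's events into per-conversion columns plus cross-conversion totals."""
    row = {}
    total_count, total_value = 0, 0.0
    for conv in conversions:
        # Order matches by timestamp (ISO-8601 strings sort chronologically), then event_id.
        matched = sorted(
            (e for e in events if conv.condition(e)),
            key=lambda e: (e["tstamp"], e["event_id"]),
        )
        value = sum(e.get(conv.value_field) or 0 for e in matched) if conv.value_field else 0
        row[f"{conv.name}_count"] = len(matched)
        row[f"{conv.name}_first_tstamp"] = matched[0]["tstamp"] if matched else None
        row[f"{conv.name}_event_ids"] = [e["event_id"] for e in matched]
        row[f"{conv.name}_value"] = value
        row[f"{conv.name}_converted"] = bool(matched)
        total_count += len(matched)
        total_value += value
    row["all_conversions_count"] = total_count
    row["all_conversions_value"] = total_value
    return row


events = [
    {"event_id": "e1", "tstamp": "2023-05-01T10:00:00", "event_name": "page_view"},
    {"event_id": "e2", "tstamp": "2023-05-01T10:02:00", "event_name": "purchase", "revenue": 30.0},
    {"event_id": "e3", "tstamp": "2023-05-01T10:05:00", "event_name": "purchase", "revenue": 12.5},
]
purchase = Conversion("purchase", lambda e: e["event_name"] == "purchase", "revenue")
signup = Conversion("signup", lambda e: e["event_name"] == "sign_up")
row = aggregate_conversions(events, [purchase, signup])
print(row["purchase_count"], row["purchase_value"], row["signup_converted"])  # 2 42.5 False
```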
Other changes
We’ve optimized one of our user tables, ensured deterministic ordering throughout the package, and moved the consent models behind an enabling variable.
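Deterministic ordering matters because two events in the same session can share a timestamp. If you order by timestamp alone, their relative order is left up to the warehouse and can change between runs. A common remedy is to add a unique tie-breaker to the sort key, as the Python sketch below shows. Using event_id as that tie-breaker is our assumption here, not necessarily the exact column the package uses.

```python
from itertools import permutations


def order_session_events(events):
    """Sort by timestamp, breaking ties on event_id so the order is reproducible."""
    return sorted(events, key=lambda e: (e["derived_tstamp"], e["event_id"]))


# Two events share a timestamp; ISO-8601 strings sort chronologically.
events = [
    {"event_id": "b", "derived_tstamp": "2023-05-01T10:00:00"},
    {"event_id": "a", "derived_tstamp": "2023-05-01T10:00:00"},
    {"event_id": "c", "derived_tstamp": "2023-05-01T09:59:59"},
]

# Every possible input order produces the same output order.
orders = {
    tuple(e["event_id"] for e in order_session_events(list(p)))
    for p in permutations(events)
}
print(orders)  # {('c', 'a', 'b')}
```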
Finally, we’ve started adding more interactivity to our package docs. One example is the conversions page mentioned above, and you can now also generate your project variables right on the configuration page. Check it out!
Summary
The main change in this version is the addition of many new fields to the derived tables (sessions, pageviews, users), including the ability to define conversion events that are aggregated per session. You can read more about it in our docs! We have also added an optional count of events per event type for each session, which you can enable by setting the snowplow__list_event_counts variable to true.
There is a new optional module called Core Web Vitals, which will allow you to model raw events sent from the Snowplow Web Vitals plugin (@snowplow/browser-plugin-web-vitals) of the JavaScript tracker to measure and evaluate the speed, responsiveness, and visual stability of websites.
This version also moves the iab, ua, and yauaa contexts into the base events this run table for Postgres/Redshift, completing the work to decouple our sessions models from the page view models for all warehouses.
Breaking Changes
Please note that we have added a variable, snowplow__enable_consent, to enable the models in the consent module. Please make sure you set this variable to true in your dbt_project.yml file to be able to run them:
```yaml
# dbt_project.yml
vars:
  snowplow_web:
    snowplow__enable_consent: true
```
While not strictly breaking (your models should all still run), there are many new columns in the sessions, pageviews, and users tables that will be added to your derived tables automatically. This may have unintended consequences for existing queries or BI tools that rely on column position, although dbt usually appends new columns to the end. New columns are only populated from new runs onwards; if you wish to populate them for older records, you will need to do a full refresh.
Features
- Move contexts into base table for Redshift (Close #185)
- Add support for user-defined conversion aggregation to sessions
- Add new fields to the sessions table including enhanced and last geo data, default channel group, marketing source platform, event count, and optional count by event_name
- Add new fields to the users table relating to the latest event geo and browser information
- Add core web vitals module
Under the hood
- Optimize a filter in the snowplow_web_user_sessions_this_run model (Close #186)
- Ensure deterministic modeling when two genuine events occur at the same time in a session (Close #178)
Docs
- Remove docs that are no longer necessary
Upgrading
Bump the snowplow-web version in your packages.yml file.
Some of the tables now rely on seeds introduced in this version (dim_ga4_source_categories, dim_geo_country_mapping, dim_rfc_5646_language_mapping). Please run dbt seed --select snowplow_web --full-refresh on your first run.