We are very excited to announce the release of the BigQuery Web model v1. This is the second in a series of releases intended to address a hugely important need for Snowplow: extensible, scalable, incremental data modeling.
Improving the Modeling Experience
As described in the Redshift v1 release, these models aim to solve the key problems of modeling Snowplow data by providing Snowplow-maintained incremental logic, and by allowing users to customise their own logic in a more maintainable and more straightforward way.
What the new model brings
The v1 release of the web model is designed to implement a SQL-as-software structure:
- We establish core modules which can be thought of as source code
- Each module has an explicit input and output (each module also has side-effects - this is unavoidable)
- Each module has an ‘entry point’ for custom logic, which can be treated as a plugin
- Each module is testable in isolation
- Tests can be extended to custom modules
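As a loose analogy for the properties listed above, here is a minimal Python sketch of a module with an explicit input and output, a plugin-style entry point, and isolated tests. The real model is written in SQL; the `Module` class, `page_views_core`, `flag_blog` and the event fields are all hypothetical names chosen for illustration, and side-effects are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Rows = List[Dict]  # stand-in for a table: a list of row dictionaries


@dataclass
class Module:
    """Hypothetical module: explicit input rows in, explicit output rows out."""
    name: str
    core: Callable[[Rows], Rows]                      # maintained logic ("source code")
    custom: Optional[Callable[[Rows], Rows]] = None   # entry point for user logic ("plugin")

    def run(self, rows: Rows) -> Rows:
        out = self.core(rows)
        # Custom logic only ever sees the core output, never the raw input.
        return self.custom(out) if self.custom else out


def page_views_core(rows: Rows) -> Rows:
    """Core step (illustrative): keep page view events only."""
    return [r for r in rows if r["event_name"] == "page_view"]


def flag_blog(rows: Rows) -> Rows:
    """Custom step (illustrative): add a column derived from the core output."""
    return [{**r, "is_blog": r["page_url"].startswith("/blog")} for r in rows]


events = [
    {"event_name": "page_view", "page_url": "/blog/a"},
    {"event_name": "page_ping", "page_url": "/blog/a"},
    {"event_name": "page_view", "page_url": "/pricing"},
]

# The core and custom steps can each be tested on their own...
core_only = Module("page_views", page_views_core).run(events)
# ...and the same kind of test extends to the module with custom logic plugged in.
result = Module("page_views", page_views_core, custom=flag_blog).run(events)
```

Because the custom function receives only the core module's output, it can be swapped or tested without touching the maintained logic.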
This structure allows us to separate out the ‘heavy lifting’ of an incremental Snowplow model by abstracting the incremental logic into its own ‘base’ module. The base module produces a table which contains only the events relevant to this run of the incremental logic: both new events and previously processed events that require recomputing because related events have since arrived (think of a late-arriving page ping event).
This removes the complexity from customisation - all subsequent logic can operate on this input as if it were a simple drop-and-recompute model, while the model’s structure ensures an efficient incremental update. This means that the end user need only be concerned with the aggregation logic they care about, rather than expending effort on how to make that logic work within a complex structure.
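The idea can be sketched in a few lines of Python. This is a conceptual illustration only, not the model's actual SQL: the function names, the `session_id` and `event_name` fields, and the in-memory lists standing in for tables are all assumptions made for the example.

```python
from collections import defaultdict


def events_this_run(processed_events, new_events):
    """Base-module step (illustrative): collect every event this run needs."""
    # Sessions touched by the new batch, including late-arriving events.
    touched = {e["session_id"] for e in new_events}
    # Earlier events from those sessions must be recomputed alongside the new ones.
    history = [e for e in processed_events if e["session_id"] in touched]
    return history + list(new_events)


def aggregate_sessions(events):
    """Custom-logic step: written as a plain drop-and-recompute aggregation."""
    sessions = defaultdict(lambda: {"events": 0, "page_pings": 0})
    for e in events:
        row = sessions[e["session_id"]]
        row["events"] += 1
        if e["event_name"] == "page_ping":
            row["page_pings"] += 1
    return dict(sessions)


def run_increment(processed_events, sessions_table, new_events):
    """One incremental run: scope the input, aggregate it, upsert the result."""
    scoped = events_this_run(processed_events, new_events)
    sessions_table.update(aggregate_sessions(scoped))  # only touched sessions change
    processed_events.extend(new_events)
    return scoped


# Two sessions have already been processed.
processed = [
    {"session_id": "s1", "event_name": "page_view"},
    {"session_id": "s2", "event_name": "page_view"},
]
sessions_table = aggregate_sessions(processed)

# A late page ping arrives for s1, plus a brand new session s3.
batch = [
    {"session_id": "s1", "event_name": "page_ping"},
    {"session_id": "s3", "event_name": "page_view"},
]
scoped = run_increment(processed, sessions_table, batch)
```

Here `aggregate_sessions` knows nothing about incremental processing: it simply recomputes whatever it is given. Session `s2` is never read or rewritten during the run, which is where the efficiency comes from.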
Additional features introduced
- Users can take advantage of the commit_table stored procedure to create and update custom tables, without needing to manage table definitions or migrations
- Tests improved
- Helper scripts improved
- Configs introduced (Snowplow Insights customers can use configs directly on Orchestration - Open Source users should instrument their own dependency management)
- Script functionality introduced that uses configs to produce ‘pure’ SQL files, for those who wish to run the models with a tool other than SQL-Runner.
More information
Check out the v1 README of the repo for more detail on the structure.
For a quickstart guide, see the BigQuery README.