dbt and the new data model in Snowplow

I have been using dbt for a long time in the same stack as Snowplow. Luckily, the dbt team had adopted the Snowplow data model in a dbt package: GitHub - fishtown-analytics/snowplow: Data models for snowplow analytics.

However, it is no longer being developed. I ran into the new Snowplow data model in the Snowflake loader and adapted the dbt package a little. I offered to contribute my adaptation to the package repo (Support new snowplow data model · Issue #102 · fishtown-analytics/snowplow · GitHub), but the dbt team declined.

They told me that the Snowplow dev team is going to release its own dbt package supporting the new data model. Does anyone have information about this Snowplow package and the date it will be released? The current version of the snowplow package is outdated, and it blocks upgrading dbt core and other packages. I have found GitHub - snowplow/data-models: Snowplow officially-supported SQL data models for working with Snowplow web and mobile behavioral data. Supports Redshift, Snowflake and BigQuery. It consists of bash-driven SQL scripts with manual amendments, which is quite awkward to use.

The Fishtown models are quite out of date, so I'd leave them for the moment.

Re: current data models - there are SQL Runner models available, which you can use if you run SQL Runner, but these won't work out of the box with dbt. I know that dbt models are currently in development (as of a few days ago). I don't know the release date for these, but I suspect it'll be within the next quarter or so.

Just to provide a bit more detail on the new dbt Snowplow web package:

  • The beta version should be released within the next couple of weeks.
  • This beta version will initially only support Redshift.
  • BigQuery and Snowflake adapter support will follow over the next couple of months.
  • The package will support parsing JSON context and unstructured event fields, such as page_view_id from the contexts_com_snowplowanalytics_snowplow_web_page_1 context.
  • The package is fully incremental, making use of a manifest table to understand which events have been processed already.
  • It is designed in a modular fashion, allowing you to easily add your own custom SQL while leveraging the incremental framework provided by the package (see the sketch after this list).
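
For a flavour of what that modular, incremental approach can look like, here is a rough sketch of a hand-rolled dbt incremental model in the same spirit. This is not the package's actual code: it uses only standard dbt constructs (config, source, is_incremental(), {{ this }}), and the Snowflake-style contexts_com_snowplowanalytics_snowplow_web_page_1[0]:id access is an assumption about how the loader stores the web page context.

```sql
-- Hypothetical sketch, NOT the package's real code: a plain dbt
-- incremental model that parses page_view_id and only scans new events.
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    collector_tstamp,
    -- extract page_view_id from the web page context (Snowflake syntax)
    contexts_com_snowplowanalytics_snowplow_web_page_1[0]:id::varchar as page_view_id
from {{ source('atomic', 'events') }}
where event_name = 'page_view'
{% if is_incremental() %}
  -- on incremental runs, only consume events newer than the last load
  and collector_tstamp > (select max(collector_tstamp) from {{ this }})
{% endif %}
```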

Out of interest, is this a manifest of all events that have been processed, or more akin to a time-based watermark?

Good question - the latter. The manifest table contains one row per model/node from the Snowplow Web dbt package. Every time the package is run, we update the manifest with the max collector_tstamp that the given model has consumed. We can then use this information from the manifest to determine the lower and upper timestamp limits of events to process during the next run of the package. This approach helps keep all incremental models within the package in sync, and it reduces the need to query potentially large tables, such as page views, to determine the last event processed.
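
To illustrate the idea in SQL (the table and column names below are made up for the example, not the package's actual manifest schema):

```sql
-- Illustrative only: a minimal manifest in the spirit described above.
create table if not exists derived.incremental_manifest (
    model               varchar,   -- dbt model/node name
    last_success_tstamp timestamp  -- max collector_tstamp consumed so far
);

-- Next run: read the lower limit from the manifest instead of scanning
-- a potentially large table such as page views.
select last_success_tstamp as lower_limit
from derived.incremental_manifest
where model = 'snowplow_web_page_views';

-- After a successful run: advance the model's watermark.
update derived.incremental_manifest
set last_success_tstamp = (
    select max(collector_tstamp) from derived.snowplow_web_page_views
)
where model = 'snowplow_web_page_views';
```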

@mike @Will It looks like the new models won't be available for a while, so I'm going to use sql-runner with SQL scripts in the meantime.

What I see is that the old data model and the new one have different numbers of tables, and similar tables differ in column count and naming. Is there an approach for transferring data created under the old data model into the new one? Migration is quite critical to avoid losing previously collected data.
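
In other words, I expect the migration to boil down to a column-mapping backfill along these lines (all table and column names here are hypothetical; the real mapping has to come from diffing the two schemas):

```sql
-- Purely illustrative backfill from an old-model table into a new-model
-- table; every name below is hypothetical.
insert into new_model.page_views (page_view_id, domain_userid, start_tstamp)
select
    pv_id         as page_view_id,  -- renamed column
    domain_userid,                  -- column carried over unchanged
    min_tstamp    as start_tstamp   -- renamed column
from old_model.page_views;
```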