Dbt model column meanings

pcb · August 3, 2022, 11:01pm

Hi all,

We are working on utilizing the dbt Snowplow package. The data dictionary we are looking at is here: dictionary

We use Kafka for our stream and write into Snowflake with the Kafka connector, which we use everywhere else we move data. All of this works really well.

However, once our data gets into Snowflake, it doesn’t match the web model that dbt expects. We’d love some clarification on a few fields specified in the data dictionary linked above that don’t exist in our system.

snowplow_web_page_1 and yauaa_context_1:

root_id
root_tstamp
ref_root
ref_tree
ref_parent

What are the above fields referring to? Our data is completely flat in one table, so we don’t have separate tables for webpage_id, yauaa, and performance contexts.

Thanks!
Patrick

mike · August 4, 2022, 3:56am

These fields only really exist in the shredded model for enriched data, which doesn’t apply to the Snowflake single-table, context-as-a-column schema. For other databases like Redshift, where there is a core table (events) plus separate context/event tables, these fields (specifically root_id and root_tstamp) act as a join key between the core event and the contexts associated with it.
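To make the join-key idea concrete, here is a minimal Python sketch using made-up rows represented as dicts. It assumes (to my understanding of the shredded layout) that a context row’s root_id equals the parent event’s event_id and its root_tstamp equals that event’s collector_tstamp; the table shapes and values are illustrative only. The last part shows why no join is needed in the Snowflake single-table layout, where the context sits in a column on the event row itself.

```python
"""Sketch: how root_id/root_tstamp link a shredded context row to its core event.

Assumption: rows are modelled as plain dicts with made-up values; real
Redshift tables hold many more columns.
"""
from datetime import datetime

# Core event rows (the events table in the shredded model): one row per event.
events = [
    {"event_id": "e1", "collector_tstamp": datetime(2022, 8, 3, 12, 0, 0)},
    {"event_id": "e2", "collector_tstamp": datetime(2022, 8, 3, 12, 0, 5)},
]

# Context table rows (e.g. a web_page context table): one row per attached context.
web_page_contexts = [
    {"root_id": "e1", "root_tstamp": datetime(2022, 8, 3, 12, 0, 0), "id": "page-a"},
    {"root_id": "e2", "root_tstamp": datetime(2022, 8, 3, 12, 0, 5), "id": "page-b"},
]


def join_contexts(events, contexts):
    """Attach each context to its parent event.

    Assumes root_id matches the event's event_id and root_tstamp matches its
    collector_tstamp, so the pair acts as the join key.
    """
    index = {}
    for ctx in contexts:
        key = (ctx["root_id"], ctx["root_tstamp"])
        index.setdefault(key, []).append(ctx)
    joined = []
    for ev in events:
        key = (ev["event_id"], ev["collector_tstamp"])
        # Events with no matching context get an empty list.
        joined.append({**ev, "web_page": index.get(key, [])})
    return joined


# In the Snowflake single-table layout there is nothing to join: the context is
# a column on the event row (assumed here to hold a list of objects).
flat_event = {
    "event_id": "e1",
    "collector_tstamp": datetime(2022, 8, 3, 12, 0, 0),
    "contexts_com_snowplowanalytics_snowplow_web_page_1": [{"id": "page-a"}],
}

if __name__ == "__main__":
    for row in join_contexts(events, web_page_contexts):
        print(row["event_id"], [c["id"] for c in row["web_page"]])
```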

PaulBoocock · August 4, 2022, 6:23am

As Mike suggests, it sounds like you’re looking at the Redshift/Postgres code in the web model. You should follow the Snowflake-specific path through the model instead.

However, there are likely still some differences between a direct load and data loaded with the Snowplow RDB Loader (for Snowflake), particularly around schema usage and how our loader builds those tables so it can migrate them as schemas evolve.
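The suffixes in the original question (snowplow_web_page_1, yauaa_context_1) look like the tail end of the Snowflake context column names. As a rough sketch, assuming the convention is a prefix, then the vendor with dots replaced by underscores, then the schema name in snake_case, then only the major (MODEL) version, a helper like this maps an Iglu schema URI to the column name the Snowflake path of the model would look for. Keeping only the major version is what lets additive schema changes (1-0-0 to 1-0-1) land in the same column. The function name is hypothetical and not part of any Snowplow library, the prefix values (contexts for entities, unstruct_event for self-describing events) are my assumption, and edge cases such as names with runs of capitals may not match what the loader actually does.

```python
import re


def context_column_name(schema_uri: str, prefix: str = "contexts") -> str:
    """Derive a Snowflake column name from an Iglu schema URI (hypothetical helper).

    Approximates the assumed convention: prefix, vendor with dots/hyphens as
    underscores, schema name in snake_case, then the MODEL (major) version only.
    """
    match = re.fullmatch(
        r"iglu:([A-Za-z0-9_.\-]+)/([A-Za-z0-9_\-]+)/jsonschema/(\d+)-(\d+)-(\d+)",
        schema_uri,
    )
    if match is None:
        raise ValueError(f"not an Iglu jsonschema URI: {schema_uri!r}")
    vendor, name, model = match.group(1), match.group(2), match.group(3)
    vendor = re.sub(r"[.\-]", "_", vendor).lower()
    # CamelCase -> snake_case, e.g. PerformanceTiming -> performance_timing
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).replace("-", "_").lower()
    return f"{prefix}_{vendor}_{name}_{model}"


if __name__ == "__main__":
    for uri in (
        "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0",
        "iglu:nl.basjes/yauaa_context/jsonschema/1-0-1",
    ):
        print(uri, "->", context_column_name(uri))
```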

You might also find this interesting reading on how we recommend loading into Snowflake:

pcb · August 4, 2022, 4:01pm

Thanks guys.

Is there documentation anywhere which describes what the source of the Snowflake table should look like? Columns and datatypes?

Or is the best way to dig through the dbt source and get it from there?

Thanks,
Patrick

mike · August 4, 2022, 11:01pm

For the columns and datatypes, the base reference is here, but as it mentions, this table will be mutated according to your custom events and entities. The documentation for these columns can be found in dbt docs (or as a direct reference here).
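If you load with your own Kafka connector setup, one quick sanity check is to compare your table’s column names against what the model expects. A minimal sketch, assuming you have already pulled the column names out of Snowflake (for example from INFORMATION_SCHEMA.COLUMNS). The EXPECTED_COLUMNS set is a small illustrative subset I picked, not the authoritative list, so fill it in from the base reference and dbt docs. It only checks names, not datatypes.

```python
from typing import Iterable

# Illustrative subset only (assumption): a few standard atomic columns plus the
# two context columns discussed in this thread. Use the base table reference and
# dbt docs for the full list and the datatypes.
EXPECTED_COLUMNS = {
    "event_id",
    "collector_tstamp",
    "derived_tstamp",
    "domain_userid",
    "domain_sessionid",
    "page_url",
    "contexts_com_snowplowanalytics_snowplow_web_page_1",
    "contexts_nl_basjes_yauaa_context_1",
}


def missing_columns(actual: Iterable[str], expected: set = EXPECTED_COLUMNS) -> list:
    """Return the expected columns absent from `actual`, sorted by name.

    Snowflake reports unquoted identifiers in upper case, so names are
    compared case-insensitively.
    """
    have = {name.lower() for name in actual}
    return sorted(col for col in expected if col.lower() not in have)


if __name__ == "__main__":
    # Hypothetical column list, as it might come back from Snowflake.
    actual = [
        "EVENT_ID",
        "COLLECTOR_TSTAMP",
        "PAGE_URL",
        "CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_WEB_PAGE_1",
    ]
    print(missing_columns(actual))
```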

PaulBoocock · August 5, 2022, 8:07am

The dbt package has its own dbt docs site, which you might find useful here: dbt Docs
