dbt model column meanings

Hi all,

We are working on utilizing the dbt Snowplow package. The data dictionary we are looking at is here: dictionary

We use Kafka for our stream and write into Snowflake with the Kafka connector, which we use everywhere else we move data. All of this works really well.

However, once our data gets into Snowflake, it doesn’t match the web model that dbt expects. We’d love some clarification on a few fields specified in the data dictionary linked above that don’t exist in our system.

snowplow_web_page_1 and yauaa_context_1:

  • root_id
  • root_tstamp
  • ref_root
  • ref_tree
  • ref_parent

What are the above fields referring to? Our data is completely flat in one table, so we don’t have separate tables for webpage_id, yauaa, and performance contexts.

Thanks!
Patrick

These fields only exist in the shredded model for enriched data, which has no equivalent in Snowflake's single-table, context-as-a-column schema. In other databases such as Redshift, where there is a core table (events) plus separate context and event tables, these fields (specifically root_id and root_tstamp) act as the join key between the core event and the contexts associated with it.
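To make the difference concrete, here is a rough sketch using Python's built-in sqlite3 as a stand-in for the warehouse. It assumes root_id carries the parent event's event_id and root_tstamp its collector_tstamp; the table name web_page_context, the sample rows, and the exact context column name are illustrative assumptions rather than anything taken from your setup.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Shredded (Redshift-style) layout: a core events table plus one table per context.
# Table names and sample values are hypothetical.
conn.executescript("""
CREATE TABLE events (event_id TEXT, collector_tstamp TEXT, page_url TEXT);
CREATE TABLE web_page_context (root_id TEXT, root_tstamp TEXT, id TEXT);
INSERT INTO events VALUES ('e1', '2023-01-01 00:00:00', 'https://example.com/a');
INSERT INTO events VALUES ('e2', '2023-01-01 00:00:05', 'https://example.com/b');
INSERT INTO web_page_context VALUES ('e1', '2023-01-01 00:00:00', 'pv-111');
INSERT INTO web_page_context VALUES ('e2', '2023-01-01 00:00:05', 'pv-222');
""")

# root_id / root_tstamp are the join key back to the core event
# (assumed here to mirror event_id / collector_tstamp).
shredded = conn.execute("""
SELECT e.event_id, e.page_url, c.id AS page_view_id
FROM events AS e
JOIN web_page_context AS c
  ON c.root_id = e.event_id
 AND c.root_tstamp = e.collector_tstamp
ORDER BY e.event_id
""").fetchall()

# Flat (Snowflake-style) layout: the context travels as a column on the event row,
# so there is nothing to join and no root_* fields are needed.
# The column name below is an assumption about how the context column is named.
CONTEXT_COLUMN = "contexts_com_snowplowanalytics_snowplow_web_page_1"
flat_events = [
    {"event_id": "e1", "page_url": "https://example.com/a",
     CONTEXT_COLUMN: json.dumps([{"id": "pv-111"}])},
    {"event_id": "e2", "page_url": "https://example.com/b",
     CONTEXT_COLUMN: json.dumps([{"id": "pv-222"}])},
]
flat = [
    (row["event_id"], row["page_url"], json.loads(row[CONTEXT_COLUMN])[0]["id"])
    for row in flat_events
]

print(shredded)
print(flat)
```

Both layouts end up with the same (event_id, page_url, page_view_id) rows; the only difference is whether the page view id is reached through a join or read directly off the event row.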

As Mike suggests, it looks like you're looking at the Redshift/Postgres code in the web model. You should try to follow the Snowflake-specific path through the model.

However, there are likely still some differences between a direct load and data loaded with the Snowplow RDB Loader (for Snowflake), particularly around schema usage and how our loader builds those tables so that it can migrate them as schemas evolve.

You might find this interesting reading too on how we recommend loading to Snowflake:

Thanks guys.

Is there documentation anywhere which describes what the source of the Snowflake table should look like? Columns and datatypes?

Or is the best way to dig through the dbt source and get it from there?

Thanks,
Patrick

For the columns and datatypes, the base reference is here, but as it mentions, this table will be mutated according to your custom events and entities. The documentation for these columns can be found in dbt docs (or as a direct reference here).
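If it helps when comparing your flat table against that reference, a small sketch like the one below can separate the standard fields from the columns added for your own events and entities. It assumes the added columns use the contexts_ and unstruct_event_ prefixes; the helper name split_columns and the com_acme column are hypothetical.

```python
def split_columns(column_names):
    """Group column names into standard fields, entity (context) columns,
    and self-describing event columns, based on assumed name prefixes."""
    groups = {"standard": [], "entities": [], "events": []}
    for name in column_names:
        lowered = name.lower()
        if lowered.startswith("contexts_"):
            groups["entities"].append(name)
        elif lowered.startswith("unstruct_event_"):
            groups["events"].append(name)
        else:
            groups["standard"].append(name)
    return groups


# Example column list; in practice this would come from the table's metadata.
example_columns = [
    "EVENT_ID",
    "COLLECTOR_TSTAMP",
    "PAGE_URL",
    "CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_WEB_PAGE_1",
    "UNSTRUCT_EVENT_COM_ACME_CHECKOUT_1",  # hypothetical custom event
]
groups = split_columns(example_columns)
print(groups)
```

The standard group is the part you would check against the base reference; the other two groups will differ from one pipeline to the next.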

The dbt package has its own dbt docs site, which you might find useful here: dbt Docs