Snowplow/dbt-databricks - Schema hierarchy not being maintained

wambam · October 17, 2022, 7:26pm

Hello All,

We just started moving out DBT/Snowplow project to Databricks. Here is a sample of our profile config for connecting:

    test:
      catalog: null
      host: <redacted but databricks sql endpoint url>
      http_path: <redacted but databricks http path from the sql endpoint>
      schema: enriched_sandbox
      threads: 4
      token: <a pat token>
      type: databricks

What I am finding is that “enriched_sandox_derived”, “enriched_sandox_scratch”, etc. are all being created as schemas under the catalogue parallel to “enriched_sandox” when ideally they should be tables under the “enriched_sandox” schema. This appears to be a Snowplow issue as running the demo, dbt-databricks does not produce the same hierarchy issues. The only difference I can find between the two workflows, is that the demo does not contain snowplow_web or snowplow_utils models.

Any advice is appreciated, but I thought I would bring this to attention.

Thank you!

Ryan · October 17, 2022, 8:18pm

Hey,

The Snowplow DBT packages actually set up the schemas in the package project yaml file to append those suffixes to your profile schema, that’s why you’re seeing the behaviour you get and is expected.

You can overwrite the schemas the package writes to in your dbt_project.yml file by using the following:

# dbt_project.yml
...
models:
  snowplow_web:
    base: 
      manifest:
        +schema: my_manifest_schema
      scratch:
        +schema: my_scratch_schema
    sessions: 
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
    user_mapping: 
      +schema: my_derived_schema
    users:
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
    page_views: 
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema

And of course you can set them all the same if you wish, so in your case every +schema: line would have enriched_sandbox as the value.

For more info you can see our docs that I’ve linked below, but give that a try and let me know if that resolves your issue.

Hope that helps!

wambam · October 17, 2022, 8:19pm

@Ryan Thanks for this! I will give this a shot. Cheers, Michael

wambam · October 18, 2022, 6:27pm

@Ryan Thanks, but I don’t think that had the effect I wanted.

“By default all scratch/staging tables will be created in the <target.schema>_scratch schema, the derived tables ( e.g. snowplow_web_page_views , snowplow_web_sessions , snowplow_web_users ) will be created in <target.schema>_derived and all manifest tables in <target.schema>_snowplow_manifest”

I didn’t really want these tables to just have a different appended name, I wanted them to be under an existing schema/database in Databricks.

Sounds like this could be related too “three level namespaces” - Configuration | Snowplow Documentation

We are not on the Unity Catalog yet.

Thank you!

Ryan · October 19, 2022, 8:55am

@wambam Ah okay sorry, I see what the issue is now - if you set those all values to null (the keyword null, not an empty string otherwise you’ll get a trailing underscore) in the project yaml you should get it all writing to your pure enriched_sandbox schema. I have tested this on a UC but I don’t believe that should make a difference here as that’s more about finding the data in the first place.

Could you please try that and let me know if that works for you, if it does I’ll update the docs to make it more obvious how to achieve this for anyone in the future

# dbt_project.yml
...
models:
  snowplow_web:
    base: 
      manifest:
        +schema: null
      scratch:
        +schema: null
    sessions: 
      +schema: null
      scratch:
        +schema: null
    user_mapping: 
      +schema: null
    users:
      +schema: null
      scratch:
        +schema: null
    page_views: 
      +schema: null
      scratch:
        +schema: null

Topic		Replies	Views
Snowplow-web 0.8.0 dbt package released New releases	0	795	July 19, 2022
Snowplow-web 0.9.2 dbt package released New releases	0	793	September 7, 2022
Snowplow data modeling issues when updating schema SQL Runner	1	988	March 22, 2022
Snowplow-web 0.11.0 dbt package released New releases	0	708	December 7, 2022
Snowplow dbt Unified 0.4.2 released New releases	0	91	June 12, 2024

Snowplow/dbt-databricks - Schema hierarchy not being maintained

Related topics