My team is prototyping a solution using the snowplow-web dbt package (v0.12.4), and we’re having an issue with a test failing.
It’s a vanilla snowplow set-up, into Postgres, with no extensions enabled. The only defaults we’ve overridden are the start date and the backfill window.
There is a not_null constraint on domain_sessionidx, which can be NULL (and is so for some records in the source data, where the devs were getting the tracker set up correctly).
Should domain_sessionidx being NULL break the workflow, or is this test too strict?
First of all, I’d recommend using a more recent version of the web package if you can as it will contain new features and fixes, especially if you’re just starting now.
In general we would expect the domain_sessionidx to not be null from the tracking side for web data, however most of our tests are “best guess” that cover the majority of scenarios, but in certain use cases of Snowplow it might not be a valid test, we suggest in those cases you can disable the tests from our package and run your own on top of the tables if you wish.
Thanks - I’ll speak to the team about bumping the version number.
What seems to be making people nervous is that the dbt package will build successfully for some days where domain_sessionidx is NULL without issue, then fail on processing subsequent days.
It may be some other problem being masked by this… perhaps I need to confirm with these specific data and use cases that domain_sessionidx being NULL is not a problem in itself, and that the test can be safely disabled.
From memory the tests only exist (or at least mostly) exist on the derived tables, so even though individual events may have a NULL value, the event(s) that is used to populate the derived tables maybe have a value for this.
I agree with your approach though, understand where these events are coming from and if it’s a tracking issue or if they are valid first, then make a decision.