Snowplow_web dbt package not producing rows in database

We are using dbt Cloud to host our installation and have set up the snowplow_web package so that it runs successfully. The builds pass, and the supporting tables (derived, manifest, scratch) are all created in their respective schemas; however, none of the tables are populating. The package continues to produce zero rows.

We are running BigQuery.

Here is our dbt_project.yml entry for the package:

# Snowplow
vars:
  snowplow_web:
    snowplow__atomic_schema: datalake
    snowplow__database: challenger-dev-231320
    snowplow__events: datalake.snowplow_events
    snowplow__enable_iab: false
    snowplow__enable_ua: false
    snowplow__enable_yauaa: false
    snowplow__derived_tstamp_partitioned: false

realized I should add more information :slight_smile:

packages:
  - package: snowplow/snowplow_web
    version: 0.6.2
  - package: snowplow/snowplow_utils
    version: 0.9.0

Hey @CSlovak, welcome to the Snowplow community!

I think what might be happening is that you didn't set the snowplow__start_date value, which means the web package is trying to process data from the default start date of 2020-01-01. It also looks like you have the default value for snowplow__backfill_limit_days, which is 30, so the web package will be looking in the date range 2020-01-01 to 2020-01-31 for data to process.

Since there is (presumably) no data in your events table for that date range, the web package creates empty tables but does not update its manifest to say that it has processed this range, because no actual data was processed. As a result, on the next run the web package searches the same date range again, and again finds nothing.

To resolve this (if I understand the problem correctly), update your snowplow__start_date value to something more recent: the date you first started generating data in your events table.
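For example, the fix would look something like this in your dbt_project.yml (the date shown is a placeholder; use the day your events table first received data):

vars:
  snowplow_web:
    # ...your existing vars...
    snowplow__start_date: '2021-03-01'  # placeholder: set to when your pipeline first produced events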

If you're interested, here's a high-level explanation of how our package works and why it runs into this problem. We use a series of macros to generate and maintain a manifest table, which keeps track of each "actual" table that the web package generates (in the scratch and derived schemas) and the latest timestamp of data processed for that table. This allows us to easily "catch up" if parts of a dbt run fail, and it ensures that, without changing any parameters in the dbt project, the web tables stay as up to date as possible with every run (assuming you run the package frequently enough; with the default backfill limit, that means at least once every 30 days). However, when there is no data in the source tables to process, the manifest is deliberately not updated, so that late-arriving data can still be loaded into the events table and picked up on a subsequent run.
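If you want to see this for yourself, you can inspect the manifest table directly. Here's a minimal sketch, assuming the manifest table is named snowplow_web_incremental_manifest and lives in the manifest schema the package created for you; the table and column names may differ by package version, so verify them against your own project:

-- Each row tracks one model and the timestamp of the latest data it has processed.
-- When no data is found in the scanned date range, the timestamp is not advanced,
-- which is why the package keeps rescanning the same window.
select
  model,        -- assumed column name: the web model being tracked
  last_success  -- assumed column name: max timestamp of data processed so far
from your_manifest_schema.snowplow_web_incremental_manifest
order by model;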

I hope this clarifies things and helps resolve your issue, but if it doesn’t or if you have any more questions don’t hesitate to let me know!

Have a great day,
Emiel


Emiel - Sorry for the slow response, but I had some issues with the dbt Cloud instance we were running that we just resolved, and I was able to try this fix. Happy to say it worked perfectly! Thank you.

Glad to hear it, Chris! Please don't hesitate to let us know if you run into any other issues. We will also make the step of setting the start date more explicit in our documentation to avoid confusion for new users!

Have a great weekend,
Emiel