We recently upgraded the snowplow-web dbt package from 0.12.4 to 0.16.2, and snowplow-utils to 0.15.2.
The upgrade went through without any errors, but since then the snowplow-web models process data only from the start_date rather than from the last successful run recorded in the manifest.
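For reference, the manifest state can be inspected directly; a query along these lines (table and column names assumed from the default snowplow-web manifest setup) shows the last_success values the package should resume from:

-- Sketch: check what the incremental manifest currently records
-- (schema and column names assumed from the default setup)
select model, last_success
from snowplow_manifest.snowplow_web_incremental_manifest
order by last_success desc;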
## 1st command
dbt run --select snowplow_web --full-refresh --vars '{snowplow__allow_refresh: true, snowplow__backfill_limit_days: 2, snowplow__start_date: "2023-01-01"}'
Snowplow: No data in manifest. Processing data from start_date
+ Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-03 00:00:00'
## 2nd command
dbt run --select snowplow_web --vars '{snowplow__backfill_limit_days: 2, snowplow__start_date: "2023-01-01"}'
+ Snowplow: New Snowplow incremental model. Backfilling
+ Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)
1 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_base_quarantined_sessions [MERGE (0.0 rows, 0 processed) in 5.19s]
2 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_incremental_manifest [MERGE (0.0 rows, 0 processed) in 5.21s]
3 of 18 START sql table model scratch.snowplow_web_base_new_event_limits ....... [RUN]
+ Snowplow: New Snowplow incremental model. Backfilling
+ Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)
## Retrying
dbt run --select snowplow_web --vars '{snowplow__backfill_limit_days: 2}'
+ Snowplow: New Snowplow incremental model. Backfilling
+ Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)
1 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_base_quarantined_sessions [MERGE (0.0 rows, 0 processed) in 5.19s]
2 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_incremental_manifest [MERGE (0.0 rows, 0 processed) in 5.21s]
3 of 18 START sql table model scratch.snowplow_web_base_new_event_limits ....... [RUN]
+ Snowplow: New Snowplow incremental model. Backfilling
+ Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)
Have you enabled any of the optional modules?
Yes, please see below.
Have you added the dispatch command to your project yaml?
Yes.
Can you share the full output of a single run?
$ dbt run --select snowplow_web --vars '{snowplow__backfill_limit_days: 2, snowplow__start_date: "2023-01-01"}'
02:30:40 Running with dbt=1.5.7
02:30:41 Registered adapter: bigquery=1.5.3
02:30:41 Unable to do partial parsing because config vars, config profile, or config target have changed
02:30:54 Found 578 models, 380 tests, 0 snapshots, 6 analyses, 1467 macros, 7 operations, 18 seed files, 237 sources, 0 exposures, 0 metrics, 0 groups
02:30:54
02:31:00
02:31:00 Running 4 on-run-start hooks
02:31:00 1 of 4 START hook: lux_analytics.on-run-start.0 ................................ [RUN]
02:31:03 1 of 4 OK hook: lux_analytics.on-run-start.0 ................................... [SCRIPT (0 processed) in 3.19s]
02:31:03 2 of 4 START hook: elementary.on-run-start.0 ................................... [RUN]
02:31:03 2 of 4 OK hook: elementary.on-run-start.0 ...................................... [OK in 0.00s]
02:31:03 3 of 4 START hook: snowplow_mobile.on-run-start.0 .............................. [RUN]
02:31:03 3 of 4 OK hook: snowplow_mobile.on-run-start.0 ................................. [OK in 0.00s]
02:31:03 4 of 4 START hook: snowplow_web.on-run-start.0 ................................. [RUN]
02:31:03 4 of 4 OK hook: snowplow_web.on-run-start.0 .................................... [OK in 0.00s]
02:31:03
02:31:03 Concurrency: 4 threads (target='dev')
02:31:03
02:31:03 1 of 18 START sql incremental model snowplow_manifest.snowplow_web_base_quarantined_sessions [RUN]
02:31:03 2 of 18 START sql incremental model snowplow_manifest.snowplow_web_incremental_manifest [RUN]
02:31:08 1 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_base_quarantined_sessions [MERGE (0.0 rows, 0 processed) in 5.43s]
02:31:08 2 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_incremental_manifest [MERGE (0.0 rows, 0 processed) in 5.46s]
02:31:08 3 of 18 START sql table model scratch.snowplow_web_base_new_event_limits ....... [RUN]
02:31:10 02:31:10 + Snowplow: New Snowplow incremental model. Backfilling
02:31:16 02:31:16 + Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)
02:31:18 3 of 18 OK created sql table model scratch.snowplow_web_base_new_event_limits .. [CREATE TABLE (1.0 rows, 0 processed) in 9.90s]
02:31:18 4 of 18 START sql incremental model snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest [RUN]
02:31:41 4 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest [MERGE (464.1k rows, 751.9 MiB processed) in 22.67s]
02:31:41 5 of 18 START sql table model scratch.snowplow_web_base_sessions_this_run ...... [RUN]
02:31:59 5 of 18 OK created sql table model scratch.snowplow_web_base_sessions_this_run . [CREATE TABLE (464.1k rows, 40.5 MiB processed) in 18.60s]
02:31:59 6 of 18 START sql table model scratch.snowplow_web_base_events_this_run ........ [RUN]
02:32:31 6 of 18 OK created sql table model scratch.snowplow_web_base_events_this_run ... [CREATE TABLE (5.7m rows, 16.8 GiB processed) in 31.93s]
02:32:31 7 of 18 START sql table model scratch.snowplow_web_pv_engaged_time ............. [RUN]
02:32:31 8 of 18 START sql table model scratch.snowplow_web_pv_scroll_depth ............. [RUN]
02:32:31 9 of 18 START sql table model scratch.snowplow_web_sessions_this_run ........... [RUN]
02:32:31 10 of 18 START sql incremental model derived.snowplow_web_user_mapping ......... [RUN]
02:32:38 7 of 18 OK created sql table model scratch.snowplow_web_pv_engaged_time ........ [CREATE TABLE (0.0 rows, 626.9 MiB processed) in 6.72s]
02:32:40 8 of 18 OK created sql table model scratch.snowplow_web_pv_scroll_depth ........ [CREATE TABLE (1.4m rows, 713.4 MiB processed) in 8.15s]
02:32:40 11 of 18 START sql table model scratch.snowplow_web_page_views_this_run ........ [RUN]
02:32:40 10 of 18 OK created sql incremental model derived.snowplow_web_user_mapping .... [MERGE (0.0 rows, 350.1 MiB processed) in 8.90s]
02:32:53 11 of 18 OK created sql table model scratch.snowplow_web_page_views_this_run ... [CREATE TABLE (1.4m rows, 7.8 GiB processed) in 13.37s]
02:32:53 12 of 18 START sql incremental model derived.snowplow_web_page_views ........... [RUN]
02:33:01 9 of 18 OK created sql table model scratch.snowplow_web_sessions_this_run ...... [CREATE TABLE (457.7k rows, 7.5 GiB processed) in 29.23s]
02:33:01 13 of 18 START sql incremental model derived.snowplow_web_sessions ............. [RUN]
02:33:08 12 of 18 OK created sql incremental model derived.snowplow_web_page_views ...... [MERGE (0.0 rows, 0 processed) in 14.76s]
02:33:31 13 of 18 OK created sql incremental model derived.snowplow_web_sessions ........ [MERGE (0.0 rows, 0 processed) in 30.41s]
02:33:31 14 of 18 START sql table model scratch.snowplow_web_users_sessions_this_run .... [RUN]
02:33:59 14 of 18 OK created sql table model scratch.snowplow_web_users_sessions_this_run [CREATE TABLE (457.7k rows, 863.3 MiB processed) in 27.85s]
02:33:59 15 of 18 START sql table model scratch.snowplow_web_users_aggs ................. [RUN]
02:34:13 15 of 18 OK created sql table model scratch.snowplow_web_users_aggs ............ [CREATE TABLE (373.4k rows, 53.9 MiB processed) in 14.11s]
02:34:13 16 of 18 START sql table model scratch.snowplow_web_users_lasts ................ [RUN]
02:34:25 16 of 18 OK created sql table model scratch.snowplow_web_users_lasts ........... [CREATE TABLE (373.4k rows, 305.1 MiB processed) in 11.84s]
02:34:25 17 of 18 START sql table model scratch.snowplow_web_users_this_run ............. [RUN]
02:34:42 17 of 18 OK created sql table model scratch.snowplow_web_users_this_run ........ [CREATE TABLE (373.4k rows, 659.4 MiB processed) in 16.57s]
02:34:42 18 of 18 START sql incremental model derived.snowplow_web_users ................ [RUN]
02:34:55 18 of 18 OK created sql incremental model derived.snowplow_web_users ........... [MERGE (0.0 rows, 0 processed) in 13.94s]
02:34:56
02:34:56 Running 3 on-run-end hooks
02:35:11 1 of 3 START hook: elementary.on-run-end.0 ..................................... [RUN]
02:35:11 1 of 3 OK hook: elementary.on-run-end.0 ........................................ [OK in 0.00s]
02:35:11 2 of 3 START hook: snowplow_mobile.on-run-end.0 ................................ [RUN]
02:35:11 2 of 3 OK hook: snowplow_mobile.on-run-end.0 ................................... [OK in 0.00s]
02:35:12 3 of 3 START hook: snowplow_web.on-run-end.0 ................................... [RUN]
02:35:15 3 of 3 OK hook: snowplow_web.on-run-end.0 ...................................... [MERGE (12.0 rows, 43.3 MiB processed) in 3.10s]
02:35:15
02:35:15
02:35:15 Finished running 7 incremental models, 11 table models, 7 hooks in 0 hours 4 minutes and 20.92 seconds (260.92s).
02:35:15
02:35:15 Completed successfully
02:35:15
02:35:15 Done. PASS=18 WARN=0 ERROR=0 SKIP=0 TOTAL=18
Can you also share your project yaml? Specifically, any vars or model configs for Snowplow web, please.
Ah, my guess is that the model is tagged correctly, so it gets picked up when we check whether it's in the manifest, but it isn't being run by your select because it's not in the package. You could either adjust your select to be based on the tag, or use our pre-built selectors (https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-operation/model-selection/#yaml-selectors); you'll just have to copy them out of the package, because dbt doesn't support selectors in packages.
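For example, both of these are standard dbt invocations (the --selector one assumes you've copied the snowplow_web selector into your own selectors.yml):

dbt run --select tag:snowplow_web_incremental
dbt run --selector snowplow_web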
So our custom model has tags=["snowplow_web_incremental", "snowplow_mobile_incremental"] in its config, but it does not seem to get picked up when we run --select snowplow_web, which I thought should technically work, because in the selectors we have:
selectors:
  - name: snowplow_web
    # Description field added dbt v0.19. Commenting out for compatibility.
    # description: >
    #   Suggested node selection when running the Snowplow Web package.
    #   Runs:
    #   - All Snowplow Web models.
    #   - All custom models in your dbt project, tagged with `snowplow_web_incremental`.
    definition:
      union:
        - method: package
          value: snowplow_web
        - method: tag
          value: snowplow_web_incremental
Yep, that'll be it. I'm not quite sure it will work as intended being tagged for both, though: that will cause the table to be run in both packages, which may lead to some historic overwrites. But if it works for you, then great!
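If you'd rather keep it in a single package, the shape of a web-only custom model would be something like this; the model name, unique key, and ref here are illustrative only, and the custom models docs list the full set of recommended incremental configs:

-- models/snowplow_custom/my_web_custom_model.sql (hypothetical model)
{{
  config(
    materialized='incremental',
    unique_key='domain_sessionid',  -- assumed key for a sessions-grain model
    tags=["snowplow_web_incremental"]  -- web package only
  )
}}

select *
from {{ ref('snowplow_web_sessions_this_run') }}
-- Skip the merge when the current run has no new web events to process
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }}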