# Snowplow incremental models not working after upgrade

We recently upgraded the snowplow-web dbt package from 0.12.4 to 0.16.2, and snowplow-utils to 0.15.2.
The upgrade went through without any errors, but since then the snowplow-web models fetch data only from `snowplow__start_date`, not from the last successful run recorded in the manifest.

## 1st command
```shell
dbt run --select snowplow_web --full-refresh --vars '{snowplow__allow_refresh: true, snowplow__backfill_limit_days: 2, snowplow__start_date: '2023-01-01'}'
```

```
Snowplow: No data in manifest. Processing data from start_date
 + Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-03 00:00
```
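A side note on shell quoting in these commands (harmless here, since YAML happily parses the bare date, but easy to trip over): single quotes nested inside a single-quoted `--vars` argument close and reopen the shell string, so the inner quotes are stripped before dbt ever sees them. A quick way to see what dbt actually receives:

```shell
# Nested single quotes: the shell concatenates three pieces, dropping the
# inner quotes around the date.
echo '{snowplow__start_date: '2023-01-01'}'
# → {snowplow__start_date: 2023-01-01}

# Double-quoting the inner value keeps the YAML string intact:
echo '{snowplow__start_date: "2023-01-01"}'
# → {snowplow__start_date: "2023-01-01"}
```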

## 2nd command
```shell
dbt run --select snowplow_web --vars '{snowplow__backfill_limit_days: 2, snowplow__start_date: '2023-01-01'}'
```

```
 + Snowplow: New Snowplow incremental model. Backfilling
 + Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)

1 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_base_quarantined_sessions  [MERGE (0.0 rows, 0 processed) in 5.19s]
2 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_incremental_manifest  [MERGE (0.0 rows, 0 processed) in 5.21s]
3 of 18 START sql table model scratch.snowplow_web_base_new_event_limits ....... [RUN]
 + Snowplow: New Snowplow incremental model. Backfilling
 + Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)
```

## Retrying
```shell
dbt run --select snowplow_web --vars '{snowplow__backfill_limit_days: 2}'
```

```
 + Snowplow: New Snowplow incremental model. Backfilling
 + Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)

1 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_base_quarantined_sessions  [MERGE (0.0 rows, 0 processed) in 5.19s]
2 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_incremental_manifest  [MERGE (0.0 rows, 0 processed) in 5.21s]
3 of 18 START sql table model scratch.snowplow_web_base_new_event_limits ....... [RUN]
 + Snowplow: New Snowplow incremental model. Backfilling
 + Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)
```

*(screenshot of the manifest table)*

This seems to imply the package thinks a model didn't successfully complete in the last run, and that model is out of sync with the rest.
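One way to confirm which model is lagging is to read the manifest directly, e.g. with `dbt show` (a sketch; requires dbt >= 1.5, and assumes the `model`/`last_success` columns that snowplow-utils writes to its incremental manifest):

```shell
# List each model's last successful upsert; a model whose last_success
# lags the others is the one holding the package's processing window back.
dbt show --inline "select model, last_success from {{ ref('snowplow_web_incremental_manifest') }} order by last_success"
```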

Can you let me know:

  • Do you have any custom models?
  • Have you enabled any of the optional modules?
  • Have you added the dispatch config to your project yaml?
  • Can you share the full output of a single run?
  • Can you also share your project yaml? Specifically any vars or model configs for Snowplow web, please.

Hopefully then I should be able to work out what’s happening, thanks.

  • Do you have any custom models?
    No
  • Have you enabled any of the optional modules?
    Yes, please see below
  • Have you added the dispatch config to your project yaml?
    Yes
  • Can you share the full output of a single run?
```shell
$ dbt run --select snowplow_web --vars '{snowplow__backfill_limit_days: 2, snowplow__start_date: '2023-01-01'}'
02:30:40  Running with dbt=1.5.7
02:30:41  Registered adapter: bigquery=1.5.3
02:30:41  Unable to do partial parsing because config vars, config profile, or config target have changed
02:30:54  Found 578 models, 380 tests, 0 snapshots, 6 analyses, 1467 macros, 7 operations, 18 seed files, 237 sources, 0 exposures, 0 metrics, 0 groups
02:30:54  
02:31:00  
02:31:00  Running 4 on-run-start hooks
02:31:00  1 of 4 START hook: lux_analytics.on-run-start.0 ................................ [RUN]
02:31:03  1 of 4 OK hook: lux_analytics.on-run-start.0 ................................... [SCRIPT (0 processed) in 3.19s]
02:31:03  2 of 4 START hook: elementary.on-run-start.0 ................................... [RUN]
02:31:03  2 of 4 OK hook: elementary.on-run-start.0 ...................................... [OK in 0.00s]
02:31:03  3 of 4 START hook: snowplow_mobile.on-run-start.0 .............................. [RUN]
02:31:03  3 of 4 OK hook: snowplow_mobile.on-run-start.0 ................................. [OK in 0.00s]
02:31:03  4 of 4 START hook: snowplow_web.on-run-start.0 ................................. [RUN]
02:31:03  4 of 4 OK hook: snowplow_web.on-run-start.0 .................................... [OK in 0.00s]
02:31:03  
02:31:03  Concurrency: 4 threads (target='dev')
02:31:03  
02:31:03  1 of 18 START sql incremental model snowplow_manifest.snowplow_web_base_quarantined_sessions  [RUN]
02:31:03  2 of 18 START sql incremental model snowplow_manifest.snowplow_web_incremental_manifest  [RUN]
02:31:08  1 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_base_quarantined_sessions  [MERGE (0.0 rows, 0 processed) in 5.43s]
02:31:08  2 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_incremental_manifest  [MERGE (0.0 rows, 0 processed) in 5.46s]
02:31:08  3 of 18 START sql table model scratch.snowplow_web_base_new_event_limits ....... [RUN]
02:31:10  02:31:10 + Snowplow: New Snowplow incremental model. Backfilling
02:31:16  02:31:16 + Snowplow: Processing data between '2023-01-01 00:00:00' and '2023-01-02 23:59:59' (snowplow_web)
02:31:18  3 of 18 OK created sql table model scratch.snowplow_web_base_new_event_limits .. [CREATE TABLE (1.0 rows, 0 processed) in 9.90s]
02:31:18  4 of 18 START sql incremental model snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest  [RUN]
02:31:41  4 of 18 OK created sql incremental model snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest  [MERGE (464.1k rows, 751.9 MiB processed) in 22.67s]
02:31:41  5 of 18 START sql table model scratch.snowplow_web_base_sessions_this_run ...... [RUN]
02:31:59  5 of 18 OK created sql table model scratch.snowplow_web_base_sessions_this_run . [CREATE TABLE (464.1k rows, 40.5 MiB processed) in 18.60s]
02:31:59  6 of 18 START sql table model scratch.snowplow_web_base_events_this_run ........ [RUN]
02:32:31  6 of 18 OK created sql table model scratch.snowplow_web_base_events_this_run ... [CREATE TABLE (5.7m rows, 16.8 GiB processed) in 31.93s]
02:32:31  7 of 18 START sql table model scratch.snowplow_web_pv_engaged_time ............. [RUN]
02:32:31  8 of 18 START sql table model scratch.snowplow_web_pv_scroll_depth ............. [RUN]
02:32:31  9 of 18 START sql table model scratch.snowplow_web_sessions_this_run ........... [RUN]
02:32:31  10 of 18 START sql incremental model derived.snowplow_web_user_mapping ......... [RUN]
02:32:38  7 of 18 OK created sql table model scratch.snowplow_web_pv_engaged_time ........ [CREATE TABLE (0.0 rows, 626.9 MiB processed) in 6.72s]
02:32:40  8 of 18 OK created sql table model scratch.snowplow_web_pv_scroll_depth ........ [CREATE TABLE (1.4m rows, 713.4 MiB processed) in 8.15s]
02:32:40  11 of 18 START sql table model scratch.snowplow_web_page_views_this_run ........ [RUN]
02:32:40  10 of 18 OK created sql incremental model derived.snowplow_web_user_mapping .... [MERGE (0.0 rows, 350.1 MiB processed) in 8.90s]
02:32:53  11 of 18 OK created sql table model scratch.snowplow_web_page_views_this_run ... [CREATE TABLE (1.4m rows, 7.8 GiB processed) in 13.37s]
02:32:53  12 of 18 START sql incremental model derived.snowplow_web_page_views ........... [RUN]
02:33:01  9 of 18 OK created sql table model scratch.snowplow_web_sessions_this_run ...... [CREATE TABLE (457.7k rows, 7.5 GiB processed) in 29.23s]
02:33:01  13 of 18 START sql incremental model derived.snowplow_web_sessions ............. [RUN]
02:33:08  12 of 18 OK created sql incremental model derived.snowplow_web_page_views ...... [MERGE (0.0 rows, 0 processed) in 14.76s]
02:33:31  13 of 18 OK created sql incremental model derived.snowplow_web_sessions ........ [MERGE (0.0 rows, 0 processed) in 30.41s]
02:33:31  14 of 18 START sql table model scratch.snowplow_web_users_sessions_this_run .... [RUN]
02:33:59  14 of 18 OK created sql table model scratch.snowplow_web_users_sessions_this_run  [CREATE TABLE (457.7k rows, 863.3 MiB processed) in 27.85s]
02:33:59  15 of 18 START sql table model scratch.snowplow_web_users_aggs ................. [RUN]
02:34:13  15 of 18 OK created sql table model scratch.snowplow_web_users_aggs ............ [CREATE TABLE (373.4k rows, 53.9 MiB processed) in 14.11s]
02:34:13  16 of 18 START sql table model scratch.snowplow_web_users_lasts ................ [RUN]
02:34:25  16 of 18 OK created sql table model scratch.snowplow_web_users_lasts ........... [CREATE TABLE (373.4k rows, 305.1 MiB processed) in 11.84s]
02:34:25  17 of 18 START sql table model scratch.snowplow_web_users_this_run ............. [RUN]
02:34:42  17 of 18 OK created sql table model scratch.snowplow_web_users_this_run ........ [CREATE TABLE (373.4k rows, 659.4 MiB processed) in 16.57s]
02:34:42  18 of 18 START sql incremental model derived.snowplow_web_users ................ [RUN]
02:34:55  18 of 18 OK created sql incremental model derived.snowplow_web_users ........... [MERGE (0.0 rows, 0 processed) in 13.94s]
02:34:56  
02:34:56  Running 3 on-run-end hooks
02:35:11  1 of 3 START hook: elementary.on-run-end.0 ..................................... [RUN]
02:35:11  1 of 3 OK hook: elementary.on-run-end.0 ........................................ [OK in 0.00s]
02:35:11  2 of 3 START hook: snowplow_mobile.on-run-end.0 ................................ [RUN]
02:35:11  2 of 3 OK hook: snowplow_mobile.on-run-end.0 ................................... [OK in 0.00s]
02:35:12  3 of 3 START hook: snowplow_web.on-run-end.0 ................................... [RUN]
02:35:15  3 of 3 OK hook: snowplow_web.on-run-end.0 ...................................... [MERGE (12.0 rows, 43.3 MiB processed) in 3.10s]
02:35:15  
02:35:15  
02:35:15  Finished running 7 incremental models, 11 table models, 7 hooks in 0 hours 4 minutes and 20.92 seconds (260.92s).
02:35:15  
02:35:15  Completed successfully
02:35:15  
02:35:15  Done. PASS=18 WARN=0 ERROR=0 SKIP=0 TOTAL=18
```
  • Can you also share your project yaml? Specifically any vars or model configs for Snowplow web, please.
```yaml
require-dbt-version: [">=1.5.0", "<2.0.0"]

dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']

vars:
  batch_id: -1 # overwritten at runtime by airflow task_instance_id
  'dbt_date:time_zone': 'Australia/Sydney'
  snowplow_web:
    snowplow__atomic_schema: "snowplow"
    snowplow__derived_tstamp_partitioned: true
    snowplow__database: "snowplow-prod-******"
    snowplow__enable_ua: true
    snowplow__start_date: '2023-01-01'
    snowplow__backfill_limit_days: 7
    snowplow__allow_refresh: false
  snowplow_mobile:
    snowplow__atomic_schema: "snowplow"
    snowplow__derived_tstamp_partitioned: true
    snowplow__database: "snowplow-prod-******"
    snowplow__enable_mobile_context: true
    snowplow__enable_application_context: true
    snowplow__enable_screen_context: true
    snowplow__start_date: '2023-01-01'
    snowplow__backfill_limit_days: 7
    snowplow__allow_refresh: false
```

Ah, turns out we have one custom model.

Ah, my guess is that the model is tagged correctly, so it gets picked up when we check whether it's in the manifest, but it isn't being run by your `--select` because it's not in the package. You could either adjust your selection to be based on the tag, or use our pre-built selectors (https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-operation/model-selection/#yaml-selectors); you'll just have to copy them out of the package, because dbt doesn't support selectors defined in packages.

Yes, that seems about right.

So, our custom model is tagged with
`tags=["snowplow_web_incremental", "snowplow_mobile_incremental"]` in its config, but it does not seem to get picked up when we run `--select snowplow_web`, which I thought should technically work, because in the selectors we have:

```yaml
selectors:
  - name: snowplow_web
    # Description field added dbt v0.19. Commenting out for compatibility.
    # description: >
    #   Suggested node selection when running the Snowplow Web package.
    #   Runs:
    #     - All Snowplow Web models.
    #     - All custom models in your dbt project, tagged with `snowplow_web_incremental`.
    definition:
      union:
        - method: package
          value: snowplow_web
        - method: tag
          value: snowplow_web_incremental
```

Ah, :man_facepalming:
`--selector snowplow_web`
not
`--select snowplow_web`

Yep that’ll be it. I’m not quite sure that it will work as intended being tagged as both though - that will cause the table to be run in both packages, which may lead to some historic overwrites, but if it works for you then great!