Hi there I’ve been banging my head against the wall trying to get the snowplow_web 0.16.0 package to work in dbt. First, I tried implementing in our existing dbt project. With no luck there, I created a new project in my local dev environment with only the snowplow_web package (and associated snowplow_utiils and dbt_utils packages), but still can’t figure out what I’m doing wrong.
Here’s what I’ve got going on:
dbt_project.yml
version: '1.6.3'
vars:
snowplow_web:
snowplow__start_date: '2023-08-01'
snowplow __enable_load_tstamp: false
packages.yml
packages:
- package: snowplow/snowplow_web
version: 0.16.0
Here’s what I get when I run the project (CLI)
17:18:47 Running with dbt=1.6.3
17:18:47 Registered adapter: redshift=1.6.1
17:18:47 Found 18 models, 103 tests, 3 seeds, 2 operations, 8 sources, 0 exposures, 0 metrics, 690 macros, 0 groups, 0 semantic models
17:18:47
17:18:53
17:18:53 Running 1 on-run-start hook
17:18:53 1 of 1 START hook: snowplow_web.on-run-start.0 ................................. [RUN]
17:18:53 1 of 1 OK hook: snowplow_web.on-run-start.0 .................................... [OK in 0.00s]
17:18:53
17:18:53 Concurrency: 1 threads (target='dev')
17:18:53
17:18:53 1 of 18 START sql incremental model dbt_rob_snowplow_manifest.snowplow_web_base_quarantined_sessions [RUN]
17:18:55 1 of 18 OK created sql incremental model dbt_rob_snowplow_manifest.snowplow_web_base_quarantined_sessions [SUCCESS in 2.15s]
17:18:55 2 of 18 START sql incremental model dbt_rob_snowplow_manifest.snowplow_web_incremental_manifest [RUN]
17:18:57 2 of 18 OK created sql incremental model dbt_rob_snowplow_manifest.snowplow_web_incremental_manifest [SUCCESS in 1.89s]
17:18:57 3 of 18 START sql table model dbt_rob_scratch.snowplow_web_base_new_event_limits [RUN]
17:18:57 13:18:57 + Snowplow: No data in manifest. Processing data from start_date
17:18:58 13:18:58 + Snowplow: Processing data between '2023-08-01 00:00:00' and '2023-08-31 00:00:00' (snowplow_web)
17:18:59 3 of 18 OK created sql table model dbt_rob_scratch.snowplow_web_base_new_event_limits [SUCCESS in 2.42s]
17:18:59 4 of 18 START sql incremental model dbt_rob_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest [RUN]
17:19:01 4 of 18 ERROR creating sql incremental model dbt_rob_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest [ERROR in 1.86s]
17:19:01 5 of 18 SKIP relation dbt_rob_scratch.snowplow_web_base_sessions_this_run ...... [SKIP]
17:19:01 6 of 18 SKIP relation dbt_rob_scratch.snowplow_web_base_events_this_run ........ [SKIP]
17:19:01 7 of 18 SKIP relation dbt_rob_scratch.snowplow_web_pv_engaged_time ............. [SKIP]
17:19:01 8 of 18 SKIP relation dbt_rob_scratch.snowplow_web_pv_scroll_depth ............. [SKIP]
17:19:01 9 of 18 SKIP relation dbt_rob_scratch.snowplow_web_sessions_this_run ........... [SKIP]
17:19:01 10 of 18 SKIP relation dbt_rob_derived.snowplow_web_user_mapping ............... [SKIP]
17:19:01 11 of 18 SKIP relation dbt_rob_scratch.snowplow_web_page_views_this_run ........ [SKIP]
17:19:01 12 of 18 SKIP relation dbt_rob_derived.snowplow_web_sessions ................... [SKIP]
17:19:01 13 of 18 SKIP relation dbt_rob_derived.snowplow_web_page_views ................. [SKIP]
17:19:01 14 of 18 SKIP relation dbt_rob_scratch.snowplow_web_users_sessions_this_run .... [SKIP]
17:19:01 15 of 18 SKIP relation dbt_rob_scratch.snowplow_web_users_aggs ................. [SKIP]
17:19:01 16 of 18 SKIP relation dbt_rob_scratch.snowplow_web_users_lasts ................ [SKIP]
17:19:01 17 of 18 SKIP relation dbt_rob_scratch.snowplow_web_users_this_run ............. [SKIP]
17:19:01 18 of 18 SKIP relation dbt_rob_derived.snowplow_web_users ...................... [SKIP]
17:19:02
17:19:02 Running 1 on-run-end hook
17:19:02 1 of 1 START hook: snowplow_web.on-run-end.0 ................................... [RUN]
17:19:02 1 of 1 OK hook: snowplow_web.on-run-end.0 ...................................... [OK in 0.00s]
17:19:02
17:19:02
17:19:02 Finished running 7 incremental models, 11 table models, 2 hooks in 0 hours 0 minutes and 14.21 seconds (14.21s).
17:19:02
17:19:02 Completed with 1 error and 0 warnings:
17:19:02
17:19:02 Database Error in model snowplow_web_base_sessions_lifecycle_manifest (models/base/manifest/snowplow_web_base_sessions_lifecycle_manifest.sql)
8001
compiled Code at target/run/snowplow_web/models/base/manifest/snowplow_web_base_sessions_lifecycle_manifest.sql
17:19:02
17:19:02 Done. PASS=3 WARN=0 ERROR=1 SKIP=14 TOTAL=18
And here’s a bit more context from dbt.log
[0m13:19:00.530087 [debug] [Thread-1 ]: On model.snowplow_web.snowplow_web_base_sessions_lifecycle_manifest: /* {"app": "dbt", "dbt_version": "1.6.3", "profile_name": "snowplow", "target_name": "dev", "node_id": "model.snowplow_web.snowplow_web_base_sessions_lifecycle_manifest"} */
CREATE TABLE "postmarkbi"."dbt_rob_snowplow_manifest"."snowplow_web_base_sessions_lifecycle_manifest__dbt_tmp" diststyle KEY distkey (
session_identifier) compound sortkey (
start_tstamp)
AS (
WITH new_events_session_ids_init AS (
SELECT
COALESCE(e.domain_sessionid,
NULL) AS session_identifier,
max(COALESCE(e.domain_userid,
NULL)) AS user_identifier,
-- Edge case 1: Arbitary selection to avoid window function like first_value.
min(collector_tstamp) AS start_tstamp,
max(collector_tstamp) AS end_tstamp
FROM
"postmarkbi"."snowcatcloud"."events" e
WHERE
dvce_sent_tstamp <= dateadd (day,
3,
dvce_created_tstamp)
-- don't process data that's too late
AND collector_tstamp >= cast('2023-08-01 00:00:00' AS TIMESTAMP)
AND collector_tstamp <= cast('2023-08-31 00:00:00' AS TIMESTAMP)
AND TRUE
AND TRUE --don't reprocess sessions that have already been processed.
GROUP BY
1),
new_events_session_ids AS (
SELECT
*
FROM
new_events_session_ids_init e
WHERE
session_identifier IS NOT NULL
AND NOT EXISTS (
SELECT
1
FROM
"postmarkbi"."dbt_rob_snowplow_manifest"."snowplow_web_base_quarantined_sessions" AS a
WHERE
a.session_identifier = e.session_identifier) -- don't continue processing v.long sessions
),
session_lifecycle AS (
SELECT
*
FROM
new_events_session_ids
)
SELECT
sl.session_identifier,
sl.user_identifier,
sl.start_tstamp,
least(dateadd (day,
3,
sl.start_tstamp),
sl.end_tstamp) AS end_tstamp -- limit session length to max_session_days
FROM
session_lifecycle sl
);
e[0m13:19:01.460295 [debug] [Thread-1 ]: Redshift adapter: Redshift error: 8001
e[0m13:19:01.461518 [debug] [Thread-1 ]: On model.snowplow_web.snowplow_web_base_sessions_lifecycle_manifest: ROLLBACK
e[0m13:19:01.573067 [debug] [Thread-1 ]: Timing info for model.snowplow_web.snowplow_web_base_sessions_lifecycle_manifest (execute): 13:19:00.413343 => 13:19:01.572301
e[0m13:19:01.574273 [debug] [Thread-1 ]: On model.snowplow_web.snowplow_web_base_sessions_lifecycle_manifest: Close
e[0m13:19:01.579748 [debug] [Thread-1 ]: Database Error in model snowplow_web_base_sessions_lifecycle_manifest (models/base/manifest/snowplow_web_base_sessions_lifecycle_manifest.sql)
8001
compiled Code at target/run/snowplow_web/models/base/manifest/snowplow_web_base_sessions_lifecycle_manifest.sql
e[0m13:19:01.581264 [error] [Thread-1 ]: 4 of 18 ERROR creating sql incremental model dbt_rob_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest [e[31mERRORe[0m in 1.86s]
I’ve tried using v0.15 and v0.14 of the package as well, and ran into similar errors there. I haven’t been able to get this to successfully run once, and it’s in its own isolated (fresh) dbt project. What am I missing/doing wrong?