I’m new to Snowplow and am running into problems when trying to use the dbt-snowplow-web package.
I have the latest versions of Snowplow and dbt-snowplow-web installed. I use the JavaScript tracker (with the session context enabled) to collect data, and I use the user_id field to set a unique ID for my users.
I followed this guide and set all variables accordingly.
When running dbt debug, it says everything is fine.
When running dbt run --selector snowplow_web, I get:
$ dbt run --selector snowplow_web
20:13:43 Running with dbt=1.6.6
20:13:43 Registered adapter: postgres=1.6.6
20:13:43 Found 18 models, 103 tests, 3 seeds, 2 operations, 8 sources, 0 exposures, 0 metrics, 645 macros, 0 groups, 0 semantic models
20:13:43
20:13:46
20:13:46 Running 1 on-run-start hook
20:13:46 1 of 1 START hook: snowplow_web.on-run-start.0 ................................. [RUN]
20:13:46 1 of 1 OK hook: snowplow_web.on-run-start.0 .................................... [OK in 0.00s]
20:13:46
20:13:46 Concurrency: 1 threads (target='dev')
20:13:46
20:13:46 1 of 18 START sql incremental model mo_schema_snowplow_manifest.snowplow_web_base_quarantined_sessions [RUN]
20:13:46 1 of 18 OK created sql incremental model mo_schema_snowplow_manifest.snowplow_web_base_quarantined_sessions [INSERT 0 0 in 0.58s]
20:13:46 2 of 18 START sql incremental model mo_schema_snowplow_manifest.snowplow_web_incremental_manifest [RUN]
20:13:47 2 of 18 OK created sql incremental model mo_schema_snowplow_manifest.snowplow_web_incremental_manifest [INSERT 0 0 in 0.59s]
20:13:47 3 of 18 START sql table model mo_schema_scratch.snowplow_web_base_new_event_limits [RUN]
20:13:47 22:13:47 + Snowplow: No data in manifest. Processing data from start_date
20:13:47 22:13:47 + Snowplow: Processing data between '2023-10-20 00:00:00' and '2023-10-22 20:13:47' (snowplow_web)
20:13:47 3 of 18 OK created sql table model mo_schema_scratch.snowplow_web_base_new_event_limits [SELECT 1 in 0.60s]
20:13:47 4 of 18 START sql incremental model mo_schema_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest [RUN]
20:13:48 4 of 18 OK created sql incremental model mo_schema_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest [INSERT 0 12290 in 0.99s]
20:13:48 5 of 18 START sql table model mo_schema_scratch.snowplow_web_base_sessions_this_run [RUN]
20:13:49 5 of 18 OK created sql table model mo_schema_scratch.snowplow_web_base_sessions_this_run [SELECT 12290 in 0.59s]
20:13:49 6 of 18 START sql table model mo_schema_scratch.snowplow_web_base_events_this_run [RUN]
20:13:50 Unhandled error while executing
2950
20:13:50 6 of 18 ERROR creating sql table model mo_schema_scratch.snowplow_web_base_events_this_run [ERROR in 0.56s]
20:13:50 7 of 18 SKIP relation mo_schema_scratch.snowplow_web_pv_engaged_time ........... [SKIP]
20:13:50 8 of 18 SKIP relation mo_schema_scratch.snowplow_web_pv_scroll_depth ........... [SKIP]
20:13:50 9 of 18 SKIP relation mo_schema_scratch.snowplow_web_sessions_this_run ......... [SKIP]
20:13:50 10 of 18 SKIP relation mo_schema_derived.snowplow_web_user_mapping ............. [SKIP]
20:13:50 11 of 18 SKIP relation mo_schema_scratch.snowplow_web_page_views_this_run ...... [SKIP]
20:13:50 12 of 18 SKIP relation mo_schema_derived.snowplow_web_sessions ................. [SKIP]
20:13:50 13 of 18 SKIP relation mo_schema_derived.snowplow_web_page_views ............... [SKIP]
20:13:50 14 of 18 SKIP relation mo_schema_scratch.snowplow_web_users_sessions_this_run .. [SKIP]
20:13:50 15 of 18 SKIP relation mo_schema_scratch.snowplow_web_users_aggs ............... [SKIP]
20:13:50 16 of 18 SKIP relation mo_schema_scratch.snowplow_web_users_lasts .............. [SKIP]
20:13:50 17 of 18 SKIP relation mo_schema_scratch.snowplow_web_users_this_run ........... [SKIP]
20:13:50 18 of 18 SKIP relation mo_schema_derived.snowplow_web_users .................... [SKIP]
20:13:50
20:13:50 Running 1 on-run-end hook
20:13:50 1 of 1 START hook: snowplow_web.on-run-end.0 ................................... [RUN]
20:13:50 1 of 1 OK hook: snowplow_web.on-run-end.0 ...................................... [OK in 0.00s]
20:13:50
20:13:50
20:13:50 Finished running 7 incremental models, 11 table models, 2 hooks in 0 hours 0 minutes and 7.11 seconds (7.11s).
20:13:50
20:13:50 Completed with 1 error and 0 warnings:
20:13:50
20:13:50 2950
20:13:50
20:13:50 Done. PASS=5 WARN=0 ERROR=1 SKIP=12 TOTAL=18
That error message is not very descriptive. I have all extra enrichments commented out since I don’t use them:
# snowplow__enable_iab: false
# snowplow__enable_ua: false
# snowplow__enable_yauaa: false
Sorry for the amateur question, but I’m completely new to Snowplow and dbt. If anyone has a hint on what to do or where to look further, I’d greatly appreciate it.
UPDATE
I had a look into the dbt.log file and there I’m getting more info:
23:08:44.458964 [error] [Thread-1 (]: Unhandled error while executing
2950
23:08:44.466774 [debug] [Thread-1 (]: Traceback (most recent call last):
File "/Users/mo/Desktop/code/snowplow/streamlit/.venv/lib/python3.10/site-packages/dbt/task/base.py", line 372, in safe_run
result = self.compile_and_execute(manifest, ctx)
.
.
.
File "/Users/mo/Desktop/code/snowplow/streamlit/.venv/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 307, in get_column_schema_from_query
columns = [
File "/Users/mo/Desktop/code/snowplow/streamlit/.venv/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 309, in <listcomp>
column_name, self.connections.data_type_code_to_name(column_type_code)
File "/Users/mo/Desktop/code/snowplow/streamlit/.venv/lib/python3.10/site-packages/dbt/adapters/postgres/connections.py", line 204, in data_type_code_to_name
return string_types[type_code].name
KeyError: 2950
UPDATE
Ok, I managed to dig deeper and found that the string_types lookup raises a KeyError because it has no entry with key 2950. I put some print statements into
/python3.10/site-packages/dbt/adapters/base/impl.py
to print each column's type_code and its name. The problem seems to be with a column named page_view__id:
Column Name: user_identifier
Column Type Code: 25
Column Name: event_id_dedupe_index
Column Type Code: 20
Column Name: event_id_dedupe_count
Column Type Code: 20
Column Name: page_view_id
Column Type Code: 1043
Column Name: page_view__tstamp
Column Type Code: 1114
Column Name: page_view__id
Column Type Code: 2950
22:02:31 Unhandled error while executing
2950
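For context, here is a minimal sketch of the failing lookup, written with only the standard library rather than the real dbt or psycopg2 code. The OID table is hand-written from PostgreSQL's pg_type catalog (2950 is the OID of the built-in uuid type), and type_code_to_name is a hypothetical helper, not a dbt function. It assumes the KeyError happens simply because the adapter's type table has no entry for that OID.

```python
# Hand-written subset of PostgreSQL type OIDs (as listed in pg_type).
# This is an illustration only, not the table dbt or psycopg2 uses.
PG_TYPE_OIDS = {
    20: "int8",
    25: "text",
    1043: "varchar",
    1114: "timestamp",
    2950: "uuid",
}


def type_code_to_name(type_code: int) -> str:
    """Hypothetical helper: map a type OID to a name without raising KeyError."""
    return PG_TYPE_OIDS.get(type_code, f"unknown type (OID {type_code})")


# Column names and type codes copied from the debug output above.
columns = [
    ("user_identifier", 25),
    ("event_id_dedupe_index", 20),
    ("event_id_dedupe_count", 20),
    ("page_view_id", 1043),
    ("page_view__tstamp", 1114),
    ("page_view__id", 2950),
]

for column_name, type_code in columns:
    print(f"{column_name}: {type_code_to_name(type_code)}")
```

If that reading is right, page_view__id is a uuid column, and the lookup in connections.py fails because uuid is missing from the mapping it consults.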
Thanks,
Moritz