Dbt-snowplow-web not working - "Unhandled error while executing"

I’m new to Snowplow and am running into problems when trying to use the dbt-snowplow-web package.

I have the latest versions of Snowplow and dbt-snowplow-web installed. I use the JavaScript tracker (with the session context enabled) to collect data, and I use the user_id field to set a unique ID for my users.

I followed this guide and set all variables accordingly.

When running dbt debug it says everything is fine.

When running dbt run --selector snowplow_web I get:

$ dbt run --selector snowplow_web
20:13:43  Running with dbt=1.6.6
20:13:43  Registered adapter: postgres=1.6.6
20:13:43  Found 18 models, 103 tests, 3 seeds, 2 operations, 8 sources, 0 exposures, 0 metrics, 645 macros, 0 groups, 0 semantic models
20:13:43  
20:13:46  
20:13:46  Running 1 on-run-start hook
20:13:46  1 of 1 START hook: snowplow_web.on-run-start.0 ................................. [RUN]
20:13:46  1 of 1 OK hook: snowplow_web.on-run-start.0 .................................... [OK in 0.00s]
20:13:46  
20:13:46  Concurrency: 1 threads (target='dev')
20:13:46  
20:13:46  1 of 18 START sql incremental model mo_schema_snowplow_manifest.snowplow_web_base_quarantined_sessions  [RUN]
20:13:46  1 of 18 OK created sql incremental model mo_schema_snowplow_manifest.snowplow_web_base_quarantined_sessions  [INSERT 0 0 in 0.58s]
20:13:46  2 of 18 START sql incremental model mo_schema_snowplow_manifest.snowplow_web_incremental_manifest  [RUN]
20:13:47  2 of 18 OK created sql incremental model mo_schema_snowplow_manifest.snowplow_web_incremental_manifest  [INSERT 0 0 in 0.59s]
20:13:47  3 of 18 START sql table model mo_schema_scratch.snowplow_web_base_new_event_limits  [RUN]
20:13:47  22:13:47 + Snowplow: No data in manifest. Processing data from start_date
20:13:47  22:13:47 + Snowplow: Processing data between '2023-10-20 00:00:00' and '2023-10-22 20:13:47' (snowplow_web)
20:13:47  3 of 18 OK created sql table model mo_schema_scratch.snowplow_web_base_new_event_limits  [SELECT 1 in 0.60s]
20:13:47  4 of 18 START sql incremental model mo_schema_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest  [RUN]
20:13:48  4 of 18 OK created sql incremental model mo_schema_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest  [INSERT 0 12290 in 0.99s]
20:13:48  5 of 18 START sql table model mo_schema_scratch.snowplow_web_base_sessions_this_run  [RUN]
20:13:49  5 of 18 OK created sql table model mo_schema_scratch.snowplow_web_base_sessions_this_run  [SELECT 12290 in 0.59s]
20:13:49  6 of 18 START sql table model mo_schema_scratch.snowplow_web_base_events_this_run  [RUN]
20:13:50  Unhandled error while executing 
2950
20:13:50  6 of 18 ERROR creating sql table model mo_schema_scratch.snowplow_web_base_events_this_run  [ERROR in 0.56s]
20:13:50  7 of 18 SKIP relation mo_schema_scratch.snowplow_web_pv_engaged_time ........... [SKIP]
20:13:50  8 of 18 SKIP relation mo_schema_scratch.snowplow_web_pv_scroll_depth ........... [SKIP]
20:13:50  9 of 18 SKIP relation mo_schema_scratch.snowplow_web_sessions_this_run ......... [SKIP]
20:13:50  10 of 18 SKIP relation mo_schema_derived.snowplow_web_user_mapping ............. [SKIP]
20:13:50  11 of 18 SKIP relation mo_schema_scratch.snowplow_web_page_views_this_run ...... [SKIP]
20:13:50  12 of 18 SKIP relation mo_schema_derived.snowplow_web_sessions ................. [SKIP]
20:13:50  13 of 18 SKIP relation mo_schema_derived.snowplow_web_page_views ............... [SKIP]
20:13:50  14 of 18 SKIP relation mo_schema_scratch.snowplow_web_users_sessions_this_run .. [SKIP]
20:13:50  15 of 18 SKIP relation mo_schema_scratch.snowplow_web_users_aggs ............... [SKIP]
20:13:50  16 of 18 SKIP relation mo_schema_scratch.snowplow_web_users_lasts .............. [SKIP]
20:13:50  17 of 18 SKIP relation mo_schema_scratch.snowplow_web_users_this_run ........... [SKIP]
20:13:50  18 of 18 SKIP relation mo_schema_derived.snowplow_web_users .................... [SKIP]
20:13:50  
20:13:50  Running 1 on-run-end hook
20:13:50  1 of 1 START hook: snowplow_web.on-run-end.0 ................................... [RUN]
20:13:50  1 of 1 OK hook: snowplow_web.on-run-end.0 ...................................... [OK in 0.00s]
20:13:50  
20:13:50  
20:13:50  Finished running 7 incremental models, 11 table models, 2 hooks in 0 hours 0 minutes and 7.11 seconds (7.11s).
20:13:50  
20:13:50  Completed with 1 error and 0 warnings:
20:13:50  
20:13:50    2950
20:13:50  
20:13:50  Done. PASS=5 WARN=0 ERROR=1 SKIP=12 TOTAL=18

That error message is not very descriptive. I have all extra enrichments commented out since I don’t use them:

    # snowplow__enable_iab: false
    # snowplow__enable_ua: false
    # snowplow__enable_yauaa: false

Sorry for my amateur question, but I’m completely new to Snowplow and dbt. If anyone has a hint on what to do or where to look further, I’d greatly appreciate it.


UPDATE

I had a look into the dbt.log file and there I’m getting more info:

23:08:44.458964 [error] [Thread-1 (]: Unhandled error while executing 
2950
23:08:44.466774 [debug] [Thread-1 (]: Traceback (most recent call last):
  File "/Users/mo/Desktop/code/snowplow/streamlit/.venv/lib/python3.10/site-packages/dbt/task/base.py", line 372, in safe_run
    result = self.compile_and_execute(manifest, ctx)
.
.
.
  File "/Users/mo/Desktop/code/snowplow/streamlit/.venv/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 307, in get_column_schema_from_query
    columns = [
  File "/Users/mo/Desktop/code/snowplow/streamlit/.venv/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 309, in <listcomp>
    column_name, self.connections.data_type_code_to_name(column_type_code)
  File "/Users/mo/Desktop/code/snowplow/streamlit/.venv/lib/python3.10/site-packages/dbt/adapters/postgres/connections.py", line 204, in data_type_code_to_name
    return string_types[type_code].name
KeyError: 2950

UPDATE

OK, I managed to dig deeper and found that the string_types lookup raises a KeyError because it has no entry for key 2950. I put some print statements into
/python3.10/site-packages/dbt/adapters/base/impl.py to print each column’s type_code and its name. The problem seems to be with a column named page_view__id:

Column Name: user_identifier
Column Type Code: 25
Column Name: event_id_dedupe_index
Column Type Code: 20
Column Name: event_id_dedupe_count
Column Type Code: 20
Column Name: page_view_id
Column Type Code: 1043
Column Name: page_view__tstamp
Column Type Code: 1114
Column Name: page_view__id
Column Type Code: 2950
22:02:31 Unhandled error while executing
2950
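The check above can also be sketched in plain Python, without editing files in site-packages. The type codes in the output are PostgreSQL type OIDs, and 2950 is the OID of uuid. In this sketch, `ADAPTER_KNOWN_CODES` is an assumed stand-in for the adapter's lookup table (it simply omits 2950, as the KeyError suggests), and `find_unmapped_columns` is a hypothetical helper, not part of dbt or psycopg2.

```python
# PostgreSQL type OIDs seen in this thread, with their catalog names
# (pg_type.typname).
PG_TYPE_NAMES = {
    20: "int8",
    25: "text",
    1043: "varchar",
    1114: "timestamp",
    2950: "uuid",
}

# Assumption: stand-in for the adapter's lookup table, which per the
# traceback has no entry for 2950.
ADAPTER_KNOWN_CODES = {20, 25, 1043, 1114}

# (column name, type code) pairs as printed in the debug output above.
COLUMNS = [
    ("user_identifier", 25),
    ("event_id_dedupe_index", 20),
    ("event_id_dedupe_count", 20),
    ("page_view_id", 1043),
    ("page_view__tstamp", 1114),
    ("page_view__id", 2950),
]


def find_unmapped_columns(columns, known_codes=ADAPTER_KNOWN_CODES):
    """Return (column, code, Postgres type name) for codes the table lacks."""
    return [
        (name, code, PG_TYPE_NAMES.get(code, "unknown"))
        for name, code in columns
        if code not in known_codes
    ]


if __name__ == "__main__":
    for name, code, pg_name in find_unmapped_columns(COLUMNS):
        print(f"{name}: type code {code} ({pg_name}) has no mapping")
```

With the columns listed in this post, the only entry flagged is page_view__id, which points at a uuid column rather than at anything in the package configuration.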

Thanks,
Moritz

It looks like this is a bug in dbt where it doesn’t support uuid columns written in data contracts.

Do you have any data_type parameters set to UUID in any of your data modelling yml files?

Hi Mike.

Thanks for the fast response. I checked all files in the repository but there was no data_type definition to be found (I used grep -r -i "data_type:" *)

I just cloned the dbt-snowplow-web repository and changed the necessary variables in dbt_project.yml; that’s all the modifications I made.

The whole Snowplow pipeline was set up following the Quick start guide for AWS on the Snowplow website.

Best,
Moritz

PS: I can see that my event_id column is UUID. Should it be like this?

Yeah - event_id should be a uuid (though if it’s cast to varchar it may fix the problem). There’s a PR to fix this in dbt but it hasn’t been merged / released yet. I’ll get the team to check if there might be any other workarounds.

Thanks! Actually I managed to solve it now by manually implementing the code of the pull request (had to read up what a PR is first and how it works :sweat_smile: )

Maybe this helps other people too that stumble into the same problem.
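The thread does not include the patch itself. As a rough illustration only, a lookup that tolerates unknown type codes might look like the sketch below. `TypeInfo`, `STRING_TYPES`, and the placeholder text are assumptions made for this example; the real change in dbt may differ.

```python
from typing import Dict, NamedTuple


class TypeInfo(NamedTuple):
    """Hypothetical stand-in for the driver's type objects, which expose .name."""

    name: str


# Assumption: a tiny stand-in for the driver's OID-to-type table.
# It deliberately omits 2950 (uuid), matching the KeyError in the traceback.
STRING_TYPES: Dict[int, TypeInfo] = {
    20: TypeInfo("int8"),
    25: TypeInfo("text"),
    1043: TypeInfo("varchar"),
    1114: TypeInfo("timestamp"),
}


def data_type_code_to_name(type_code: int) -> str:
    """Look up a type name, returning a placeholder instead of raising KeyError."""
    type_info = STRING_TYPES.get(type_code)
    if type_info is None:
        # Unknown OID: report it rather than aborting the whole model run.
        return f"unknown type_code {type_code}"
    return type_info.name


if __name__ == "__main__":
    for code in (25, 2950):
        print(code, "->", data_type_code_to_name(code))
```

The design choice here is to degrade gracefully: an unmapped column type yields a readable placeholder, so a single uuid column no longer stops the run at snowplow_web_base_events_this_run.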
