I’m using the dbt snowplow_web package. Unfortunately it gets stuck at model 9 and doesn’t continue (I let it run the whole night but it never finishes - CPU on the RDS machine is at 50% so I guess it hangs somewhere?!)
Using it on AWS / RDS / Postgres, with snowplow__enable_load_tstamp: false
Output looks like this:
(.venv) moon:dbt-snowplow-web mo$ dbt run --selector snowplow_web
07:58:22 Running with dbt=1.6.6
07:58:22 Registered adapter: postgres=1.6.6
07:58:22 Found 18 models, 103 tests, 3 seeds, 2 operations, 8 sources, 0 exposures, 0 metrics, 645 macros, 0 groups, 0 semantic models
07:58:22
07:58:25
07:58:25 Running 1 on-run-start hook
07:58:25 1 of 1 START hook: snowplow_web.on-run-start.0 ................................. [RUN]
07:58:25 1 of 1 OK hook: snowplow_web.on-run-start.0 .................................... [OK in 0.00s]
07:58:25
07:58:25 Concurrency: 1 threads (target='dev')
07:58:25
07:58:25 1 of 18 START sql incremental model skdbt_snowplow_manifest.snowplow_web_base_quarantined_sessions [RUN]
07:58:25 1 of 18 OK created sql incremental model skdbt_snowplow_manifest.snowplow_web_base_quarantined_sessions [SELECT 0 in 0.41s]
07:58:25 2 of 18 START sql incremental model skdbt_snowplow_manifest.snowplow_web_incremental_manifest [RUN]
07:58:26 2 of 18 OK created sql incremental model skdbt_snowplow_manifest.snowplow_web_incremental_manifest [SELECT 0 in 0.37s]
07:58:26 3 of 18 START sql table model skdbt_scratch.snowplow_web_base_new_event_limits . [RUN]
07:58:26 08:58:26 + Snowplow: No data in manifest. Processing data from start_date
07:58:26 08:58:26 + Snowplow: Processing data between '2023-11-01 00:00:00' and '2023-11-08 07:58:26' (snowplow_web)
07:58:26 3 of 18 OK created sql table model skdbt_scratch.snowplow_web_base_new_event_limits [SELECT 1 in 0.52s]
07:58:26 4 of 18 START sql incremental model skdbt_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest [RUN]
07:58:29 4 of 18 OK created sql incremental model skdbt_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest [SELECT 46529 in 2.57s]
07:58:29 5 of 18 START sql table model skdbt_scratch.snowplow_web_base_sessions_this_run [RUN]
07:58:30 5 of 18 OK created sql table model skdbt_scratch.snowplow_web_base_sessions_this_run [SELECT 46529 in 0.73s]
07:58:30 6 of 18 START sql table model skdbt_scratch.snowplow_web_base_events_this_run .. [RUN]
07:58:42 6 of 18 OK created sql table model skdbt_scratch.snowplow_web_base_events_this_run [SELECT 71184 in 12.33s]
07:58:42 7 of 18 START sql table model skdbt_scratch.snowplow_web_pv_engaged_time ....... [RUN]
07:58:43 7 of 18 OK created sql table model skdbt_scratch.snowplow_web_pv_engaged_time .. [SELECT 0 in 1.20s]
07:58:43 8 of 18 START sql table model skdbt_scratch.snowplow_web_pv_scroll_depth ....... [RUN]
07:58:45 8 of 18 OK created sql table model skdbt_scratch.snowplow_web_pv_scroll_depth .. [SELECT 65540 in 1.72s]
07:58:45 9 of 18 START sql table model skdbt_scratch.snowplow_web_sessions_this_run ..... [RUN]
^C08:10:56 CANCEL query model.snowplow_web.snowplow_web_sessions_this_run ................. [CANCEL]
08:10:56 9 of 18 ERROR creating sql table model skdbt_scratch.snowplow_web_sessions_this_run [ERROR in 731.34s]
If I use the generated command from target/run/snowplow_web/models/sessions/scratch/default/snowplow_web_sessions_this_run.sql
and execute it directly in my postgres terminal, it does work within a minute or so!
What could be the problem? And how can I skip step 9 so I can do that step manually and then let the dbt run
command continue?
Also: I tried to do dbt compile --select snowplow_web
to get all the commands it would execute, so I could do it manually in my postgres terminal. Unfortunately here also, I get an error:
(.venv) moon:dbt-snowplow-web mo$ dbt compile --select snowplow_web
09:26:55 Running with dbt=1.6.6
09:26:55 Registered adapter: postgres=1.6.6
09:26:55 Found 18 models, 103 tests, 3 seeds, 2 operations, 8 sources, 0 exposures, 0 metrics, 645 macros, 0 groups, 0 semantic models
09:26:55
09:26:56 Concurrency: 1 threads (target='dev')
09:26:56
09:26:56 Snowplow Warning: "snowplow"."skdbt_snowplow_manifest"."snowplow_web_incremental_manifest" does not exist. This is expected if you are compiling a fresh installation of the dbt-snowplow-* packages.
09:26:56 10:26:56 + Snowplow: No data in manifest. Processing data from start_date
09:26:56 Snowplow Warning: "snowplow"."skdbt_scratch"."snowplow_web_base_new_event_limits" does not exist. This is expected if you are compiling a fresh installation of the dbt-snowplow-* packages.
09:26:57 Snowplow Warning: "snowplow"."skdbt_scratch"."snowplow_web_base_new_event_limits" does not exist. This is expected if you are compiling a fresh installation of the dbt-snowplow-* packages.
09:26:57 Snowplow Warning: "snowplow"."skdbt_scratch"."snowplow_web_base_sessions_this_run" does not exist. This is expected if you are compiling a fresh installation of the dbt-snowplow-* packages.
09:26:58 Encountered an error:
Runtime Error
Database Error in model snowplow_web_base_events_this_run (models/base/scratch/default/snowplow_web_base_events_this_run.sql)
relation "skdbt_scratch.snowplow_web_base_sessions_this_run" does not exist
LINE 288: inner join "snowplow"."skdbt_scratch"."snowplow_...
^
Update: I did a run with the snowplow__start_date:
set to yesterday. So it only had to process one day. This works (takes still rather long though - I dont have too much data in my database). I took the commands from that run (dbt.log) and tried to use it again with my original start_data (2023-10-20) - by executing it step by step in my postgres terminal. This worked. However, when I run dbt run --select snowplow_web
again, it gets stuck again (even though it should only process the newest events incrementally, if I understand correctly). I also put keepalives_idle: 10
in my profiles.yaml as suggested here - but it doesn’t change anything. When I check the activity in my pgAdmin Dashboard I get this: