Hello All,
I am failing to complete the job for snowplow_web_sessions. The error appears to be that a column does not exist when trying to do an insert from a tmp table into “atomic_derived”.“snowplow_web_sessions”. However in our debugging we have discovered that this column (along with 4 others, SHOULD NOT exist) in the origin table but are being added.
In order to fix the issue, we have removed the following 5 columns from “atomic_derived”.“snowplow_web_sessions”:
collector_tstamp
derived_tstamp
dvce_created_tstamp
page_view_in_session_index
page_views_in_session
The run immediately after will succeed, but subsequent runs will continue to fail because this columns appear to be regenerated by Snowplow_web dbt package.
My team and I have hit a wall with this issue, and I am looking for more information or assistance on how to fix. Any help is appreciated. Additional information below.
Thank you!
17:54:31 19 of 24 ERROR creating snowplow_incremental model atomic_derived.snowplow_web_sessions [ERROR in 1.27s]
17:54:31 20 of 24 SKIP relation atomic_scratch.snowplow_web_users_sessions_this_run ..... [SKIP]
17:54:31 21 of 24 SKIP relation atomic_scratch.snowplow_web_users_aggs .................. [SKIP]
17:54:31 22 of 24 SKIP relation atomic_scratch.snowplow_web_users_lasts ................. [SKIP]
17:54:31 23 of 24 SKIP relation atomic_scratch.snowplow_web_users_this_run .............. [SKIP]
17:54:31 24 of 24 SKIP relation atomic_derived.snowplow_web_users ....................... [SKIP]
17:54:31
17:54:31 Running 1 on-run-end hook
17:54:31 1 of 1 START hook: snowplow_web.on-run-end.0 ................................... [RUN]
17:54:31 1 of 1 OK hook: snowplow_web.on-run-end.0 ...................................... [DROP TABLE in 0.08s]
17:54:31
17:54:31
17:54:31 Finished running 3 incremental models, 17 table models, 4 snowplow_incremental models, 2 hooks in 17.47s.
17:54:32
17:54:32 Completed with 1 error and 0 warnings:
17:54:32
17:54:32 Database Error in model snowplow_web_sessions (models/sessions/snowplow_web_sessions.sql)
17:54:32 column "collector_tstamp" does not exist
17:54:32 LINE 29: ...system_name_version", "operating_system_version", "collector...
17:54:32 ^
17:54:32 HINT: There is a column named "collector_tstamp" in table "snowplow_web_sessions", but it cannot be referenced from this part of the query.
17:54:32 compiled SQL at target/run/snowplow_web/models/sessions/snowplow_web_sessions.sql
Error comes from …/dbt_packages/snowplow_web/models/sessions/snowplow_web_sessions.sql
insert into "enriched"."atomic_derived"."snowplow_web_sessions" ("app_id", "domain_sessionid", "domain_sessionidx", "start_tstamp", "end_tstamp", "model_tstamp", "user_id", "domain_userid", "stitched_user_id", "network_userid", "page_views", "engaged_time_in_s", "absolute_time_in_s", "first_page_title", "first_page_url", "first_page_urlscheme", "first_page_urlhost", "first_page_urlpath", "first_page_urlquery", "first_page_urlfragment", "last_page_title", "last_page_url", "last_page_urlscheme", "last_page_urlhost", "last_page_urlpath", "last_page_urlquery", "last_page_urlfragment", "referrer", "refr_urlscheme", "refr_urlhost", "refr_urlpath", "refr_urlquery", "refr_urlfragment", "refr_medium", "refr_source", "refr_term", "mkt_medium", "mkt_source", "mkt_term", "mkt_content", "mkt_campaign", "mkt_clickid", "mkt_network", "geo_country", "geo_region", "geo_region_name", "geo_city", "geo_zipcode", "geo_latitude", "geo_longitude", "geo_timezone", "user_ipaddress", "useragent", "br_renderengine", "br_lang", "os_timezone", "category", "primary_impact", "reason", "spider_or_robot", "useragent_family", "useragent_major", "useragent_minor", "useragent_patch", "useragent_version", "os_family", "os_major", "os_minor", "os_patch", "os_patch_minor", "os_version", "device_family", "device_class", "agent_class", "agent_name", "agent_name_version", "agent_name_version_major", "agent_version", "agent_version_major", "device_brand", "device_name", "device_version", "layout_engine_class", "layout_engine_name", "layout_engine_name_version", "layout_engine_name_version_major", "layout_engine_version", "layout_engine_version_major", "operating_system_class", "operating_system_name", "operating_system_name_version", "operating_system_version", "collector_tstamp", "derived_tstamp", "dvce_created_tstamp", "page_view_in_session_index", "page_views_in_session")
(
select "app_id", "domain_sessionid", "domain_sessionidx", "start_tstamp", "end_tstamp", "model_tstamp", "user_id", "domain_userid", "stitched_user_id", "network_userid", "page_views", "engaged_time_in_s", "absolute_time_in_s", "first_page_title", "first_page_url", "first_page_urlscheme", "first_page_urlhost", "first_page_urlpath", "first_page_urlquery", "first_page_urlfragment", "last_page_title", "last_page_url", "last_page_urlscheme", "last_page_urlhost", "last_page_urlpath", "last_page_urlquery", "last_page_urlfragment", "referrer", "refr_urlscheme", "refr_urlhost", "refr_urlpath", "refr_urlquery", "refr_urlfragment", "refr_medium", "refr_source", "refr_term", "mkt_medium", "mkt_source", "mkt_term", "mkt_content", "mkt_campaign", "mkt_clickid", "mkt_network", "geo_country", "geo_region", "geo_region_name", "geo_city", "geo_zipcode", "geo_latitude", "geo_longitude", "geo_timezone", "user_ipaddress", "useragent", "br_renderengine", "br_lang", "os_timezone", "category", "primary_impact", "reason", "spider_or_robot", "useragent_family", "useragent_major", "useragent_minor", "useragent_patch", "useragent_version", "os_family", "os_major", "os_minor", "os_patch", "os_patch_minor", "os_version", "device_family", "device_class", "agent_class", "agent_name", "agent_name_version", "agent_name_version_major", "agent_version", "agent_version_major", "device_brand", "device_name", "device_version", "layout_engine_class", "layout_engine_name", "layout_engine_name_version", "layout_engine_name_version_major", "layout_engine_version", "layout_engine_version_major", "operating_system_class", "operating_system_name", "operating_system_name_version", "operating_system_version", "collector_tstamp", "derived_tstamp", "dvce_created_tstamp", "page_view_in_session_index", "page_views_in_session"
from "snowplow_web_sessions__dbt_tmp105430605981"
);
The tmp table seems to be generated by the follow:
snowplow_web_sessions_aggs
snowplow_web_page_views_this_run
snowplow_web_sessions_lasts
From the following joins:
from "enriched"."atomic_scratch"."snowplow_web_sessions_aggs" as b
inner join "enriched"."atomic_scratch"."snowplow_web_page_views_this_run" as a
on a.domain_sessionid = b.domain_sessionid
and a.start_tstamp = b.start_tstamp
and a.page_view_in_session_index = 1
inner join "enriched"."atomic_scratch"."snowplow_web_sessions_lasts" c
on b.domain_sessionid = c.domain_sessionid
We are stumped as to why the dbt snowplow package seems to deliberately and these columns back but then complain when they are there or should not be there.
Thank you again!