Hello People,
I am trying to set up the following snowplow web dbt package with Snowflake.
https://hub.getdbt.com/snowplow/snowplow_web/latest/
And from what I have understood so far, this package works out of the box only if data in snowflake was loaded using the loader.
I am loading the data using snowpipe and all I have is “events” table as a source and following sources are basically not directly available.
com_snowplowanalytics_snowplow_web_page_1
com_snowplowanalytics_snowplow_ua_parser_context_1
nl_basjes_yauaa_context_1
However,
When we get enriched events and I see that we have columns like “contexts” and “derived_contexts” that actually has the context information needed to build these sources.
Example :
I have the following information in contexts column that I could parse to build com_snowplowanalytics_snowplow_web_page_1
{
“data”: {
“id”: “aed6ee27-9a84-40d0-b60a-8ce83711e500”
},
“schema”: “iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0”
Similarly, I have the following in derived_context that I could parse to build nl_basjes_yauaa_context_1
{
“data”: {
“agentClass”: “Browser Webview”,
“agentName”: “Chrome Webview”,
“agentNameVersion”: “Chrome Webview 99.0.4844.73”,
“agentNameVersionMajor”: “Chrome Webview 99”,
“agentVersion”: “99.0.4844.73”,
“agentVersionMajor”: “99”,
“deviceBrand”: “Oppo”,
“deviceClass”: “Phone”,
“deviceName”: “Oppo CPH1819”,
“layoutEngineClass”: “Browser”,
“layoutEngineName”: “Blink”,
“layoutEngineNameVersion”: “Blink 99.0”,
“layoutEngineNameVersionMajor”: “Blink 99”,
“layoutEngineVersion”: “99.0”,
“layoutEngineVersionMajor”: “99”,
“operatingSystemClass”: “Mobile”,
“operatingSystemName”: “Android”,
“operatingSystemNameVersion”: “Android 10”,
“operatingSystemNameVersionMajor”: “Android 10”,
“operatingSystemVersion”: “10”,
“operatingSystemVersionBuild”: “QP1A.190711.020”,
“operatingSystemVersionMajor”: “10”,
“webviewAppName”: “Facebook App for Android”,
“webviewAppNameVersionMajor”: “Facebook App for Android 358”,
“webviewAppVersion”: “358.0.0.34.117”,
“webviewAppVersionMajor”: “358”
},
“schema”: “iglu:nl.basjes/yauaa_context/jsonschema/1-0-2”
Similarly,
I have the following information to build com_snowplowanalytics_snowplow_ua_parser_context_1 as well
{
“data”: [
{
“data”: {
“deviceFamily”: “Generic Smartphone”,
“osFamily”: “Android”,
“osMajor”: “10”,
“osMinor”: null,
“osPatch”: null,
“osPatchMinor”: null,
“osVersion”: “Android 10”,
“useragentFamily”: “Facebook”,
“useragentMajor”: “358”,
“useragentMinor”: “0”,
“useragentPatch”: “0”,
“useragentVersion”: “Facebook 358.0.0”
},
“schema”: “iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0”
}
2 questions:
- Does this make sense to take this approach of building these sources myself from these contexts columns and make the package work ?
- Only thing I don’t understand is all of these sources have the following columns that I have no idea where they are coming from ?
root_id
root_tstamp
ref_root
ref_tree
ref_parent
root_id seems to be the actual event_id of the corresponding event of that row. what are those other columns are and where can I fetch those from ?