Derived Contexts

Hello,

I’m using Snowflake, so Derived Contexts is just read in directly.

I’ve noticed that, depending on the user agent type, schemas can change their position within Derived Contexts, so positions 0, 1, 2, 3, etc. are not always occupied by the same context, and everything in the array is labelled “data”.

The example below shows “data” position 0 being the UA Parser context and “data” position 1 being another user agent analyser.

{
  "data": [
    {
      "data": {
        "deviceFamily": "Mac",
        "osFamily": "Mac OS X",
        "osMajor": "10",
        "osMinor": "15",
        "osPatch": "7",
        "osPatchMinor": null,
        "osVersion": "Mac OS X 10.15.7",
        "useragentFamily": "Chrome",
        "useragentMajor": "88",
        "useragentMinor": "0",
        "useragentPatch": "4324",
        "useragentVersion": "Chrome 88.0.4324"
      },
      "schema": "iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0"
    },
    {
      "data": {
        "agentClass": "Browser",
        "agentName": "Chrome",
        "agentNameVersion": "Chrome 88.0.4324.192",
        "agentNameVersionMajor": "Chrome 88",
        "agentVersion": "88.0.4324.192",
        "agentVersionMajor": "88",
        "deviceBrand": "Apple",
        "deviceClass": "Desktop",
        "deviceCpu": "Intel",
        "deviceCpuBits": "32",
        "deviceName": "Apple Macintosh",
        "layoutEngineClass": "Browser",
        "layoutEngineName": "Blink",
        "layoutEngineNameVersion": "Blink 88.0",
        "layoutEngineNameVersionMajor": "Blink 88",
        "layoutEngineVersion": "88.0",
        "layoutEngineVersionMajor": "88",
        "operatingSystemClass": "Desktop",
        "operatingSystemName": "Mac OS X",
        "operatingSystemNameVersion": "Mac OS X 10.15.7",
        "operatingSystemNameVersionMajor": "Mac OS X 10",
        "operatingSystemVersion": "10.15.7",
        "operatingSystemVersionMajor": "10"
      },
      "schema": "iglu:nl.basjes/yauaa_context/jsonschema/1-0-1"
    }
  ],
  "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1"
}

Then we have the following example, where “data” position 0 is still the UA Parser context, but “data” position 1 is now Spiders and Robots, and “data” position 2 is now the other user agent analyser.

{
  "data": [
    {
      "data": {
        "deviceFamily": "Other",
        "osFamily": "Windows",
        "osMajor": "10",
        "osMinor": null,
        "osPatch": null,
        "osPatchMinor": null,
        "osVersion": "Windows 10",
        "useragentFamily": "Chrome",
        "useragentMajor": "87",
        "useragentMinor": "0",
        "useragentPatch": "4280",
        "useragentVersion": "Chrome 87.0.4280"
      },
      "schema": "iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0"
    },
    {
      "data": {
        "category": "SPIDER_OR_ROBOT",
        "primaryImpact": "UNKNOWN",
        "reason": "FAILED_UA_INCLUDE",
        "spiderOrRobot": true
      },
      "schema": "iglu:com.iab.snowplow/spiders_and_robots/jsonschema/1-0-0"
    },
    {
      "data": {
        "agentClass": "Browser",
        "agentName": "Chrome",
        "agentNameVersion": "Chrome 87.0.4280.88",
        "agentNameVersionMajor": "Chrome 87",
        "agentVersion": "87.0.4280.88",
        "agentVersionMajor": "87",
        "deviceBrand": "Unknown",
        "deviceClass": "Desktop",
        "deviceCpu": "Intel x86_64",
        "deviceCpuBits": "64",
        "deviceName": "Desktop",
        "layoutEngineClass": "Browser",
        "layoutEngineName": "Blink",
        "layoutEngineNameVersion": "Blink 87.0",
        "layoutEngineNameVersionMajor": "Blink 87",
        "layoutEngineVersion": "87.0",
        "layoutEngineVersionMajor": "87",
        "operatingSystemClass": "Desktop",
        "operatingSystemName": "Windows NT",
        "operatingSystemNameVersion": "Windows 10.0",
        "operatingSystemVersion": "10.0"
      },
      "schema": "iglu:nl.basjes/yauaa_context/jsonschema/1-0-0"
    }
  ],
  "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1"
}

An issue comes up when extracting data from atomic events for views/models: because everything is called “data”, the easiest way is to reference the array position. A simple example pulling out deviceClass looks like this:

select DERIVED_CONTEXTS:data[1].data.deviceClass, DERIVED_CONTEXTS
from SNOWPLOW_DB.ATOMIC.EVENTS_V
where DERIVED_CONTEXTS is not null
and collector_tstamp > '2021-04-01 00:00:00'
limit 100;

The problem is that data[1] moves; sometimes it can be data[2], and since everything is called “data” it’s like The Hunt for Red October. Is there a way to configure the enricher to maintain the positions of the enrichments, so that Derived Contexts is always in the same order, by run order or something?

It’s probably unlikely that “data” will change to the schema name, so failing that I guess I could use the schema path and somehow use it to find the array element I want. Or is there a recommended approach here?
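For illustration, the kind of schema-path lookup I have in mind would be something like this (just a sketch using LATERAL FLATTEN, not something I’ve validated against our table):

select f.value:data:deviceClass::string as device_class
from SNOWPLOW_DB.ATOMIC.EVENTS_V,
  lateral flatten(input => DERIVED_CONTEXTS:data) f  -- explode the contexts array, one row per context
where f.value:schema::string like 'iglu:nl.basjes/yauaa_context/%'  -- match on the schema URI, not the position
and collector_tstamp > '2021-04-01 00:00:00'
limit 100;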

Thanks
Kyle

Hi @kfitzpatrick ,

The real root cause of the problem here looks to be that you’re piping the enriched stream into the warehouse as-is, rather than using the Snowflake Loader. Or, if you are using it, you’re likely not using the transformer (although I’m not sure it would even work without the transformer, if I’m honest, so I’m guessing it’s Snowpipe or some other process).

The enriched data is in TSV format, with some fields in self-describing JSON format, but it’s not yet in a format that’s opinionated about what works best for the warehouse. Our database-specific loaders do this job: they read schemas from Iglu, create columns in the database, and transform the data into an appropriate format for each column before loading. So in your case, using the supported loader would give you a column per context in the database, containing an array of only the data for that context (i.e. here you’d have a UA Parser column and a YAUAA column).
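To make that concrete, a query against loaded data would look roughly like this (a sketch only; the column and table names here are illustrative, based on the loader’s vendor/name/version naming convention, not taken from an actual deployment):

select CONTEXTS_NL_BASJES_YAUAA_CONTEXT_1[0]:deviceClass::string as device_class  -- one VARIANT column per context
from SNOWPLOW_DB.ATOMIC.EVENTS
where collector_tstamp > '2021-04-01 00:00:00'
limit 100;

No more hunting through data[n] for the context you want.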

So, the best solution is to use the Snowflake Loader, and that’s the end state that I’d definitely recommend working towards.

I’m conscious that, from where you stand now, you might have the more immediate objective of gaining insights in the short term before you can move from your current setup to implementing the loader.

The best thing I can think of to get a modelled dataset out of the scenario you describe is to define a JavaScript UDF in Snowflake to parse the relevant data out of the array and output a format that works better for you. The analytics SDKs have functions which might serve as inspiration for how to go about this, and there’s a JavaScript one, so hopefully that makes the task easier.
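To give a flavour of what that could look like, here’s a minimal sketch (the function name and signature are my own illustration, not taken from the SDKs): a JavaScript UDF that returns the first context whose schema URI starts with a given prefix.

create or replace function extract_context(contexts variant, schema_prefix varchar)
returns variant
language javascript
as
$$
  // CONTEXTS is the parsed derived_contexts object: {"schema": ..., "data": [{"schema": ..., "data": {...}}, ...]}
  if (!CONTEXTS || !CONTEXTS.data) {
    return null;
  }
  for (var i = 0; i < CONTEXTS.data.length; i++) {
    var ctx = CONTEXTS.data[i];
    // Match on the schema URI rather than relying on the array position
    if (ctx.schema && ctx.schema.indexOf(SCHEMA_PREFIX) === 0) {
      return ctx.data;
    }
  }
  return null;
$$;

You could then select something like extract_context(DERIVED_CONTEXTS, 'iglu:nl.basjes/yauaa_context/'):deviceClass instead of guessing whether it’s data[1] or data[2].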

Obviously, manually managing the parsing of contexts like this will make things hard to maintain in the long run, and using a UDF will probably have some in-database performance costs too (if I remember correctly, Snowflake JS UDFs are fairly inefficient overall) - but it gives you a means to an end until such a point as you’re able to solve the underlying problem by implementing the supported loader.

I hope that helps!


Thanks @Colm

This helps very much; your assumptions are correct, we forwent the loader. A JS UDF did indeed solve the issue we’re currently facing.

It’s worth mentioning, in case someone comes across this post in the future, that the Snowplow Web Data Models also look for the loader columns, so this compounds the issue of not using the loader.
