Dynamically populated contexts?

Hi,

We are currently looking into how we can create contexts with dynamic fields. Currently we have many static contexts, e.g. customer, user, product, other, but we see the need to have dynamic ones where we don’t assign static fields.

Since we are working with JSON schemas, my initial assumption would be to create a JSON schema that validates against dynamic fields, i.e. something like this: JSON schema for dynamic properties - Stack Overflow

Any thoughts or guidance on this? I noticed there is support for this in specific events like page_view and link_click / form activity, but my impression is that nothing like this exists on a more general level.

Do you have an example of what you have in mind?

You might be able to do this with additionalProperties: true, or with a field that stores dynamic values as an array of key-value objects, but it tends to depend on the application as well as how you want to store / modify this data.
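
If you go the additionalProperties route, a minimal sketch of the schema body could look something like the below (app_id is just a placeholder for whatever fixed fields you keep; anything not declared under properties is still accepted):

{
  "description": "Context with a couple of fixed fields plus arbitrary extra fields",
  "type": "object",
  "properties": {
    "app_id": {
      "type": "string"
    }
  },
  "additionalProperties": true
}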

Hey, thanks for your answer!

I found this doc on additionalProperties.

And since we are on a Google setup, I understand data will not be loaded into BigQuery.

Our problem now is that several of our apps need different sets of contexts because their domains are so different, which makes the static contexts a bit insufficient. At the same time we don’t want our data model to be bloated with lots of app-specific contexts. That’s why we are looking into dynamically populated contexts, i.e. where each app is able to provide its own data properties, while we could impose limitations such as the number of properties in order to persist the data to some extent.

I don’t know how true this necessarily is anymore for the BigQuery stream loader (if that’s what you are using). For a nested property (not top level) I’m 90% sure (I haven’t tested recently) that you can use additionalProperties: true and it will stringify the object rather than archiving it. The stringified form isn’t the best to work with, but there are some workarounds to treat it as JSON if you need to.
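
To sketch what I mean by a nested property (field names are just illustrative), the dynamic part would sit under a fixed top-level field rather than at the root of the schema:

{
  "type": "object",
  "properties": {
    "app_id": {
      "type": "string"
    },
    "dynamic_fields": {
      "description": "Free-form object; if the stream loader behaves as described above, this gets stringified rather than turned into columns",
      "type": "object",
      "additionalProperties": true
    }
  },
  "additionalProperties": false
}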

Depending on how common / shared these are, I’d either consider additionalProperties or, if it makes sense, just model them as key-value pairs, e.g.:

{
  "type": "array",
  "items": {
    "type": "object",
  	"properties": {
      "key": {
        "type": "string"
      },
      "value": {
        "type": "string"
      }
    }
  }
}

I don’t know how true this necessarily is anymore for the BigQuery stream loader (if that’s what you are using). For a nested property (not top level) I’m 90% sure (I haven’t tested recently) that you can use additionalProperties: true and it will stringify the object rather than archiving it. The stringified form isn’t the best to work with, but there are some workarounds to treat it as JSON if you need to.

We are currently not on the stream loader unfortunately, but will be in the future. This is a good note!

Depending on how common / shared these are, I’d either consider additionalProperties or, if it makes sense, just model them as key-value pairs, e.g.:

{
  "type": "array",
  "items": {
    "type": "object",
  	"properties": {
      "key": {
        "type": "string"
      },
      "value": {
        "type": "string"
      }
    }
  }
}

This is a good suggestion! So for now, since we are not on the stream loader, we might create a key-value pairs model. How will the array be represented in the BigQuery data model? Will the whole array be stringified, or will it be stored using BigQuery’s own array data types, as we see for form_submit events?

It’ll be an array of structs so you won’t have to deal with anything stringified - just UNNEST at query time and you can filter for the key.

Ok, thanks! So that would mean we would need to structure each key-value pair as a tuple in the array?

How would this be validated and stored in BigQuery? When unnesting a context I want to avoid a situation with duplicates as much as possible.

"my_dict": {
			"type" : "object",
			"properties": {
				"key" : {
					"type" : "string"
				},
				"value" : {
					"type" : "string"
				}
			}

This would validate against

data = {
  "my_dict": {
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
  }
}

Not too sure what you mean by duplicates here?

Your data structure is close, but as that is an object keyed on your “key” values, you won’t be able to put an array in there (without collisions), so it’ll look more like:

{
  "type": "array",
  "items": {
    "type": "object",
  	"properties": {
      "key": {
        "type": "string"
      },
      "value": {
        "type": "string"
      }
    },
    "additionalProperties": false
  }
}

and your data in BigQuery will look something like

[
  {"key": "app_id_example", "value": "123"},
  {"key": "app_id_next", "value":"456"}
]
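
For completeness, if you wrap that up as its own context schema it would sit inside the usual self-describing schema wrapper, roughly like this (the vendor and name here are made up):

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Generic key-value context (example vendor / name)",
  "self": {
    "vendor": "com.acme",
    "name": "dynamic_context",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "key": {
        "type": "string"
      },
      "value": {
        "type": "string"
      }
    },
    "additionalProperties": false
  }
}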