Dynamically populated contexts?

Hi,

We are currently looking into how we can create contexts with dynamic fields. Currently we have many static contexts, e.g. customer, user, product, other, but we see the need to have dynamic ones where we don’t assign static fields.

Since we are working with JSON schemas, my initial assumption would be to create a JSON schema that validates against dynamic fields, i.e. something like this: JSON schema for dynamic properties - Stack Overflow

Any thoughts or guidance on this? I noticed there is support for this in specific events like page_view and link_click / form activity, but my impression is that nothing like this exists on a more general level.

Do you have an example of what you have in mind?

You might be able to do this with additionalProperties: true, or with a field that stores dynamic values as an array of key-value objects, but it tends to depend on the application as well as how you want to store / modify this data.
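
If you go the additionalProperties route, a minimal sketch of the schema body could look something like the below (app_id is just a placeholder for whatever fixed fields you keep; anything not declared under properties is still accepted):

{
  "description": "Context with a couple of fixed fields plus arbitrary extra fields",
  "type": "object",
  "properties": {
    "app_id": {
      "type": "string"
    }
  },
  "additionalProperties": true
}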

Hey, thanks for your answer!

I found this doc on additionalProperties.

And since we are on a Google setup, I understand data will not be loaded into BigQuery.

Our problem now is that several of our apps need different sets of contexts because their domains are so different, which makes the static contexts a bit insufficient. At the same time we don’t want our data model to be bloated with lots of app-specific contexts. That’s why we are looking into dynamically populated contexts, i.e. where each app is able to provide its own data properties, while we could impose limitations such as the number of properties in order to persist the data to some extent.

I don’t know how true this necessarily is anymore for the BigQuery stream loader (if that’s what you are using). For a nested property (not top level) I’m 90% sure (I haven’t tested recently) that you can use additionalProperties: true and it will stringify the object rather than archiving it. The stringified form isn’t the best to work with, but there are some workarounds to treat it as JSON if you need to.
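
To sketch what I mean by a nested property (field names are just illustrative), the dynamic part would sit under a fixed top-level field rather than at the root of the schema:

{
  "type": "object",
  "properties": {
    "app_id": {
      "type": "string"
    },
    "dynamic_fields": {
      "description": "Free-form object; if the stream loader behaves as described above, this gets stringified rather than turned into columns",
      "type": "object",
      "additionalProperties": true
    }
  },
  "additionalProperties": false
}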

Depending on how common / shared these are, I’d either consider additionalProperties or, if it makes sense, just model them as key-value pairs, e.g.:

{
  "type": "array",
  "items": {
    "type": "object",
  	"properties": {
      "key": {
        "type": "string"
      },
      "value": {
        "type": "string"
      }
    }
  }
}

I don’t know how true this necessarily is anymore for the BigQuery stream loader (if that’s what you are using). For a nested property (not top level) I’m 90% sure (I haven’t tested recently) that you can use additionalProperties: true and it will stringify the object rather than archiving it. The stringified form isn’t the best to work with, but there are some workarounds to treat it as JSON if you need to.

We are currently not on the stream loader unfortunately, but will be in the future. This is a good note!

Depending on how common / shared these are, I’d either consider additionalProperties or, if it makes sense, just model them as key-value pairs, e.g.:

{
  "type": "array",
  "items": {
    "type": "object",
  	"properties": {
      "key": {
        "type": "string"
      },
      "value": {
        "type": "string"
      }
    }
  }
}

This is a good suggestion! So for now, since we are not on the stream loader, we might create a key-value pairs model. How will the array be represented in the BigQuery data model? Will the whole array be stringified, or will it be stored using BigQuery’s own array data types, as we see for form_submit events?

It’ll be an array of structs so you won’t have to deal with anything stringified - just UNNEST at query time and you can filter for the key.

Ok, thanks! So that would mean we would need to structure each key-value pair as a tuple in the array?

How would this be validated and stored in BigQuery? When unnesting a context I want to avoid a situation with duplicates as much as possible.

"my_dict": {
			"type" : "object",
			"properties": {
				"key" : {
					"type" : "string"
				},
				"value" : {
					"type" : "string"
				}
			}

This would validate against

data = {
  "my_dict": {
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
  }
}

Not too sure what you mean by duplicates here?

Your data structure is close, but as that is an object keyed on your “key” values, you won’t be able to put an array in there (without collisions), so it’ll look more like:

{
  "type": "array",
  "items": {
    "type": "object",
  	"properties": {
      "key": {
        "type": "string"
      },
      "value": {
        "type": "string"
      }
    },
    "additionalProperties": false
  }
}

and your data in BigQuery will look something like

[
  {"key": "app_id_example", "value": "123"},
  {"key": "app_id_next", "value":"456"}
]
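
For completeness, if you wrap that up as its own context schema it would sit inside the usual self-describing schema wrapper, roughly like this (the vendor and name here are made up):

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Generic key-value context (example vendor / name)",
  "self": {
    "vendor": "com.acme",
    "name": "dynamic_context",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "key": {
        "type": "string"
      },
      "value": {
        "type": "string"
      }
    },
    "additionalProperties": false
  }
}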