I am trying to figure out a way to define a Snowplow schema for a generic map or object type to allow setting arbitrary key-value pairs. Can you please share your thoughts on this?
Here is the Avro schema which I am trying to translate into an equivalent Snowplow schema:
I am not sure whether it would allow array-typed values as well, but I was wondering whether we can define an object type loosely, without properties, so that it is treated as a generic type.
Since this is a semi-structured type, it would help if we could preserve the actual value types at the raw layer.
I think this would depend on what your use case is.
In theory, JSON Schema will allow you to do this, since additionalProperties defaults to true for an object anyway. Ultimately, though, it depends on which warehouse you are loading into: for the column to be created, the type needs to be known ahead of time. Something like Snowflake will be fine (because the properties can be sent through as part of a VARIANT), but that isn't going to work the same way in Redshift, BigQuery, or Databricks, where the types are more structured.
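As a rough sketch of what such a loosely typed object could look like, the snippet below builds a self-describing schema as a Python dict and prints it as JSON. The vendor `com.example`, the schema name `generic_map`, and the field name `attributes` are made up for illustration, and `is_valid_attributes` is only a hand-written stand-in for what a real JSON Schema validator would check for `{"type": "object"}` with no declared properties.

```python
import json

# Hypothetical self-describing schema: vendor, name and the "attributes"
# field are placeholders, not taken from the original post.
GENERIC_MAP_SCHEMA = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "description": "Event carrying arbitrary key-value pairs (illustrative only)",
    "self": {
        "vendor": "com.example",
        "name": "generic_map",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    "type": "object",
    "properties": {
        # No "properties" declared here, so any keys and any JSON value
        # types (strings, numbers, arrays, nested objects) are accepted.
        "attributes": {
            "type": "object",
            "additionalProperties": True,
        }
    },
    "additionalProperties": False,
}


def is_valid_attributes(value):
    """Stand-in check: the value must be a JSON object with string keys."""
    return isinstance(value, dict) and all(isinstance(k, str) for k in value)


# Example payload mixing scalar, array and nested object values.
example_event = {
    "attributes": {
        "plan": "pro",
        "seats": 5,
        "tags": ["a", "b"],
        "nested": {"ok": True},
    }
}

print(json.dumps(GENERIC_MAP_SCHEMA, indent=2))
print(is_valid_attributes(example_event["attributes"]))
```

A definition like this validates fine, including array values, but as noted above the loader still has to map `attributes` to a column type, so how well it works depends on the warehouse.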