Customise column names for Event and Entities

mike · November 22, 2023, 7:09am

I think there are definitely some advantages to customisation, but having this standard (which is applied globally across all events and entities) lets us build robust data models, like the recently released unified model, that rely on these conventions being in place. If we allowed more flexible renaming in the destination, we would then need to add complexity to each data model (and in turn for each data warehouse) to account for variability in these column names.
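To make the point about predictable names concrete, here is a rough sketch of how a data model could derive a column name from a schema URI when the convention is fixed. It assumes the usual `unstruct_event_` / `contexts_` prefixes followed by the vendor, the schema name and the major version; `column_name` is a hypothetical helper for illustration, not part of any loader, and the real loaders may handle edge cases differently.

```python
import re

# Assumed shape of an Iglu schema URI: iglu:<vendor>/<name>/jsonschema/<model>-<revision>-<addition>
_IGLU_URI = re.compile(r"iglu:([^/]+)/([^/]+)/jsonschema/(\d+)-(\d+)-(\d+)")


def _snake(value: str) -> str:
    """Lower-case a vendor or schema name and replace separators with underscores."""
    value = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", value)  # camelCase -> camel_Case
    return re.sub(r"[^A-Za-z0-9]+", "_", value).lower()


def column_name(schema_uri: str, kind: str) -> str:
    """Hypothetical helper: build the warehouse column name for an event or entity schema.

    kind is "event" for a self-describing event or "entity" for an entity (context).
    Only the major version (model) is assumed to appear in the column name.
    """
    match = _IGLU_URI.fullmatch(schema_uri)
    if match is None:
        raise ValueError(f"not a recognised schema URI: {schema_uri}")
    if kind not in ("event", "entity"):
        raise ValueError("kind must be 'event' or 'entity'")
    vendor, name, model = match.group(1), match.group(2), match.group(3)
    prefix = "unstruct_event" if kind == "event" else "contexts"
    return f"{prefix}_{_snake(vendor)}_{_snake(name)}_{model}"


if __name__ == "__main__":
    print(column_name("iglu:com.acme/checkout_step/jsonschema/2-1-0", "event"))
```

Because the mapping is deterministic, a model only needs the schema URI to know which column to read. With free-form renaming, every model would need a per-pipeline lookup instead.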

Other changes, like storing the version as part of the column name rather than within the column, often allow us to query warehouses more efficiently. In BigQuery, for example, the user can scan fewer bytes by selecting only the values they need. Similarly, in Snowflake you can reduce the amount of computation required by reading fewer columns and by not needing to reference a field inside the VARIANT. Many query engines either do not calculate statistics on properties within these columns or provide only coarse, block-level statistics, so the query planner isn’t able to optimise the scans as much as it otherwise could.
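As a toy illustration of the scanning argument (not a benchmark, and not how any warehouse actually measures or bills scans), the sketch below compares the bytes touched when one field is read from its own column with the bytes touched when the whole serialized object has to be read to extract that field. The rows and field names are made up for the example.

```python
import json

# Hypothetical rows of a self-describing event; field names are illustrative only.
rows = [
    {"version": "1-0-0", "target_url": "https://example.com/a", "element_id": "nav"},
    {"version": "1-0-0", "target_url": "https://example.com/b", "element_id": "footer"},
    {"version": "1-0-1", "target_url": "https://example.com/c", "element_id": "hero"},
]


def bytes_scanned_blob(data):
    """Bytes read if each row is one serialized JSON object that must be read whole."""
    return sum(len(json.dumps(row).encode("utf-8")) for row in data)


def bytes_scanned_column(data, field):
    """Bytes read if the field is stored as its own column and only that column is read."""
    return sum(len(str(row[field]).encode("utf-8")) for row in data)


if __name__ == "__main__":
    print("single blob column:", bytes_scanned_blob(rows))
    print("dedicated column:  ", bytes_scanned_column(rows, "element_id"))
```

Real engines add compression, encoding and per-block statistics on top of this, so the absolute numbers mean nothing. The only point is that reading one dedicated column touches less data than reading the whole object.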
