Choosing event types: structured vs unstructured

Hi community,

Has anyone combined structured and unstructured events in product analytics use cases in the same project?

These 2 event types are quite different.

  • Structured
    • :heavy_plus_sign: easy to add new events
    • :heavy_plus_sign: simple structure
    • :heavy_plus_sign: easy to consume by analysts (event category and name already in atomic table)
    • :heavy_minus_sign: se_property, se_label, se_value mean different things in different events
    • :heavy_minus_sign: can’t provide complex structure (only via custom contexts)
  • Unstructured
    • :heavy_plus_sign: any complex event structure is possible
    • :heavy_minus_sign: more work for developers to add new events (each new event --> a new schema)
    • :heavy_minus_sign: needs additional data modeling to unwrap unstruct_event into a fat table
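To make the contrast concrete, the two shapes can be sketched in TypeScript. The type and field names below are hypothetical, loosely mirroring the atomic se_* columns and the schema-plus-data form of a self-describing event; they are not an actual tracker API.

```typescript
// Structured: five fixed fields that land directly in atomic columns.
interface StructuredEvent {
  se_category: string;
  se_action: string;
  se_label?: string;
  se_property?: string;
  se_value?: number;
}

// Unstructured (self-describing): a schema URI plus arbitrary JSON data.
interface SelfDescribingEvent {
  schema: string; // e.g. an Iglu URI
  data: Record<string, unknown>;
}

// A structured event: simple, but se_label / se_value carry
// event-specific meanings that only the analyst knows.
const addToCart: StructuredEvent = {
  se_category: "cart",
  se_action: "add",
  se_label: "sku-123",
  se_value: 19.99,
};

// A self-describing event: arbitrarily complex, validated
// against its schema, but needs unwrapping downstream.
const checkoutStarted: SelfDescribingEvent = {
  schema: "iglu:com.example/checkout_started/jsonschema/1-0-0",
  data: { cartId: "c-42", items: 3, total: 59.97 },
};
```

Note how the structured payload is flat and column-ready, while the self-describing one carries its own name and structure in the schema URI.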

I see at least 2 obvious solutions, but wanted to know if anyone has been in the same situation.
Option 1. Use structured events with custom contexts.
Option 2. Use both types, and do additional data modeling on top of the atomic table to prepare events for downstream consumption (e.g. populate final_event_name and final_event_category columns based on the event type, since unstructured events use schema URLs for event_name while structured events always have event_name=event, etc).
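Option 2 can be sketched in TypeScript as a small normalization step. This assumes, as described above, that structured events arrive with event_name=event while unstructured events carry a schema URL in event_name; the row shape and function names are illustrative, not part of any actual pipeline.

```typescript
// Illustrative subset of an atomic events row.
interface AtomicRow {
  event_name: string; // "event" for structured, schema URL for unstructured
  se_category?: string;
  se_action?: string;
}

function finalEventName(row: AtomicRow): string {
  if (row.event_name === "event") {
    // Structured event: the meaningful name lives in se_action.
    return row.se_action ?? "event";
  }
  // Unstructured event: extract the name from an Iglu-style URI,
  // e.g. "iglu:com.example/checkout_started/jsonschema/1-0-0".
  const match = row.event_name.match(/^iglu:[^/]+\/([^/]+)\//);
  return match ? match[1] : row.event_name;
}

function finalEventCategory(row: AtomicRow): string {
  // Structured events carry a category; unstructured ones need
  // one derived elsewhere (left as "unknown" in this sketch).
  return row.event_name === "event"
    ? row.se_category ?? "unknown"
    : "unknown";
}
```

Downstream consumers then only ever read final_event_name / final_event_category, regardless of which tracking style produced the row.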


Hi @ostap
We recently released a new blog post, written by @carabaestlein, that covers this topic in some detail:

From my point of view, I would suggest always using unstructured (i.e. self-describing) JSON-schema'd events. They have all the benefits and very few of the negatives; the extra work on the tracking side for developers is well worth it, paying off in a better understanding of your data (increased data meaning) and more consistency in your data (increased data quality).


@ostap thanks for the great question. Personally we do both Option 1 and Option 2, but as Paul mentioned I think going unstructured-only is mostly enough: the se_* fields are useful only in limited contexts, and in the end they create much more work at consumption time due to the ambiguity of their actual values.

Also, as you mentioned, post-processing the atomic table afterwards can be useful, since it gives you the flexibility to normalize the raw data and do additional prefiltering etc. based on data consumers' needs - though it of course comes with the added cost of maintaining those transformations and making sure they run properly.