Blank Self-Describing or Structured Events

Hi Guys,

We are embarking on our Snowplow journey and as we begin to derive our schema structure, we were hoping to utilise some of the vast experience on this channel - in terms of what has or hasn’t worked!

From our perspective, the optimal solution would be for us to build schemas comprised solely of contexts [with no data attached to the events themselves] as this would be the cleanest solution.

An example of this can be found in Snowplow documentation here

However, we recognise that if we opt for entities within self-describing events, then the event schema would be essentially blank.

The Snowplow documentation states “We recommend using Self Describing events whenever possible as they give more control and semantic meaning to your data tracking.” If the majority of our events use only entities, with no data attached to the event itself, would Self Describing events remain the best event type in our case?

Example:

A basket add event may consist solely of basket and product contexts - therefore the basket_add schema “iglu:com.example/basket_add/jsonschema/1-0-0” would be blank.

window.snowplow('trackSelfDescribingEvent', {
  "event": {
     "schema": "iglu:com.example/basket_add/jsonschema/1-0-0",
     "data": {}
  },
  "context": [{
     "schema": "iglu:com.example/basket/jsonschema/1-0-0",
     "data": {
        "action": "add",
        "id": 12345,
        "total": 100
     }
  }, {
     "schema": "iglu:com.example/product/jsonschema/1-0-0",
     "data": {
        "name": "example_name",
        "quantity": 1,
        "price": 100,
        "category": "example_category",
        "sku": "example_sku"
     }
  }]
});

The main disadvantage we see with this is having to maintain many blank schemas for different events, and having to go through a deployment to add any new ones.
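To illustrate the maintenance concern, here is a rough sketch of what those blank schemas might look like, assuming the usual Iglu self-describing schema layout. The helper blankEventSchema and the event names basket_remove and basket_view are hypothetical and only for illustration; the point is that every blank schema would differ only in its name.

```javascript
// Sketch only: builds an empty ("blank") event schema for a given event name.
// blankEventSchema, the vendor and the event names are hypothetical examples.
function blankEventSchema(vendor, name) {
  return {
    $schema: 'http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#',
    description: `Event with no properties of its own: ${name}`,
    self: { vendor, name, format: 'jsonschema', version: '1-0-0' },
    type: 'object',
    properties: {},              // no data attached to the event itself
    additionalProperties: false, // reject any unexpected properties
  };
}

// One blank schema per event; each still has to be deployed separately.
const schemas = ['basket_add', 'basket_remove', 'basket_view']
  .map((name) => blankEventSchema('com.example', name));

console.log(JSON.stringify(schemas[0], null, 2));
```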

However, a structured event always depends on end users picking which category and action values to use, which can be a bit arbitrary.

Hi @helena_rose94 ,

I think you are on the right track. Using self-describing events with context entities is definitely our recommendation. Furthermore, if you look at our more recent schemas, they have gone in a similar direction to the one you proposed: a minimal schema for the self-describing event that states the type of action (maybe with just a single property, or none), with entities added to give context and describe the action. We find this to be quite a flexible and expressive way to construct the events.

To give an example similar to yours, you can take a look at the schemas used in our e-commerce accelerator. There is only a single self-describing event schema, the snowplow_ecommerce_action. This schema contains just two properties giving the type of action. The action is then described further using context entities with schemas like:


Further to what Matus says, and what’s shown in the snowplow_ecommerce_action schema, you don’t need a new event for every single distinct action. You can design a schema that covers a broad range of related actions, in this case the different types of “actions” a user takes in an ecommerce journey.
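As a rough sketch of how that could look for the basket example in the question: one broad event schema with a single property naming the action, and the entities carrying the detail. The basket_action schema and its type property are hypothetical names I made up for illustration (they are not published schemas), and buildBasketAction is just a helper for this sketch.

```javascript
// Hypothetical schema URI: one broad event covering related basket actions.
const BASKET_ACTION_SCHEMA = 'iglu:com.example/basket_action/jsonschema/1-0-0';

// Builds the payload for trackSelfDescribingEvent: a minimal event that only
// names the action, plus the entities that describe it.
function buildBasketAction(type, entities) {
  return {
    event: { schema: BASKET_ACTION_SCHEMA, data: { type } },
    context: entities,
  };
}

const payload = buildBasketAction('add', [
  {
    schema: 'iglu:com.example/basket/jsonschema/1-0-0',
    data: { id: 12345, total: 100 },
  },
  {
    schema: 'iglu:com.example/product/jsonschema/1-0-0',
    data: {
      name: 'example_name',
      quantity: 1,
      price: 100,
      category: 'example_category',
      sku: 'example_sku',
    },
  },
]);

// In the browser, with the tracker loaded:
// window.snowplow('trackSelfDescribingEvent', payload);
```

With this shape, a new action such as a remove would only need a new value for type rather than a new schema and deployment, assuming the schema allows that value.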

Going super specific with events and using more general events are both valid routes to take. Both can be modelled in the warehouse fine. It’s usually a trade-off between different levels of maintenance. (Personally I’ve come to prefer the method I describe above, but I know others who prefer more specific event schemas.)
