I was recently asked this question by a client:
I have a question about additionalProperties attribute in json schema. I’m wondering whether it can give us more flexibility as we could add more tracked properties ad-hoc on the client side. Do you have any example how it could be used? Are the additional properties ignored when the StorageLoader runs?
It’s a great question, so I’ve posted it here in case others are wondering. There’s a trade-off:
More ‘relaxed’ schemas, i.e. set additionalProperties to true
- Pro: developers can add new properties to an event / context without breaking validation. (So the data will still be successfully processed and e.g. loaded into Redshift.)
- Con: the extra properties won’t be accessible to downstream processes (e.g. they won’t be loaded into Redshift).
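To make the relaxed case concrete, here is a minimal sketch in Python. It is not how Snowplow validates or loads data: the schema, the `validate` helper and the `to_row` helper are all hypothetical, `validate` only mimics the `required` and `additionalProperties` keywords rather than implementing JSON Schema in full, and `to_row` assumes that the columns loaded downstream are derived from the properties named in the schema.

```python
# Hypothetical schema for a button click event; the names are illustrative only.
RELAXED_SCHEMA = {
    "type": "object",
    "properties": {
        "button_id": {"type": "string"},
        "page": {"type": "string"},
    },
    "required": ["button_id"],
    "additionalProperties": True,
}


def validate(event, schema):
    """Return a list of error strings (empty means the event is valid).

    Simplified: checks only `required` and `additionalProperties`.
    """
    errors = []
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing required property: {name}")
    # JSON Schema allows additional properties unless the keyword is false.
    if schema.get("additionalProperties", True) is False:
        for name in sorted(event):
            if name not in schema["properties"]:
                errors.append(f"unexpected property: {name}")
    return errors


def to_row(event, schema):
    """Keep only the properties the schema declares.

    Assumption: the downstream table has one column per declared property.
    """
    return {name: event.get(name) for name in schema["properties"]}


# A developer has added "experiment" ad hoc on the client side.
event = {"button_id": "signup", "page": "/home", "experiment": "variant_b"}

errors = validate(event, RELAXED_SCHEMA)  # the extra property is allowed
row = to_row(event, RELAXED_SCHEMA)       # but it is not carried into the row
```

In this sketch the event passes validation, yet `experiment` never reaches the row, which is the pro and the con of the relaxed setting side by side.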
The alternative is to set additionalProperties to false, i.e. go for a ‘stricter’ schema
- Con: events with extra properties will now fail validation
- Pro: you should see that something’s gone wrong (an increase in the number of bad rows, e.g. by checking Kibana) and can then:
  - update your schema to accommodate the new data
  - reprocess the data that has failed validation. (In practice this is fiddly / time consuming at the moment, but we’re working on a toolset to make it easier.)
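The stricter workflow can be sketched the same way. This is only an illustration in Python under stated assumptions: the schemas and the `unexpected_properties` helper are hypothetical, the good and bad rows are plain lists standing in for the real pipeline outputs, and the "reprocessing" step is just a re-check against the updated schema, not Snowplow's actual recovery tooling.

```python
def unexpected_properties(event, schema):
    """Return the property names a strict schema would reject.

    Simplified: only the `additionalProperties` keyword is considered.
    """
    if schema.get("additionalProperties", True) is not False:
        return []
    return sorted(set(event) - set(schema["properties"]))


# Hypothetical strict schema, version 1: only "button_id" is declared.
strict_v1 = {
    "properties": {"button_id": {"type": "string"}},
    "additionalProperties": False,
}

events = [
    {"button_id": "signup"},
    {"button_id": "signup", "experiment": "variant_b"},  # ad hoc property
]

# Events with undeclared properties fail validation and land in bad rows.
good, bad = [], []
for event in events:
    if unexpected_properties(event, strict_v1):
        bad.append(event)
    else:
        good.append(event)

# Step 1: update the schema to accommodate the new data.
strict_v2 = {
    "properties": {
        "button_id": {"type": "string"},
        "experiment": {"type": "string"},
    },
    "additionalProperties": False,
}

# Step 2: reprocess the events that failed against the updated schema.
recovered = [e for e in bad if not unexpected_properties(e, strict_v2)]
```

The point of the sketch is that nothing is silently dropped: the second event shows up in `bad` first, and only becomes loadable once the schema explicitly declares `experiment`.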