First of all, I want to mention that I like Snowbridge!
We just rolled out our first real-time recommender, built on Snowbridge → Kafka. What I would like to understand is why the major schema version is appended as a suffix to all entities / events, for example:
contexts_com_mycompany_product_1
In my opinion, this generates much more pain than added value, because breaking schema changes must be considered in multiple components of data pipelines:
1. Snowbridge filtering / transform functions:
- the out-of-the-box functions can only reference one major version; no wildcard functions are available
- it is possible to build a solution via JS transformations (example below), but it is cumbersome and adds unnecessary performance overhead
function main(input) {
  var spData = input.Data;

  // Build the candidate entity keys for major versions 1 to 10.
  var entities = new Set();
  for (var majorVersion = 1; majorVersion <= 10; majorVersion++) {
    entities.add(`contexts_com_mycompany_product_${majorVersion}`);
  }

  // Check whether a product entity of any major version 1 to 10 exists in spData.
  var hasProductEntity = Array.from(entities).some(entity => entity in spData);

  if (
    spData["event"] === "page_view" &&
    ("user_id" in spData) &&
    typeof spData["page_urlpath"] === "string" && // guard: the field may be absent
    spData["page_urlpath"].includes("/product/") &&
    hasProductEntity
  ) {
    // Keep the event and forward it unchanged.
    return {
      FilterOut: false,
      Data: spData
    };
  } else {
    // Drop everything else.
    return {
      FilterOut: true
    };
  }
}
2. Downstream consumers (Kafka, Pub/Sub, GTM server-side, etc.):
All consumers need to find their own way to handle breaking changes, which is complex and/or costs performance.
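
To make both points concrete, here is a minimal sketch of a version-agnostic lookup that matches the entity key by prefix plus numeric suffix instead of enumerating versions 1 to 10. `findEntityVersions` is a hypothetical helper name, not a Snowbridge function, and the sketch assumes `input.Data` is already a parsed object, as in the example above.

```javascript
// Hypothetical helper: collect every major version of an entity present on an event.
// `baseName` is the entity key without the trailing "_<major>" suffix.
function findEntityVersions(spData, baseName) {
  var prefix = baseName + "_";
  var found = [];
  Object.keys(spData).forEach(function (key) {
    if (key.indexOf(prefix) !== 0) return;
    var suffix = key.slice(prefix.length);
    // Only accept a purely numeric suffix, so "..._product_category_1" is not matched.
    if (/^\d+$/.test(suffix)) {
      found.push({ major: parseInt(suffix, 10), entities: spData[key] });
    }
  });
  return found;
}

// Same filter as above, but independent of the major version.
function main(input) {
  var spData = input.Data;
  var path = spData["page_urlpath"];
  var keep =
    spData["event"] === "page_view" &&
    "user_id" in spData &&
    typeof path === "string" &&
    path.indexOf("/product/") !== -1 &&
    findEntityVersions(spData, "contexts_com_mycompany_product").length > 0;

  return keep ? { FilterOut: false, Data: spData } : { FilterOut: true };
}
```

A downstream consumer could reuse the same helper on the JSON it reads from Kafka, but every consumer would still have to carry this logic itself, which is the pain described here.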
Conclusion
- Including the major version as a suffix would make sense if a breaking change were meant to stop event forwarding or downstream consumption, but that is extremely unlikely in practice!
- I think quite a lot of breaking changes are caused by additional required fields. If a downstream consumer only needs, e.g., properties a, b, c and not the newly required fields, it doesn't care about the breaking change.
Suggestion:
Including the full schema version as a property, similar to what some warehouse loaders do, would make working with Snowbridge way easier. Example for version 1-0-5:
"contexts_com_mycompany_product": [
  {
    "_schema": "1-0-5",
    "id": "1234567",
    "productTypeId": 614,
    ...
  }
],
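
Until something like this exists, a JS transformation could approximate it. The sketch below is only an illustration under assumptions: it moves every `contexts_com_mycompany_product_<major>` array under one unversioned key and tags each entity with the major version parsed from the key suffix. `_schema_major` is a made-up property name, and since the suffix only carries the major version, the full `1-0-5` cannot be recovered this way.

```javascript
// Unversioned key the entities are collected under (taken from the example above).
var BASE = "contexts_com_mycompany_product";

// Hypothetical transformation: merge all major versions of one entity under BASE.
function main(input) {
  var spData = input.Data;
  var merged = [];

  Object.keys(spData).forEach(function (key) {
    // Split "<base>_<major>" into its two parts.
    var match = /^(.*)_(\d+)$/.exec(key);
    if (!match || match[1] !== BASE || !Array.isArray(spData[key])) return;

    spData[key].forEach(function (entity) {
      // "_schema_major" is an assumed name; only the major version is known here.
      var copy = { _schema_major: parseInt(match[2], 10) };
      Object.keys(entity).forEach(function (prop) {
        copy[prop] = entity[prop];
      });
      merged.push(copy);
    });
    delete spData[key]; // drop the versioned key once its entities are copied
  });

  if (merged.length > 0) spData[BASE] = merged;
  return { FilterOut: false, Data: spData };
}
```

Filters and consumers could then reference a single key and inspect `_schema_major` only when they actually care about the version.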
I am looking forward to your thoughts, suggestions and solutions.
Kind regards.
David