I was curious if there was a way to have the validation step within the enricher apply some global rules regardless of schema. For instance, we want to make sure that fields such as application_id and environment are always set. I know we can add this kind of check on custom schemas, but was wondering if there’s an easier way to define “global” rules that would apply to every event in the pipeline regardless of schema.
This is a really great question, and it has had a small amount of discussion before.
The short answer is no: it’s not really supported out of the box, although the core payload (which you’ll see as part of POST requests to the collector) does use the payload_data schema. If you really wanted some level of global enforcement, you could do it there by specifying your own schema that effectively overrides the Snowplow one, but this isn’t really recommended.
As a shameless plug, we released Test Suites in the Chrome Extension last week, which lets you specify rules (e.g., app_id conforming to a regex, a value within a context not being null) and might get you some of the way. Unfortunately, this is client-side only at the moment, so it is useful for the QA process but won’t fail events in enrichment.
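To make the idea of "global" rules concrete, here is a rough Python sketch of the kind of check described in the question: a small set of rules run against every event regardless of its schema. This is not the Test Suites rule format and not a Snowplow API. The flat dict event shape, the `application_id` pattern, and all function names are assumptions made up for illustration.

```python
import re
from typing import Any, Callable

Event = dict[str, Any]
Rule = Callable[[Event], bool]

# Hypothetical naming convention for application_id; adjust to your own.
APP_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]*$")


def field_is_set(name: str) -> Rule:
    """Build a rule that passes when the field is present and non-empty."""
    def check(event: Event) -> bool:
        value = event.get(name)
        return value is not None and value != ""
    return check


def field_matches(name: str, pattern: re.Pattern) -> Rule:
    """Build a rule that passes when the field is a string fully matching the pattern."""
    def check(event: Event) -> bool:
        value = event.get(name)
        return isinstance(value, str) and pattern.fullmatch(value) is not None
    return check


# Rules applied to every event, whatever schema it was sent with.
GLOBAL_RULES: dict[str, Rule] = {
    "application_id is set": field_is_set("application_id"),
    "environment is set": field_is_set("environment"),
    "application_id matches pattern": field_matches("application_id", APP_ID_PATTERN),
}


def failed_rules(event: Event) -> list[str]:
    """Return the names of the global rules this event fails, in rule order."""
    return [name for name, rule in GLOBAL_RULES.items() if not rule(event)]
```

Something like this could run in a QA script or in a job downstream of enrichment to flag offending events. As noted above, it would not cause the enricher itself to fail them.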