Best way to validate custom context data?

We rolled out some complex custom contexts today, and our testing process devolved into engineers firing events while we ran the EMR ETL and checked the enriched-bad folder on S3 for problems.
I’m sure there’s a better way - how do you solve this problem?

You might be able to use the Schema validation service for this. While I haven’t used it, it appears to validate not just schema format, but instances of a schema against a definition.
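To make "validating an instance against a definition" concrete, here is a rough local sketch in plain Python. It is not the validation service mentioned above and not a full JSON Schema validator: it only checks the Iglu URI shape, `required`, simple `type` keywords and `additionalProperties: false`. The `com.acme` vendor, the `cart_context` schema and the `check_context` helper are all made up for illustration.

```python
import re

# Iglu schema URIs have the shape iglu:vendor/name/format/MODEL-REVISION-ADDITION.
IGLU_URI = re.compile(
    r"iglu:[a-zA-Z0-9\-_.]+/[a-zA-Z0-9\-_]+/[a-zA-Z0-9\-_]+/\d+-\d+-\d+"
)

# JSON Schema type names mapped to Python types (a small subset only).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_context(context, schema):
    """Return a list of problems found in a self-describing context.

    Hypothetical helper: covers only a small subset of JSON Schema.
    """
    problems = []
    if not IGLU_URI.fullmatch(str(context.get("schema", ""))):
        problems.append("schema: not a valid Iglu URI")
    data = context.get("data")
    if not isinstance(data, dict):
        return problems + ["data: missing or not an object"]
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"{field}: required field is missing")
    properties = schema.get("properties", {})
    for field, value in data.items():
        if field not in properties:
            if schema.get("additionalProperties") is False:
                problems.append(f"{field}: unexpected field")
            continue
        expected = properties[field].get("type")
        python_type = JSON_TYPES.get(expected)
        if python_type is None:
            continue  # keyword outside the subset handled here
        # bool is a subclass of int in Python, so exclude it for non-boolean types.
        is_ok = isinstance(value, python_type) and not (
            isinstance(value, bool) and expected != "boolean"
        )
        if not is_ok:
            problems.append(f"{field}: expected {expected}")
    return problems


# Made-up schema and contexts, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "userId": {"type": "string"},
        "cartSize": {"type": "integer"},
    },
    "required": ["userId"],
    "additionalProperties": False,
}
good = {
    "schema": "iglu:com.acme/cart_context/jsonschema/1-0-0",
    "data": {"userId": "u-1", "cartSize": 3},
}
bad = {
    "schema": "iglu:com.acme/cart_context/jsonschema/1-0-0",
    "data": {"cartSize": "3", "colour": "red"},
}

print(check_context(good, schema))
print(check_context(bad, schema))
```

Something like this can catch the obvious mistakes before events are fired, but it does not replace running the events through the real pipeline.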


Another option we use (in addition to what @njenkins mentioned above) is to test and validate all schemas by first pushing them to Iglu on a Snowplow Mini instance (using igluctl) before pushing them to production. This lets us confirm that the data will be enriched and processed correctly, and that what ends up in Elasticsearch is what we expect.

Thanks for the answers so far. We’re going to at least start using it for manual testing.

@dweitzenfeld, please bear in mind that it doesn’t strictly follow the JSON Schema draft 4 specification. In particular, it will try to validate strings against a nonexistent "date" format (the correct one is "date-time"). You can find more information here (although another online validator is mentioned).
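To illustrate the difference, here is a small sketch of what a "date-time" check amounts to. In draft 4, "date-time" means an RFC 3339 date-time, so a bare date string does not qualify. The regex only checks the shape, not whether the month, day or hour are in range, and `looks_like_date_time` is a made-up helper name, not part of any validator.

```python
import re

# Shape of an RFC 3339 date-time, e.g. 2016-05-01T12:30:00Z.
# Shape only: field ranges (month 13, hour 25, ...) are not checked.
RFC3339_DATE_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})"
)


def looks_like_date_time(value):
    """Hypothetical helper: True if value has the RFC 3339 date-time shape."""
    return isinstance(value, str) and RFC3339_DATE_TIME.fullmatch(value) is not None


print(looks_like_date_time("2016-05-01T12:30:00Z"))           # True
print(looks_like_date_time("2016-05-01T12:30:00.123+02:00"))  # True
print(looks_like_date_time("2016-05-01"))                     # False
```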

Thanks for the heads up @anton