We rolled out some complex custom contexts today, and our testing process devolved into engineers firing events while we ran the EMR ETL and checked the enriched-bad folder on S3 for problems.
I’m sure there’s a better way - how do you solve this problem?
You might be able to use the Schema validation service for this. While I haven't used it myself, it appears to validate not just the format of a schema, but also instances of a schema against their definition.
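If you want to script that kind of check yourself, here's a rough sketch using the Python `jsonschema` package (the schema and instance below are made up for illustration, not a real Iglu schema):

```python
# Rough sketch, not production code: validate an instance (the data you'd
# attach as a custom context) against its JSON Schema definition, using the
# Python "jsonschema" package.
import json
from jsonschema import Draft4Validator

schema = json.loads("""
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "sku":      {"type": "string"},
        "quantity": {"type": "integer", "minimum": 1}
    },
    "required": ["sku", "quantity"],
    "additionalProperties": false
}
""")

instance = {"sku": "ABC-123", "quantity": 0}  # violates "minimum"

# iter_errors reports every violation instead of raising on the first one
for error in Draft4Validator(schema).iter_errors(instance):
    print(f"{list(error.path)}: {error.message}")
```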
Another option, in addition to what @njenkins mentioned above: we test and validate all schemas by first pushing them (using igluctl) to the Iglu registry on a Snowplow Mini instance before pushing them to production. This lets us confirm that the data will be enriched/processed correctly and that what ends up in Elasticsearch is what we expect.
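Roughly, the pre-production step looks like this (untested sketch, driven from Python; the host, port, and API key are placeholders for your own Snowplow Mini details):

```python
# Untested sketch of the workflow above. Substitute your own Mini details.
import subprocess

SCHEMAS_DIR = "schemas/"  # local directory of self-describing JSON Schemas
MINI_IGLU = "http://my-snowplow-mini.example.com:8081"  # hypothetical Iglu endpoint
API_KEY = "your-iglu-write-api-key"                     # hypothetical API key

# 1. Lint the schemas locally; igluctl exits non-zero if anything fails,
#    which makes subprocess.run(check=True) raise CalledProcessError.
subprocess.run(["igluctl", "lint", SCHEMAS_DIR], check=True)

# 2. Push the linted schemas to the Mini instance's Iglu registry.
subprocess.run(["igluctl", "static", "push", SCHEMAS_DIR, MINI_IGLU, API_KEY], check=True)

# From there, send test events at the Mini collector and inspect the result
# in its Elasticsearch before promoting the schemas to production.
```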
Thanks for the answers so far. We’re going to at least start using http://www.jsonschemavalidator.net for manual testing.
@dweitzenfeld, please bear in mind that jsonschemavalidator.net doesn't strictly follow the JSON Schema draft 4 specification. In particular, it will try to validate strings against the nonexistent `date` format (`date-time` is the correct draft-4 name). You can find more information here (although another online validator is mentioned).
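To illustrate the draft-4-correct format name with the Python `jsonschema` package (a sketch; note that jsonschema only enforces `date-time` when an extra checker package such as rfc3339-validator or strict-rfc3339 is installed, and spec-compliant draft-4 validators simply ignore unknown formats like `date`):

```python
# Sketch: "date-time" is the format defined in JSON Schema draft 4.
from jsonschema import Draft4Validator, FormatChecker

schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "createdAt": {"type": "string", "format": "date-time"}  # "date" is not in draft 4
    },
}

validator = Draft4Validator(schema, format_checker=FormatChecker())
errors = list(validator.iter_errors({"createdAt": "not-a-timestamp"}))
print([e.message for e in errors])  # format violation reported if a date-time checker is installed
```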
Thanks for the heads up, @anton.