We need to find a way to funnel all events through one script/function so that we can monitor every event that currently exists, as well as any new ones that may be created. We are considering several ideas, and I would like to ask some questions in that regard.
When Snowplow Micro gets its first event, let’s call it firstEvent, does it have knowledge of all the other existing events even though they have not fired?
It has knowledge of all the possible events/entities that have been defined in the Iglu repositories listed in your Iglu resolver file. This means that if an event ‘exists’ (in the sense that it has a matching Iglu URI and can be resolved), it will attempt to validate; otherwise it will yield a bad row.
That sounds like a yes. Let’s say there is another event, secondEvent. When firstEvent fires, I can look at the Iglu resolver file and see both firstEvent and secondEvent, even though secondEvent has not yet fired.
How do I use the Resolver? In the documentation there is a config file, and not much else.
The resolver file allows you to point to different Iglu repositories. A repository is, in essence, just a store for schemas, along with an API that allows you to create and retrieve those schemas.
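For illustration, a minimal resolver file might look like the following. The custom repository name and URI are placeholders for your own setup, but the overall shape (a self-describing JSON listing repositories with priorities and vendor prefixes) follows the documented format:

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": { "http": { "uri": "http://iglucentral.com" } }
      },
      {
        "name": "Example custom repo (placeholder)",
        "priority": 1,
        "vendorPrefixes": ["com.example"],
        "connection": { "http": { "uri": "http://iglu.example.com" } }
      }
    ]
  }
}
```

The Iglu client consults these repositories in turn when resolving a schema, so your custom schemas (e.g. under com.example) live alongside the standard ones hosted on Iglu Central.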
So the timeline in your case might be:
Send event 1 with: com.example/firstevent/jsonschema/1-0-0
The enricher (specifically the Iglu client) will read the resolver file and aim to find a schema matching this definition (schema resolution). Depending on the configuration, it will look for the schema in each repository, and if it finds it, it will attempt to validate your self-describing event (see the example payload below) against the retrieved schema.
Send event 2 with: com.example/secondevent/jsonschema/1-0-0
The enricher will repeat the process. The Iglu repository isn’t aware of the data you are sending it, nor of the fact that you sent an earlier event; each event is processed independently.
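To make the “send event 1 with” step concrete: a self-describing event is a JSON envelope pairing the Iglu URI with the event data. The field names in data below are hypothetical, but the schema/data envelope is the standard self-describing JSON format:

```json
{
  "schema": "iglu:com.example/firstevent/jsonschema/1-0-0",
  "data": {
    "buttonId": "signup",
    "pageSection": "header"
  }
}
```

It is this schema reference that the Iglu client resolves against the repositories before validating the contents of data.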
So the way to do this would typically be an application that reads off the enriched stream (Pub/Sub in GCP, Kinesis in AWS); a minimal sketch follows below. The enriched stream will have all the validated events, and the bad stream will have events that failed validation.
Events that haven’t been defined ahead of time (i.e., that don’t have a resolvable schema in an Iglu repository) will end up in bad, and anything that resolves and validates should end up in enriched.
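As a sketch of what such an application could look like on GCP, assuming a Pub/Sub subscription named enriched-sub in a project called my-project (both placeholders), you could consume the enriched topic with the google-cloud-pubsub client. Enriched events arrive as tab-separated payloads:

```python
from google.cloud import pubsub_v1

# Placeholders: substitute your own project and subscription.
PROJECT_ID = "my-project"
SUBSCRIPTION_ID = "enriched-sub"

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # Enriched events are TSV-formatted; splitting on tabs exposes the
    # event fields so you can monitor which events are flowing through.
    fields = message.data.decode("utf-8").split("\t")
    print(f"received enriched event with {len(fields)} fields")
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull_future.result()  # block and process messages as they arrive
except KeyboardInterrupt:
    streaming_pull_future.cancel()
```

The same pattern applies on AWS with a Kinesis consumer reading the enriched stream; either way, one consumer sees every event, existing or newly added, which is exactly the single funnel point you described.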
All successful events will end up in whatever your datastore may be (Redshift, Snowflake, BigQuery), and will be temporarily held in the stream as well.