Snowplow Micro Event Funneling during testing

sandraqu · June 29, 2021, 8:53pm

We need to find a way to funnel all events through one script/function so that we can monitor all events currently existing and any new ones that may be created.

We are considering many ideas, and I would like to ask some questions in that regard.

When Snowplow Micro gets its first event (let’s call it firstEvent), does it have knowledge of all the other existing events even though they have not fired?

We need the quickest visibility to all events.

mike · June 29, 2021, 10:46pm

It knows about all the possible events / entities that have been defined in your Iglu repositories, as listed in the Iglu resolver file. This means that if an event ‘exists’ (in the sense that it has a matching Iglu URI that can be resolved), Micro will attempt to validate it; otherwise it will yield a bad row.

sandraqu · June 29, 2021, 11:44pm

That sounds like a yes. Let’s have another event, secondEvent. When firstEvent fires, I can look at the iglu resolver file, and see firstEvent and secondEvent, even though secondEvent has not yet fired.

How do I use the Resolver? In the documentation there is a config file, and not much else.

mike · June 30, 2021, 1:02am

The resolver file allows you to point to different Iglu repositories. A repository in essence is really just a store for schemas as well as an API that allows you to create and retrieve these schemas.
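As a rough illustration of what such a resolver file contains, here is a minimal sketch that builds one in Python and prints it as JSON. The first repository is the public Iglu Central; the second entry (its name, the `com.example` vendor prefix and the `https://iglu.example.com/api` URI) is a made-up placeholder for a private repository, not something from your setup.

```python
import json

# Sketch of an Iglu resolver configuration.
# The second repository is a hypothetical placeholder for illustration only.
resolver = {
    "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
    "data": {
        "cacheSize": 500,
        "repositories": [
            {
                "name": "Iglu Central",
                "priority": 0,
                "vendorPrefixes": ["com.snowplowanalytics"],
                "connection": {"http": {"uri": "http://iglucentral.com"}},
            },
            {
                "name": "Example private repository",  # placeholder name
                "priority": 1,
                "vendorPrefixes": ["com.example"],
                "connection": {"http": {"uri": "https://iglu.example.com/api"}},  # placeholder URI
            },
        ],
    },
}

# Serialize to the JSON text you would save as the resolver file.
resolver_json = json.dumps(resolver, indent=2)
print(resolver_json)
```

The file only lists where schemas can be looked up. It does not contain the schemas themselves, so reading it will not show you firstEvent or secondEvent.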

So the timeline in your case might be:

Send event 1 with: com.example/firstevent/jsonschema/1-0-0
The enricher (specifically the Iglu client) will read the resolver file and try to find a schema that matches this definition (schema resolution). Depending on the configuration, it will look for this schema in each repository and, if it finds it, attempt to validate your self-describing event against the retrieved schema.

Send event 2 with: com.example/secondevent/jsonschema/1-0-0
The enricher will repeat the process. The Iglu repository isn’t aware of the data you are sending it, or aware of the fact that you sent an earlier event - each event is processed independently.
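The timeline above can be sketched as a toy model. This is not the Iglu client: `Repo`, `resolve` and `process` are hypothetical names, the repositories are in-memory dictionaries, "validation" is reduced to a required-fields check, and trying lower `priority` values first is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy in-memory stand-in for an Iglu repository (hypothetical)."""
    name: str
    priority: int
    # Maps a schema key to the set of field names the event data must contain.
    schemas: dict = field(default_factory=dict)


def resolve(schema_key, repos):
    """Look the key up in each repository in priority order; None if not found."""
    for repo in sorted(repos, key=lambda r: r.priority):
        if schema_key in repo.schemas:
            return repo.schemas[schema_key]
    return None


def process(event, repos):
    """Classify one self-describing event, independently of any other event."""
    required = resolve(event["schema"], repos)
    if required is None:
        return "bad"  # schema could not be resolved in any repository
    if not required.issubset(event["data"]):
        return "bad"  # schema resolved, but the (toy) validation failed
    return "good"


repos = [Repo("example", 0, {"com.example/firstevent/jsonschema/1-0-0": {"id"}})]
first = {"schema": "com.example/firstevent/jsonschema/1-0-0", "data": {"id": 1}}
second = {"schema": "com.example/secondevent/jsonschema/1-0-0", "data": {"id": 2}}

results = [process(e, repos) for e in (first, second)]
print(results)
```

Nothing is carried over between the two `process` calls: the second event is looked up from scratch, and here it fails only because its schema was never added to the toy repository.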

Hope that helps!

sandraqu · June 30, 2021, 3:14pm

Thus no, at the time firstEvent fires, Micro does not have knowledge of secondEvent.

Is there any part of Snowplow that has knowledge of all events?

mike · June 30, 2021, 10:26pm

The typical way to do this would be an application that reads off the enriched stream (PubSub in GCP, Kinesis in AWS). The enriched stream will have all the validated events and the bad stream will have the events that failed validation.

Events that haven’t been defined ahead of time (i.e., that have no resolvable schema in an Iglu repository) will end up in bad, and anything that resolves and validates should end up in enriched.

All successful events will end up in whatever your datastore may be (Redshift, Snowflake, BigQuery) and will be temporarily stored in the stream as well.
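During testing, the same good/bad split can be inspected in Snowplow Micro through its REST API. Below is a minimal sketch that tallies which schemas have shown up so far among the good events. It assumes Micro is running locally on port 9090 and that each entry returned by `/micro/good` carries `schema` and `eventType` fields; `fetch_good_events` and `schemas_seen` are hypothetical helper names.

```python
import json
from collections import Counter
from urllib.request import urlopen

MICRO_URL = "http://localhost:9090"  # assumption: Micro running locally


def fetch_good_events(base_url=MICRO_URL):
    """Return the list of good events Micro has collected so far."""
    with urlopen(f"{base_url}/micro/good") as resp:
        return json.load(resp)


def schemas_seen(good_events):
    """Count good events by schema, falling back to the event type."""
    return Counter(
        e.get("schema") or e.get("eventType") or "unknown" for e in good_events
    )
```

Calling `schemas_seen(fetch_good_events())` after a test run gives a per-schema count. It only reflects events that have actually fired, so it complements the schema definitions in your Iglu repositories rather than replacing them.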
