Hi @eileen_dover,
First, I’ll try to give you some context that will hopefully make it easier to understand my answers to your questions.
We can think of Snowplow events as being one of two types. We usually call them ‘atomic’ and ‘custom self-describing’ events (though you might see the latter also being called ‘custom unstructured events’). The main difference between the two is that self-describing events are always accompanied by information about the schema against which they must validate, whereas ‘atomic’ events do not come with a JSON schema to which they must conform.
If you think of a Snowplow event as a single line, it will have properties like app_name, event_name, etc., and they are associated with specific values. If it’s an ‘atomic’ event, it might only have these simple key-value pairs. But if it’s a self-describing event, it will have an additional field called unstruct_event. The value of this field will be a JSON with two important properties: schema, which points to the JSON schema that the event must validate against; and data, which contains the actual event payload.
In addition, both types of events – atomic and self-describing – might have some custom contexts added to them by the tracker. Just like unstruct_event, the information about these contexts will be in a separate field of the event (contexts), and the value will be a self-describing JSON (specifying both the schema for the context, and the data that must validate against this schema).
It’s important to understand that a self-describing event always has all the ‘atomic’ properties as well. It can be represented very approximately as something like:
    Map(
      "app_name" -> "my_app",
      "event_name" -> "my_event_name",
      "unstruct_event" ->
        {
          "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
          "data": {"key1": "value1"}
        }
    )
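To make the distinction concrete, here is a minimal Python sketch of the same approximate representation. The dict layout and the helper names (is_self_describing, split_event) are my own assumptions for illustration; this is not how the pipeline actually stores or processes events.

```python
# Simplified, hypothetical representation of the event from the example above.
event = {
    "app_name": "my_app",
    "event_name": "my_event_name",
    "unstruct_event": {
        "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
        "data": {"key1": "value1"},
    },
}


def is_self_describing(event):
    """Return True if the event carries a self-describing payload."""
    payload = event.get("unstruct_event")
    return isinstance(payload, dict) and "schema" in payload and "data" in payload


def split_event(event):
    """Separate the flat 'atomic' properties from the self-describing payload."""
    atomic = {k: v for k, v in event.items() if k != "unstruct_event"}
    return atomic, event.get("unstruct_event")
```

Calling split_event(event) returns the flat key-value pairs on one side and the schema/data pair on the other, which is the split that matters when choosing between pojo and json inputs.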
To answer your first question, you would use pojo when your input is part of the ‘atomic’ fields, like app_id. And you will use json when it is part of one of the JSON fields, like unstruct_event or contexts.
On your second question, the field property must specify where to look for the JSON. It must be one of unstruct_event, contexts (sent by the tracker) or derived_contexts (added by another enrichment).
The schemaCriterion property must specify the schema of the event or context that the input is coming from.
For example, you might have the following inputs (matching the example above):
    {
      "key": "app_name",
      "pojo": {
        "field": "app_name"
      }
    },
    {
      "key": "key1",
      "json": {
        "field": "unstruct_event",
        "schemaCriterion": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
        "jsonPath": "$.key1"
      }
    }
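If it helps, here is a rough Python sketch of how these two inputs could be resolved against the example event. It is only an illustration under simplifying assumptions, not the enrichment's actual code: resolve_input and simple_json_path are hypothetical helpers, the schemaCriterion is compared by exact string match, context fields are modelled as a plain list of self-describing JSONs, and only simple dotted paths like $.key1 are handled.

```python
event = {
    "app_name": "my_app",
    "event_name": "my_event_name",
    "unstruct_event": {
        "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
        "data": {"key1": "value1"},
    },
}

inputs = [
    {"key": "app_name", "pojo": {"field": "app_name"}},
    {
        "key": "key1",
        "json": {
            "field": "unstruct_event",
            "schemaCriterion": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
            "jsonPath": "$.key1",
        },
    },
]


def simple_json_path(data, path):
    """Follow a dotted path such as '$.a.b'; not a full JSONPath implementation."""
    current = data
    for part in path.lstrip("$").strip(".").split("."):
        if part == "":
            continue
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def resolve_input(event, spec):
    """Resolve one input spec against the simplified event dict."""
    if "pojo" in spec:
        # 'pojo' inputs read a flat atomic property directly.
        return event.get(spec["pojo"]["field"])
    if "json" in spec:
        cfg = spec["json"]
        field = event.get(cfg["field"])
        # unstruct_event holds one self-describing JSON; context fields are
        # modelled here as a list of them (a simplifying assumption).
        candidates = field if isinstance(field, list) else [field]
        for item in candidates:
            if isinstance(item, dict) and item.get("schema") == cfg["schemaCriterion"]:
                return simple_json_path(item["data"], cfg["jsonPath"])
    return None


# Map each input key to the value found in the event.
resolved = {spec["key"]: resolve_input(event, spec) for spec in inputs}
```

The point of the sketch is the lookup order for json inputs: pick the field first, then the self-describing JSON whose schema matches the criterion, and only then apply the jsonPath to its data.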
Finally, all outputs will be attached to the event, and they will go in the derived_contexts field. Like unstruct_event and contexts, this field contains self-describing JSONs, so you will need to write one or more schemas for the output (depending on whether you want to attach all the data returned from the API as a single context or as multiple contexts).
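As a sketch of what that means in practice, the snippet below wraps an API response as a self-describing JSON and appends it to derived_contexts. Everything here is assumed for illustration: the attach_derived_context helper, the iglu:com.acme/api_output/jsonschema/1-0-0 schema URI, and the shape of the response are all made up, and derived_contexts is modelled as a plain list.

```python
def attach_derived_context(event, schema_uri, data):
    """Return a copy of the event with one more self-describing JSON in derived_contexts."""
    updated = dict(event)
    contexts = list(event.get("derived_contexts", []))
    # Each derived context pairs a schema reference with the data it describes.
    contexts.append({"schema": schema_uri, "data": data})
    updated["derived_contexts"] = contexts
    return updated


# Hypothetical API response and output schema, for illustration only.
api_response = {"plan": "premium", "seats": 5}
event = {"app_name": "my_app", "event_name": "my_event_name"}

enriched = attach_derived_context(
    event,
    "iglu:com.acme/api_output/jsonschema/1-0-0",
    api_response,
)
```

Attaching the whole response under one schema, as here, corresponds to the single-context option; the multiple-context option would call the helper once per schema with a slice of the response.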
You might also find this tutorial helpful, especially steps 2 and 3.