API Request Enrichment with AWS

dilyan · August 8, 2022, 1:24pm

First, I’ll try to give you some context that will hopefully make it easier to understand my answers to your questions.

We can think of Snowplow events as being one of two types, which we usually call ‘atomic’ and ‘custom self-describing’ events (though you might see the latter also being called ‘custom unstructured events’). The main difference between the two is that self-describing events are always accompanied by information about the schema against which they must validate, whereas ‘atomic’ events do not come with a JSON schema to which they must conform.

If you think of a Snowplow event as a single line, it will have properties like app_name, event_name, etc., each associated with a specific value. If it’s an ‘atomic’ event, it might only have these simple key-value pairs. But if it’s a self-describing event, it will have an additional field called unstruct_event. The value of this field will be a JSON object with two important properties: schema, which points to the JSON schema that the event must validate against; and data, which contains the actual event payload.

In addition, both types of events – atomic and self-describing – might have some custom contexts added to them by the tracker. Just like the unstruct_event, the information about these contexts will be in a separate field of the event for ‘custom context’, and the value will be a self-describing JSON (specifying both the schema for the context, and the data that must validate against this schema).

It’s important to understand that a self-describing event always has all the ‘atomic’ properties as well. It can be represented very approximately as something like:


Map(
  "app_name" -> "my_app",
  "event_name" -> "my_event_name",
  "unstruct_event" -> 
    {
      "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0", 
      "data": {"key1": "value1"}
    }
)
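To make the distinction concrete, here is the same approximate picture as plain Python dictionaries. This is only a sketch: the helper name is_self_describing and the "page_view" value are my own illustrative choices, and a real enriched event has many more fields.

```python
# Illustrative only: Snowplow events modelled as plain dicts.
atomic_event = {
    "app_name": "my_app",
    "event_name": "page_view",  # assumed example value
}

self_describing_event = {
    "app_name": "my_app",
    "event_name": "my_event_name",
    "unstruct_event": {
        "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
        "data": {"key1": "value1"},
    },
}


def is_self_describing(event: dict) -> bool:
    """Hypothetical helper: True when the event carries both a schema and a payload."""
    payload = event.get("unstruct_event")
    return isinstance(payload, dict) and "schema" in payload and "data" in payload
```

Note that the self-describing event still carries the same ‘atomic’ keys; it only gains the extra unstruct_event field.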

To answer your first question, you would use pojo when your input is one of the ‘atomic’ fields, like app_id, and json when it is part of one of the JSON fields, like unstruct_event or contexts.

On your second question, the field property must specify where to look for the JSON. It must be one of unstruct_event, contexts (sent by the tracker) or derived_contexts (added by another enrichment).

The schemaCriterion must specify the schema for the event or context where the input is coming from.

For example, you might have the following inputs (matching the example above):

{
  "key": "app_name",
  "pojo": {
    "field": "app_name"
  }
},
{
  "key": "key1",
  "json": {
    "field": "unstruct_event",
    "schemaCriterion": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
    "jsonPath": "$.key1"
  }
}
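If it helps, this is roughly how you can picture those two inputs being resolved against the example event. It is a simplified sketch, not the enrichment's actual code: resolve_input and resolve_json_path are hypothetical names, the path walker only understands dotted keys like $.key1 rather than full JSONPath, and the schema is compared for an exact match only.

```python
def resolve_json_path(data, json_path):
    """Walk a simple '$.a.b' path. Supports dotted keys only, not full JSONPath."""
    if not json_path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = data
    for key in filter(None, json_path[1:].split(".")):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def resolve_input(event, spec):
    """Return the value an input spec points at, or None if it is absent."""
    if "pojo" in spec:
        # pojo inputs read directly from the 'atomic' key-value pairs.
        return event.get(spec["pojo"]["field"])
    json_spec = spec["json"]
    field = event.get(json_spec["field"])
    if field is None:
        return None
    # unstruct_event holds one JSON; contexts and derived_contexts hold a list.
    candidates = field if isinstance(field, list) else [field]
    for candidate in candidates:
        # Simplification: exact schema match only.
        if candidate.get("schema") == json_spec["schemaCriterion"]:
            return resolve_json_path(candidate["data"], json_spec["jsonPath"])
    return None


event = {
    "app_name": "my_app",
    "event_name": "my_event_name",
    "unstruct_event": {
        "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
        "data": {"key1": "value1"},
    },
}

inputs = [
    {"key": "app_name", "pojo": {"field": "app_name"}},
    {
        "key": "key1",
        "json": {
            "field": "unstruct_event",
            "schemaCriterion": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
            "jsonPath": "$.key1",
        },
    },
]

resolved = {spec["key"]: resolve_input(event, spec) for spec in inputs}
```

Each resolved value is stored under its key, which is the name you can then refer to when building the API request.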

Finally, all outputs will be attached to the event, and they will go in the derived_contexts field. Like unstruct_event and contexts, this field contains self-describing JSONs, so you will need to write one or more schemas for the output (depending on whether you want to attach all the data returned from the API as a single context or as multiple contexts).
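As a rough illustration of what ends up in derived_contexts, the sketch below wraps an API response in a self-describing JSON and appends it to the event. Everything here is assumed for the example: the schema URI iglu:com.acme/api_lookup_result/jsonschema/1-0-0, the helper names and the response body are all made up, and the real enrichment does this for you based on its configuration.

```python
# Hypothetical schema URI for the output; you would write and publish your own.
OUTPUT_SCHEMA = "iglu:com.acme/api_lookup_result/jsonschema/1-0-0"


def to_derived_context(api_response: dict, schema: str = OUTPUT_SCHEMA) -> dict:
    """Wrap the API response as a self-describing JSON (schema plus data)."""
    return {"schema": schema, "data": api_response}


def attach(event: dict, context: dict) -> dict:
    """Return a copy of the event with the context appended to derived_contexts."""
    enriched = dict(event)
    enriched["derived_contexts"] = list(event.get("derived_contexts", [])) + [context]
    return enriched


event = {"app_name": "my_app", "event_name": "my_event_name"}
api_response = {"plan": "premium"}  # made-up response body

enriched = attach(event, to_derived_context(api_response))
```

Attaching the data as multiple contexts would simply mean calling to_derived_context once per output, each with its own schema.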

You might also find this tutorial helpful, especially steps 2 and 3.
