API Request Enrichment with AWS

eileen_dover · August 6, 2022, 12:22am

Hello! I’m new to working with both Snowplow and APIs, and I just had a few questions regarding the set-up for API request enrichments.

My goal is to enrich each event being sent from an app with the app metadata. I currently have a REST API on AWS API Gateway that takes GET requests and returns a JSON object literal that looks like the following:

{"Item":{"json":"{ \"endpoint\": \"external\", \"team\": \"my_team\" }","app_name":"my_app"}}

This data exists in a DynamoDB table where the key is the app_name. My goal is that, when any event (custom structured events, page views, page pings, etc.) is being fired from a specific app (e.g. my_app), I could call the API with the app name as the key, and this information could be appended onto that event. I’ve looked through the tutorial here and have read through some forum posts on this site for reference, but I am having trouble understanding how to set up the configurations:

In editing the api_request_enrichment_config.json file in Phase 3, I had the following questions:

1. When would I use pojo vs json? Is this different depending on the event I’d want to be enriching (e.g. page view vs custom structured events)?
2. If I do use json, what is meant to go in field and schemaCriterion? Particularly for events like page views that don’t seem to follow a specific schema?
3. For outputs, does the outputted information get automatically appended to the Snowplow event being tracked? It seemed like in the tutorial that the returned data from the API was stored in a separate table in clearbit, but I was wondering if there was any way to attach that data to the payload of a page view/custom structured event? Is this something configured in the api_request_enrichment_config.json file? Or would it be an additional parameter when I call trackStructEvent?

Thanks so much in advance! I’m a little new to all of this, so I’m happy to clarify anything that didn’t make sense.

dilyan · August 8, 2022, 1:24pm

Hi @eileen_dover,

First, I’ll try to give you some context that will hopefully make it easier to understand my answers to your questions.

We can think of Snowplow events as being one of two types. We usually call them ‘atomic’ and ‘custom self-describing’ events (though you might see the latter also being called ‘custom unstructured events’). The main difference between the two is that self-describing events are always accompanied by information about the schema against which they must validate, whereas ‘atomic’ events do not come with a JSON schema to which they must conform.

If you think of a Snowplow event as a single line, it will have properties like app_name, event_name, etc., each associated with a specific value. If it’s an ‘atomic’ event, it might only have these simple key-value pairs. But if it’s a self-describing event, it will have an additional field called unstruct_event. The value of this field will be a JSON object with two important properties: schema, which points to the JSON schema that the event must validate against; and data, which contains the actual event payload.

In addition, both types of events, atomic and self-describing, might have some custom contexts added to them by the tracker. Just like unstruct_event, these contexts live in a separate field of the event, and each one is a self-describing JSON (specifying both the schema for the context and the data that must validate against this schema).

It’s important to understand that a self-describing event always has all the ‘atomic’ properties as well. It can be represented very approximately as something like:


Map(
  "app_name" -> "my_app",
  "event_name" -> "my_event_name",
  "unstruct_event" -> 
    {
      "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0", 
      "data": {"key1": "value1"}
    }
)

To answer your first question, you would use pojo when your input is one of the ‘atomic’ fields, like app_id, and json when it is part of one of the JSON fields, like unstruct_event or contexts. The choice depends on where the input value lives, not on the type of event you are enriching.

On your second question, the field must specify where to look for the JSON. It must be one of unstruct_event, contexts (sent by the tracker) or derived_contexts (added by another enrichment).

The schemaCriterion must specify the schema for the event or context where the input is coming from.
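To make the matching concrete, here is a rough Python sketch of how a schemaCriterion could be compared against the schema of an incoming event or context. It assumes that the criterion may use * as a wildcard for any of the three version parts (for example 1-*-*), which is worth checking against the enrichment documentation. The function names are made up for this illustration and are not part of Snowplow.

```python
def split_iglu_uri(uri):
    """Split 'iglu:vendor/name/format/1-0-0' into its parts (illustrative helper)."""
    if not uri.startswith("iglu:"):
        raise ValueError(f"not an Iglu URI: {uri}")
    vendor, name, fmt, version = uri[len("iglu:"):].split("/")
    return vendor, name, fmt, version.split("-")


def matches_criterion(schema_uri, criterion):
    """True if schema_uri satisfies the criterion; '*' matches any version part (assumed)."""
    s_vendor, s_name, s_fmt, s_version = split_iglu_uri(schema_uri)
    c_vendor, c_name, c_fmt, c_version = split_iglu_uri(criterion)
    if (s_vendor, s_name, s_fmt) != (c_vendor, c_name, c_fmt):
        return False
    return all(c == "*" or c == s for s, c in zip(s_version, c_version))


event_schema = "iglu:com.acme/my_event_name/jsonschema/1-0-0"
exact = matches_criterion(event_schema, "iglu:com.acme/my_event_name/jsonschema/1-0-0")
wildcard = matches_criterion(event_schema, "iglu:com.acme/my_event_name/jsonschema/1-*-*")
other_major = matches_criterion(event_schema, "iglu:com.acme/my_event_name/jsonschema/2-*-*")
print(exact, wildcard, other_major)
```

An input whose criterion does not match any JSON in the chosen field simply has nothing to read from for that event.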

For example, you might have the following inputs (matching the example above):

[
  {
    "key": "app_name",
    "pojo": {
      "field": "app_name"
    }
  },
  {
    "key": "key1",
    "json": {
      "field": "unstruct_event",
      "schemaCriterion": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
      "jsonPath": "$.key1"
    }
  }
]
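As a rough illustration of what these two inputs do, the Python sketch below resolves them against the simplified event from earlier and substitutes the results into a URI template. It is a simulation, not the enrichment's actual code: the {{key}} placeholder syntax, the example.invalid URL, the exact-match schema check and the dotted-path-only JSONPath handling are all simplifying assumptions.

```python
# Simplified stand-in for an event: atomic fields plus one self-describing JSON field.
event = {
    "app_name": "my_app",
    "event_name": "my_event_name",
    "unstruct_event": {
        "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
        "data": {"key1": "value1"},
    },
}

inputs = [
    {"key": "app_name", "pojo": {"field": "app_name"}},
    {
        "key": "key1",
        "json": {
            "field": "unstruct_event",
            "schemaCriterion": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
            "jsonPath": "$.key1",
        },
    },
]


def walk_simple_path(data, json_path):
    """Follow a dotted path such as '$.a.b'; real JSONPath supports much more."""
    current = data
    for part in [p for p in json_path.lstrip("$").split(".") if p]:
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def resolve_inputs(event, inputs):
    """Collect a value for each input key from either an atomic or a JSON field."""
    values = {}
    for spec in inputs:
        if "pojo" in spec:
            value = event.get(spec["pojo"]["field"])
        else:
            source = spec["json"]
            found = event.get(source["field"])
            # contexts and derived_contexts hold a list; unstruct_event holds one JSON
            candidates = found if isinstance(found, list) else [found]
            value = None
            for sdj in candidates:
                # exact schema match only, to keep the sketch short
                if sdj and sdj["schema"] == source["schemaCriterion"]:
                    value = walk_simple_path(sdj["data"], source["jsonPath"])
                    break
        if value is not None:
            values[spec["key"]] = value
    return values


def fill_uri(template, values):
    """Replace each {{key}} placeholder with the resolved value."""
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", str(value))
    return template


values = resolve_inputs(event, inputs)
uri = fill_uri("https://example.invalid/apps/{{app_name}}?k={{key1}}", values)
print(values)
print(uri)
```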

Finally, all outputs will be attached to the event, and they will go in the derived_contexts field. Like unstruct_event and contexts, this field contains self-describing JSONs, so you will need to write one or more schemas for the output (depending on whether you want to attach all the data returned from the API as a single context or as multiple contexts).
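To sketch what that means for the response shown in the question, the Python below takes that response body, picks out a part of it with a simple path, and wraps it as a self-describing JSON of the kind that would end up in derived_contexts. The schema name iglu:com.acme/app_metadata/jsonschema/1-0-0 and the function name are hypothetical, and the output shape (a schema plus a json.jsonPath) is my understanding of the enrichment's outputs setting rather than something confirmed here.

```python
import json

# Response body in the shape shown in the question; note the "json" value is itself a string.
response_body = r'{"Item":{"json":"{ \"endpoint\": \"external\", \"team\": \"my_team\" }","app_name":"my_app"}}'

# Assumed shape of one entry in the enrichment's outputs; the schema is hypothetical.
output = {
    "schema": "iglu:com.acme/app_metadata/jsonschema/1-0-0",
    "json": {"jsonPath": "$.Item"},
}


def build_derived_context(response_body, output):
    """Extract the part of the response named by jsonPath and wrap it with its schema."""
    data = json.loads(response_body)
    # dotted paths only; real JSONPath supports much more
    for part in [p for p in output["json"]["jsonPath"].lstrip("$").split(".") if p]:
        data = data[part]
    return {"schema": output["schema"], "data": data}


context = build_derived_context(response_body, output)
derived_contexts = [context]  # this list is what would travel with the event
print(json.dumps(context, indent=2))
```

With this response, the nested json property arrives as a string rather than an object, so the output schema would have to declare it as a string unless the API is changed to return it as a proper JSON object.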

You might also find this tutorial helpful, especially steps 2 and 3.
