Sending data into Snowplow

Data is sent into Snowplow by trackers and webhooks.

  1. Trackers
  2. Webhooks

1. Trackers

Snowplow is built so that you can send in event-level data from any type of digital application, service or device. A wide range of Snowplow trackers is available to make it easy to collect event-level data from many different places.

When you instrument a Snowplow tracker you need to set it up in such a way that it responds to the events you need to track by:

  1. Seeing those events
  2. Assembling a packet of data points that fully describe those events
  3. Sending the packet of data representing that event to the Snowplow collector for processing
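
The three steps above can be sketched in plain Python, without any tracker library. This is a minimal illustration under stated assumptions, not how any particular tracker is implemented: the collector host `collector.example.com` is hypothetical, and the short parameter names (`e`, `p`, `url`, `page`, `eid`, `dtm`) are assumed to follow the Snowplow tracker protocol. The sketch only builds the request URL and does not send it.

```python
import time
import uuid
from urllib.parse import urlencode

# Hypothetical collector endpoint; replace with your own collector host.
COLLECTOR_URL = "https://collector.example.com/i"


def assemble_page_view(page_url, page_title):
    """Step 2: assemble the data points that describe a page view event."""
    return {
        "e": "pv",                            # event type: page view
        "p": "web",                           # platform
        "url": page_url,
        "page": page_title,
        "eid": str(uuid.uuid4()),             # unique event ID
        "dtm": str(int(time.time() * 1000)),  # device timestamp in milliseconds
    }


def build_request_url(payload):
    """Step 3: encode the packet as a GET request URL for the collector."""
    return COLLECTOR_URL + "?" + urlencode(payload)


# Step 1 (seeing the event) is application-specific: here we simply call the
# functions at the point in the code where the page view happens.
payload = assemble_page_view("http://www.example.com/", "example")
request_url = build_request_url(payload)
print(request_url)
```

A real tracker adds many more fields and handles batching and retries; the point here is only the see, assemble, send shape of the process.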

This process looks a bit different depending on the tracker you’re implementing, but the underlying steps are the same in all cases. The main thing to consider is whether the event you’re tracking is one that you have defined yourself, or one that has already been defined in Snowplow.

Tracking events that have already been defined in Snowplow

Events that are supported by Snowplow out-of-the-box

Snowplow supports a large and growing number of events ‘out of the box’, most of which are fairly standard in a web analytics context. Examples of events that we support include:

  • Page views
  • Page pings
  • Link clicks
  • Form fill-ins (for the web)
  • Form submissions
  • Transactions

For events that Snowplow natively supports, there is generally a specific API for tracking that event type in each tracker. For example, if you want to track a page view using the JavaScript tracker, you do so with the following JavaScript:

window.snowplow('trackPageView');

Whereas if you were tracking a page view in an iOS app using the Objective-C tracker, you’d do so like this:

[t1 trackPageView:@"www.example.com" title:@"example" referrer:@"www.referrer.com"];

In general, each tracker will have a specific API call for tracking any events that have been defined by the Snowplow team, and you should refer to the tracker-specific documentation to make sure that this is set up correctly.

Contexts that are supported by Snowplow out-of-the-box

Wherever possible, we try to build the trackers so that they automatically capture as much contextual data for each event as possible. For example, the JavaScript tracker automatically captures the following data fields with every request unless they are disabled:

Field                 Description
dvce_tstamp           The timestamp on the device that the event was recorded on
os_timezone           The timezone the client operating system is set to
event_id              A unique identifier for the event
domain_userid         First party cookie ID
domain_sessionidx     Session index based on first party cookie ID
dvce_screenheight     Screen height in pixels
br_viewwidth          Browser view width in pixels
br_viewheight         Browser view height in pixels
page_url              URL of the page on which the event occurred
page_referrer         URL of the referrer
user_fingerprint      Browser fingerprint
br_lang               Language the browser is set to
br_features_...       A list of boolean flags to indicate if common plugins are installed e.g. PDF, Quicktime, RealPlayer, Flash, Java…
br_colordepth         Browser color depth
doc_width             Width of webpage in pixels
doc_height            Height of webpage in pixels
doc_charset           Document encoding
platform              The platform that the event was recorded on, in this case ‘web’
name_tracker          The tracker name
v_tracker             The tracker version

In addition to the above fields, there are a number of additional optional contexts that you can capture automatically using the Snowplow JavaScript tracker. Refer to the JavaScript tracker documentation for the list of available contexts.

The mobile trackers (iOS and Android) also automatically capture a large number of data points with every event, where available:

Field                                   Description
os_type                                 Operating system type
os_version                              Operating system version
device_manufacturer                     The device manufacturer
device_model                            The device model
carrier                                 The mobile carrier
apple_idfa                              Apple’s IDFA (Identifier for Advertisers)
open_idfa                               The open IDFA
android_idfa                            The Android IDFA
latitude                                Device location latitude
longitude                               Device location longitude
latitude_longitude_location_accuracy    The accuracy of the lat/long measures above
altitude                                Device location altitude
altitude_accuracy                       The accuracy of the above altitude measure
bearing                                 Direction of device travel
speed                                   Speed with which the device is travelling

Tracking events that you’ve defined yourself

Tracking events where you have defined the schema yourself is straightforward. Before you instrument your tracker, you need to:

  1. Make sure you have your Iglu schema repo set up
  2. Create a schema for your event type in the repo
  3. Know the reference to that schema in Iglu. For example, if your company website URL is mycompany.com, and you’ve defined your own outbound-call-made event schema, and it is the first version of that schema, then the reference to the schema is iglu:com.mycompany/outbound-call-made/jsonschema/1-0-0
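
To make the structure of a schema reference concrete, the sketch below builds and parses one in Python. The helper names `build_schema_uri` and `parse_schema_uri` are hypothetical, and the regular expression is a simplified pattern written for this example, not the validation rule Iglu itself applies.

```python
import re

# Simplified pattern for a schema reference of the form
# iglu:<vendor>/<name>/<format>/<MODEL>-<REVISION>-<ADDITION>
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[a-zA-Z0-9_.-]+)/(?P<name>[a-zA-Z0-9_-]+)/"
    r"(?P<format>[a-zA-Z0-9_-]+)/(?P<version>\d+-\d+-\d+)$"
)


def build_schema_uri(vendor, name, version="1-0-0", fmt="jsonschema"):
    """Assemble a schema reference from its four parts."""
    return f"iglu:{vendor}/{name}/{fmt}/{version}"


def parse_schema_uri(uri):
    """Split a schema reference into vendor, name, format and version."""
    match = IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"not a valid Iglu schema reference: {uri!r}")
    return match.groupdict()


uri = build_schema_uri("com.mycompany", "outbound-call-made")
print(uri)
print(parse_schema_uri(uri))
```

The vendor is the reversed company domain, the name identifies the event type, and the version is what changes when the schema evolves.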

Once that is done you simply need to configure your tracker to record the event using the track unstructured event method. So if we were tracking the event using the Python tracker, our code snippet for doing so might look like this:

tracker.track_unstruct_event({
    "schema": "iglu:com.mycompany/outbound-call-made/jsonschema/1-0-0",
    "data": {
        "connected_tstamp": "2015-03-21 17:23:10",
        "disconnected_tstamp": "2015-03-21 17:48:21",
        "reason_for_call": "Response to interest submitted via webform",
        "success": True,
        "order_id": "ab-1903-23904",
        "order_value": "129.44"
    }
})

We call the track unstructured event method and pass in a JSON with two fields: a schema field, which tells Snowplow where the schema for this event can be located in Iglu, and a data field, which contains the actual data to be captured. We call this a self-describing JSON because, assuming we have access to Iglu, the JSON contains all the information we need to process it, in the form of the schema reference.
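
The two-field shape described above can be captured in a pair of small helpers. This is an illustrative sketch only: `self_describing` and `is_self_describing` are hypothetical names, and the check covers the envelope alone. It does not fetch the schema from Iglu or validate the data against it.

```python
import json


def self_describing(schema, data):
    """Wrap a data payload together with the reference to its schema."""
    return {"schema": schema, "data": data}


def is_self_describing(obj):
    """Check that obj has exactly a 'schema' and a 'data' field,
    and that the schema field looks like an Iglu reference."""
    return (
        isinstance(obj, dict)
        and set(obj) == {"schema", "data"}
        and isinstance(obj["schema"], str)
        and obj["schema"].startswith("iglu:")
    )


event = self_describing(
    "iglu:com.mycompany/outbound-call-made/jsonschema/1-0-0",
    {"reason_for_call": "Response to interest submitted via webform",
     "success": True},
)
print(json.dumps(event, indent=2))
print(is_self_describing(event))                # envelope is well formed
print(is_self_describing({"data": {"a": 1}}))   # missing the schema field
```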

Each of the Snowplow trackers includes a track unstructured event method. It is not uncommon for a Snowplow implementation to have nearly all, if not all, of its events defined by the company in question, and so tracked using this method.

Tracking contexts that you’ve defined yourself

Whenever you track any event in Snowplow, using any tracker, you can pass into Snowplow as many contexts as you want. This gives you the flexibility to pass potentially enormous amounts of data with each event that you capture.

Across all our trackers, the approach is the same. Each context is a self-describing JSON. We create an array of all the contexts that we wish to pass into Snowplow, and then pass that array in, generally as the final argument, on whichever track method we call to capture the event (e.g. track page view, track structured event, track unstructured event). So for example, we can extend the example above to pass in a customer and a product context:

tracker.track_unstruct_event({
    "schema": "iglu:com.mycompany/outbound-call-made/jsonschema/1-0-0",
    "data": {
        "connected_tstamp": "2015-03-21 17:23:10",
        "disconnected_tstamp": "2015-03-21 17:48:21",
        "reason_for_call": "Response to interest submitted via webform",
        "success": True,
        "order_id": "ab-1903-23904",
        "order_value": "129.44"
    }
}, context=[{
    "schema": "iglu:com.mycompany/customer/jsonschema/1-0-0",
    "data": {
        "name": "Joe Bloggs",
        "address_street": "123 ABC Road",
        "address_town": "my town",
        "address_state": "my state",
        "address_country": "United States of America"
    }
},{
    "schema": "iglu:com.mycompany/product/jsonschema/1-0-0",
    "data": {
        "sku": "1908asdf",
        "name": "product name",
        "list_price": "149.99",
        "discounted_price": "129.44",
        "promotion": "end of season",
        "color": "red"
    }
}])
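
As a rough sketch of what a tracker might do with the contexts array before sending it, the Python below wraps the array in a single self-describing envelope and base64-encodes it. The envelope schema reference and the URL-safe base64 step are assumptions about the tracker protocol, and the helper names are hypothetical; check the tracker protocol documentation for the exact format your tracker version uses.

```python
import base64
import json

# Assumed reference for the envelope that wraps the contexts array.
CONTEXTS_SCHEMA = "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0"


def wrap_contexts(contexts):
    """Wrap a list of self-describing JSONs in one self-describing envelope."""
    return {"schema": CONTEXTS_SCHEMA, "data": list(contexts)}


def encode_contexts(contexts):
    """Serialize the envelope and encode it as URL-safe base64 text."""
    raw = json.dumps(wrap_contexts(contexts)).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


contexts = [
    {"schema": "iglu:com.mycompany/customer/jsonschema/1-0-0",
     "data": {"name": "Joe Bloggs"}},
    {"schema": "iglu:com.mycompany/product/jsonschema/1-0-0",
     "data": {"sku": "1908asdf", "color": "red"}},
]
encoded = encode_contexts(contexts)

# Decoding recovers the envelope with both contexts intact.
decoded = json.loads(base64.urlsafe_b64decode(encoded))
print(len(decoded["data"]))
```

Because every context carries its own schema reference, any number of them can travel in the same array without ambiguity.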

2. Webhooks

A number of third-party systems offer webhooks: the ability to stream event data to an endpoint of your choosing, as those events occur in the third-party system. At Snowplow, we’re working to integrate as many different third-party webhooks as possible, so that if you use those services, you can configure them to push event-level data directly into Snowplow.

Configuring a third-party service to stream event-level data into Snowplow is straightforward: it is generally something you do once, via the application UI. For details on the different webhooks that Snowplow supports and instructions on integrating them, see the setup guide.