Sending data into Snowplow
Data is sent into Snowplow by trackers and webhooks.
1. Trackers
Snowplow is built so that you can send in event-level data from any type of digital application, service or device. A wide range of Snowplow trackers is available to make it easy to collect event-level data from many different places.
When you instrument a Snowplow tracker you need to set it up in such a way that it responds to the events you need to track by:
- Seeing those events
- Assembling a packet of data points that fully describe those events
- Sending the packet of data representing that event to the Snowplow collector for processing
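The three steps above can be sketched in a few lines of Python. This is an illustration only, not the API of any Snowplow tracker: `MiniTracker`, `on_event`, the `event_type` field and the in-memory `sent` list standing in for a collector are all hypothetical names.

```python
import time
import uuid


class MiniTracker:
    """Hypothetical tracker: sees events, assembles packets, sends them."""

    def __init__(self, send):
        # `send` stands in for the HTTP request to a collector (assumption).
        self.send = send

    def on_event(self, event_type, properties):
        # 1. The tracker "sees" the event because the application calls this hook.
        # 2. Assemble a packet of data points that describes the event.
        packet = {
            "event_id": str(uuid.uuid4()),
            "dvce_tstamp": int(time.time() * 1000),  # milliseconds since epoch
            "event_type": event_type,
        }
        packet.update(properties)
        # 3. Hand the packet over for delivery to the collector.
        self.send(packet)
        return packet


sent = []  # in-memory stand-in for a collector
tracker = MiniTracker(send=sent.append)
tracker.on_event("page_view", {"page_url": "http://www.example.com"})
```

A real tracker differs mainly in step 3, where the packet is serialized and sent over HTTP, but the see, assemble, send shape is the same.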
This process looks a bit different depending on the tracker you’re implementing, but the underlying steps are the same in all cases. The main thing to consider is whether the event you’re tracking is one that you have defined yourself, or one that has already been defined in Snowplow.
Tracking events that have already been defined in Snowplow
Events that are supported by Snowplow out-of-the-box
Snowplow supports a large and growing number of events ‘out of the box’, most of which are fairly standard in a web analytics context. Examples of events that we support include:
- Page views
- Page pings
- Link clicks
- Form fill-ins (for the web)
- Form submissions
- Transactions
For events that Snowplow natively supports, there is generally a specific API for tracking that event type. For example, if you want to track a page view using the JavaScript tracker, you do so with the following JavaScript:
window.snowplow('trackPageView');
Whereas if you were tracking a page view in an iOS app using the Objective-C tracker, you’d do so like this:
[t1 trackPageView:@"www.example.com" title:@"example" referrer:@"www.referrer.com"];
In general, each tracker will have a specific API call for tracking any events that have been defined by the Snowplow team, and you should refer to the tracker-specific documentation to make sure that this is set up correctly.
Contexts that are supported by Snowplow out-of-the-box
Wherever possible, we build the trackers to automatically capture as much contextual data for each event as they can. For example, the JavaScript tracker automatically captures the following data fields with every request unless they are disabled:
Field | Description
--- | ---
dvce_tstamp | The timestamp on the device that the event was recorded on
os_timezone | The timezone the client operating system is set to
event_id | A unique identifier for the event
domain_userid | First party cookie ID
domain_sessionidx | Session index based on first party cookie ID
dvce_screenheight | Screen height in pixels
br_viewwidth | Browser view width in pixels
br_viewheight | Browser view height in pixels
page_url | URL of the page on which the event occurred
page_referrer | URL of the referrer
user_fingerprint | Browser fingerprint
br_lang | Language the browser is set to
br_features_... | A list of boolean flags to indicate if common plugins are installed e.g. PDF, Quicktime, RealPlayer, Flash, Java…
br_colordepth | Browser color depth
doc_width | Width of webpage in pixels
doc_height | Height of webpage in pixels
doc_charset | Document encoding
platform | The platform that the event was recorded on, in this case ‘web’
name_tracker | The tracker name
v_tracker | The tracker version
In addition to the above fields, there are a number of optional contexts that you can capture automatically using the Snowplow JavaScript tracker, including:
- Performance timing. This provides data on web page load times.
- Universal Analytics cookie data. This provides data read from the Google Analytics cookie, for users running Snowplow alongside Universal Analytics.
- Geolocation context. This will provide data on where a user is, if that user has consented to give that information.
The mobile trackers (iOS and Android) also automatically capture a large number of data points with every event, where available:
Field | Description
--- | ---
os_type | Operating system type
os_version | Operating system version
device_manufacturer | The device manufacturer
device_model | The device model
carrier | The mobile carrier
apple_idfa | Apple’s IDFA (ID for advertisers)
open_idfa | The open IDFA
android_idfa | The Android IDFA
latitude | Device location latitude
longitude | Device location longitude
latitude_longitude_location_accuracy | The accuracy of the lat/long measures above
altitude | Device location altitude
altitude_accuracy | The accuracy of the above altitude measure
bearing | Direction of device travel
speed | Speed with which the device is travelling
Tracking events that you’ve defined yourself
Tracking events where you have defined the schema yourself is straightforward. Before you instrument your tracker, you need to:
- Make sure you have your Iglu schema repo set up
- Create a schema for your event type in the repo
- Have the associated reference to the schema in Iglu. For example, if your company website URL is mycompany.com, you’ve defined your own outbound-call-made event schema, and it is the first version of that schema, then the reference to the schema is iglu:com.mycompany/outbound-call-made/jsonschema/1-0-0
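As a sketch of what such a schema and its reference might look like, the Python below defines a self-describing JSON Schema for the event and derives the Iglu reference from its `self` block. The property list is an assumption based on the example event used in this section, the `$schema` value follows the convention used for Iglu self-describing schemas, and `iglu_reference` is a hypothetical helper, not part of any Snowplow library.

```python
# Hypothetical schema for the outbound-call-made event. The properties are
# assumptions taken from the example event in this section.
OUTBOUND_CALL_MADE_SCHEMA = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self": {
        "vendor": "com.mycompany",
        "name": "outbound-call-made",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    "type": "object",
    "properties": {
        "connected_tstamp": {"type": "string"},
        "disconnected_tstamp": {"type": "string"},
        "reason_for_call": {"type": "string"},
        "success": {"type": "boolean"},
        "order_id": {"type": "string"},
        "order_value": {"type": "string"},
    },
}


def iglu_reference(schema):
    """Build the iglu:vendor/name/format/version reference from the self block."""
    return "iglu:{vendor}/{name}/{format}/{version}".format(**schema["self"])


reference = iglu_reference(OUTBOUND_CALL_MADE_SCHEMA)
```

The reference is fully determined by the four `self` fields, which is why a new version of a schema gets a new reference.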
Once that is done, you simply need to configure your tracker to record the event using the track unstructured event method. So if we were tracking the event using the Python tracker, our code snippet for doing so might look like this:
tracker.track_unstruct_event({
    "schema": "iglu:com.mycompany/outbound-call-made/jsonschema/1-0-0",
    "data": {
        "connected_tstamp": "2015-03-21 17:23:10",
        "disconnected_tstamp": "2015-03-21 17:48:21",
        "reason_for_call": "Response to interest submitted via webform",
        "success": True,  # Python boolean literal (not JSON's lowercase true)
        "order_id": "ab-1903-23904",
        "order_value": "129.44"
    }
})
We call the track unstructured event method and pass in a JSON with two fields: a schema field, which tells Snowplow where the schema for this event can be found in Iglu, and a data field, which holds the actual data that needs to be captured. We call this a self-describing JSON because, assuming we have access to Iglu, the JSON contains all the information we need to process it, in the form of the schema reference.
Each of the Snowplow trackers includes a track unstructured event method, and it is not uncommon to have a Snowplow implementation where nearly all, if not all, of the events tracked have been defined by the company in question and so are tracked using this method.
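To make the idea of a self-describing JSON concrete, here is a minimal sketch that wraps event data with its schema reference and splits the reference back into its parts. `self_describing` and `parse_iglu_reference` are hypothetical helpers written for this illustration; they are not functions from a Snowplow tracker.

```python
import re

# iglu:<vendor>/<name>/<format>/<version>, where version looks like 1-0-0
IGLU_REF = re.compile(
    r"^iglu:(?P<vendor>[^/]+)/(?P<name>[^/]+)/(?P<format>[^/]+)/(?P<version>\d+-\d+-\d+)$"
)


def parse_iglu_reference(reference):
    """Split an Iglu reference into vendor, name, format and version."""
    match = IGLU_REF.match(reference)
    if match is None:
        raise ValueError("not a valid Iglu reference: %r" % reference)
    return match.groupdict()


def self_describing(reference, data):
    """Wrap data with the reference to the schema that describes it."""
    parse_iglu_reference(reference)  # fail early on a malformed reference
    return {"schema": reference, "data": data}


event = self_describing(
    "iglu:com.mycompany/outbound-call-made/jsonschema/1-0-0",
    {"reason_for_call": "Response to interest submitted via webform", "success": True},
)
parts = parse_iglu_reference(event["schema"])
```

Anything that receives `event` can read the `schema` field, look the schema up in Iglu, and use it to interpret the `data` field.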
Tracking contexts that you’ve defined yourself
Whenever you track any event in Snowplow, using any tracker, you can pass into Snowplow as many contexts as you want. This gives you the flexibility to pass potentially enormous amounts of data with each event that you capture.
Across all our trackers, the approach is the same. Each context is a self-describing JSON. We create an array of all the contexts that we wish to pass into Snowplow, and then pass that array in, generally as the final argument, on whichever track method we call to capture the event (e.g. track page view, track structured event, track unstructured event). So for example, we can extend the example above to pass in a customer context and a product context:
tracker.track_unstruct_event({
    "schema": "iglu:com.mycompany/outbound-call-made/jsonschema/1-0-0",
    "data": {
        "connected_tstamp": "2015-03-21 17:23:10",
        "disconnected_tstamp": "2015-03-21 17:48:21",
        "reason_for_call": "Response to interest submitted via webform",
        "success": True,  # Python boolean literal (not JSON's lowercase true)
        "order_id": "ab-1903-23904",
        "order_value": "129.44"
    }
}, context=[{
    # First context: the customer involved in the call
    "schema": "iglu:com.mycompany/customer/jsonschema/1-0-0",
    "data": {
        "name": "Joe Bloggs",
        "address_street": "123 ABC Road",
        "address_town": "my town",
        "address_state": "my state",
        "address_country": "United States of America"
    }
}, {
    # Second context: the product that was ordered
    "schema": "iglu:com.mycompany/product/jsonschema/1-0-0",
    "data": {
        "sku": "1908asdf",
        "name": "product name",
        "list_price": "149.99",
        "discounted_price": "129.44",
        "promotion": "end of season",
        "color": "red"
    }
}])
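Because every context is itself a self-describing JSON, a list of contexts can be sanity-checked before it is passed to a track method. The sketch below is an illustration under stated assumptions: `check_contexts` is a hypothetical helper, and it only checks the two-field shape described above, not the data against the schema held in Iglu.

```python
def check_contexts(contexts):
    """Return the contexts unchanged if each one is a {"schema", "data"} dict."""
    for index, context in enumerate(contexts):
        if not isinstance(context, dict) or set(context) != {"schema", "data"}:
            raise ValueError("context %d is not a self-describing JSON" % index)
        if not str(context["schema"]).startswith("iglu:"):
            raise ValueError("context %d has no Iglu schema reference" % index)
    return contexts


contexts = check_contexts([
    {
        "schema": "iglu:com.mycompany/customer/jsonschema/1-0-0",
        "data": {"name": "Joe Bloggs"},
    },
    {
        "schema": "iglu:com.mycompany/product/jsonschema/1-0-0",
        "data": {"sku": "1908asdf", "color": "red"},
    },
])
```

A check like this catches a context that was passed as bare data, without its schema wrapper, before the event leaves the application.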
2. Webhooks
A number of third-party systems offer webhooks: the ability to stream event data to an endpoint of your choosing, as those events occur in the third-party system. At Snowplow, we’re working to integrate as many different third-party webhooks as possible, so that if you use those services, you can configure them to push event-level data directly into Snowplow.
Configuring a third-party service to stream event-level data into Snowplow is straightforward: it is generally something you do once, via the application UI. For details on the different webhooks that Snowplow supports and instructions on integrating them, see the setup guide.