Campaign tracking with Snowplow
In one of our blog posts a few years back, we drew attention to the complexity web analysts face when trying to answer questions like:
- Which sites and marketing campaigns are driving visitors to your website?
- How valuable are those visitors?
- What should you be doing to drive up the number of high-quality users?
We pointed out the importance of examining both the page URL and the referer URL. For this purpose, we introduced the corresponding enrichments.
In this tutorial, we introduce Snowplow's practical approach to addressing this problem for web-driven traffic. If you are interested in tracking mobile-driven campaigns, please refer to the two-part tutorial listed below:
- Integrating Adjust with Snowplow to add mobile attribution data to the rest of your event data (1/2)
- Integrating Adjust with Snowplow to add mobile attribution data to the rest of your event data (2/2)
What is a referer?
When you load a web page in your browser, the browser makes an HTTP request to a web server to deliver that page. That request includes a header field that identifies the address of the web page that linked to the resource being requested: this is called the HTTP referer.
Web analytics programs typically read the HTTP referer header or JavaScript’s document.referrer, and use that page referer data as one of the inputs to infer where a visitor has come from.
Note that we normally use the original HTTP misspelling of “referer” as opposed to “referrer”.
Here’s an example of an HTTP request:
GET https://www.properweb.ca/ HTTP/1.1
Host: www.properweb.ca
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: https://www.google.ca/
Accept-Encoding: gzip, deflate, sdch, br
Accept-Language: en-US,en;q=0.8,ru;q=0.6
Cookie: PHPSESSID=dlq83b1gj0hqomib74ibkpovi7; sc_is_visitor_unique=rx10185912.1470435733.4A1DC2BF0E334F4118E9003B3BE41D25.3.3.3.3.3.3.3.3.3
The Referer header identifies the page that linked to the resource being requested, here the Host’s web page fetched with GET. In the example above, we can see the request came from Google. In this raw form, it is of little help. We all know what Google is, but what if the request came from another, less well-known source?
We want to know more about it, and the Referer Parser Enrichment can help us.
Before going into the details of the enrichment internals, recall that Snowplow is able to extract the value of the Referer header field. It is stored in the page_referrer column of the atomic.events table. Moreover, the referer page URL is further broken down into its component parts, which populate the following columns of the atomic.events table:
- refr_urlscheme - the protocol (ex. http, ftp)
- refr_urlhost - the host of the web server (domain name)
- refr_urlport - port of the server to obtain the resource (ex. 80)
- refr_urlpath - path to the document
- refr_urlquery - querystring of the referer URL
- refr_urlfragment - identifier of the page section following # in the URL of the document
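To make the split concrete, here is a minimal sketch using Python's standard urllib.parse. It is not Snowplow's implementation: the function name split_referer, the sample URL, and the choice to fill in a default port when the URL has none are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: fall back to the scheme's well-known port
# when the referer URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def split_referer(page_referrer):
    """Split a referer URL into parts named after the refr_url* columns."""
    parts = urlsplit(page_referrer)
    return {
        "refr_urlscheme": parts.scheme,
        "refr_urlhost": parts.hostname,
        "refr_urlport": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "refr_urlpath": parts.path,
        "refr_urlquery": parts.query,
        "refr_urlfragment": parts.fragment,
    }


# Hypothetical referer URL, used only to exercise the function.
for column, value in split_referer("https://www.google.ca/search?q=domain+names#top").items():
    print(column, "=", value)
```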
Referer Parser Enrichment
If the Referer Parser Enrichment is enabled, the referer is examined further and compared against the referers.yml database. The database contains 4 sections, each representing a medium:
- unknown - for when we know the source, but not the medium
- email - for webmail providers
- social - for social media services
- search - for search engines
Additionally, the referer page domain name (the value of refr_urlhost in the atomic.events table) is used to determine whether this is an internal referer (that is, the request came from within your own network) by comparing it with the domain names listed in the internalDomains parameter of referer_parser.json.
As a result, the referer page URL dimension is widened by populating additional columns of atomic.events, as outlined below.
- refr_medium - Type of referer (ex. ‘search’, ‘internal’)
- refr_source - Name of referer if recognised
- refr_term - Keywords if source is a search engine
NOTE: Since Google started encrypting search terms, it is not possible to infer them from the referer URL. Google strips the search query information from the “q=” (q=search+query) parameter in its referer string.
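The lookup described in this section can be sketched roughly as follows. This is a simplified illustration, not the enrichment's actual code: KNOWN_REFERERS is a tiny, made-up stand-in for referers.yml, the function name classify_referer is hypothetical, and labelling unrecognised hosts as “unknown” is an assumption of the sketch.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up stand-in for referers.yml: host -> (medium, source, search parameters).
KNOWN_REFERERS = {
    "www.google.ca": ("search", "Google", ["q"]),
    "www.bing.com": ("search", "Bing", ["q"]),
    "twitter.com": ("social", "Twitter", []),
    "mail.yahoo.com": ("email", "Yahoo! Mail", []),
}


def classify_referer(page_referrer, internal_domains=()):
    """Return refr_medium, refr_source and refr_term for a referer URL."""
    parts = urlsplit(page_referrer)
    host = parts.hostname
    # Internal referers are decided first, from the configured domain list.
    if host in internal_domains:
        return {"refr_medium": "internal", "refr_source": None, "refr_term": None}
    # Assumption: hosts missing from the lookup table are labelled "unknown".
    if host not in KNOWN_REFERERS:
        return {"refr_medium": "unknown", "refr_source": None, "refr_term": None}
    medium, source, search_params = KNOWN_REFERERS[host]
    query = parse_qs(parts.query)
    # The search term is the first known search parameter present, if any.
    term = next((query[name][0] for name in search_params if name in query), None)
    return {"refr_medium": medium, "refr_source": source, "refr_term": term}


print(classify_referer("https://www.bing.com/search?q=domain+names"))
print(classify_referer("https://blog.properweb.ca/post", ["blog.properweb.ca"]))
```

A referer such as https://www.google.ca/ carries no querystring, so the sketch returns no term for it, which matches the note above.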
Enabling referer enrichment
The Referer Parser is one of the so-called configurable enrichments provided by Snowplow, and it is easy to use. First, we need to prepare the referer_parser.json enrichment configuration file.
{
    "schema": "iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/1-0-0",
    "data": {
        "name": "referer_parser",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "internalDomains": []
        }
    }
}
To distinguish internal referers, you can list the domain names in your network that link to the resource. Whenever the refr_urlhost value matches a domain name from the internalDomains list, the refr_medium column of the atomic.events table will be populated with “internal”.
...
"parameters": {
    "internalDomains": [
        "blog.properweb.ca",
        "shop.properweb.com",
        "www.properaffiliate.co.uk"
    ]
}
...
Add the configuration file to your “enrichments” folder (or whatever name you chose) and run EmrEtlRunner with the --enrichments parameter:
$ ./snowplow-emr-etl-runner --config config.yml --resolver resolver.json --enrichments enrichments
That is all there is to it. The enrichment process will take care of widening the dimensions of your events.
Campaign Attribution Enrichment
Page referers are a technical solution to identifying where traffic comes from. In addition, digital marketers may want to label incoming traffic so that they can identify which marketing campaigns that traffic should be attributed to. This is typically done by adding a querystring to the landing page URL.
In other words, you have to build the link leading to your resource.
To give an example, let’s imagine that I am marketing the website www.properweb.ca. I run a campaign on AdWords called “September sale”. In my AdWords ad, I include a link (that I hope viewers of the ad will click) to my domain names webpage. However, instead of just including the standard link in my ad, i.e.
<a href="https://www.properweb.ca/domain-names/">domain names discount</a>
I add a query parameter onto the end of my link labelling the campaign:
<a href="https://www.properweb.ca/domain-names/?utm_campaign=September%20sale">www.properweb.ca/domain-names/</a>
Adding the query parameter does not change the experience of the user clicking on the ad. Then, on the landing page (in this case, the www.properweb.ca/domain-names/ web page), the web analytics JavaScript tag passes the querystring to Snowplow, which can then infer that the traffic should be attributed to the “September sale” campaign.
Different web analytics programs look for different query parameters when assigning traffic to marketing campaigns. We follow the same naming convention as Google Analytics, which makes for an easy transition from the latter. The parameters are summarised below:
- utm_medium - The advertising or marketing medium, for example, cpc, banner, email newsletter.
- utm_source - Identifies the advertiser, site, publication, etc. that is sending traffic to your resource.
- utm_term - Identifies the search terms that triggered the ad being displayed in the search results.
- utm_content - Used to differentiate similar content, or links within the same ad. For example, if you have two call-to-action links within the same email message, you can use utm_content and set different values for each so you can tell which version is more effective.
- utm_campaign - The individual campaign name, slogan, promo code, etc. for a product.
Additionally, we introduced mkt_clickid, which serves as a tracking parameter identifying the marketing network. The enrichment automatically knows about Google (corresponding to the “gclid” querystring parameter), Microsoft (“msclkid”), and DoubleClick (“dclid”). However, you can add your own click identifier (key), giving the name of your desired network as its value.
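Building tagged links by hand is error-prone, so it can help to generate them. The sketch below is one possible way to do that with Python's standard library; the helper name tag_landing_url and the source value “adwords” are assumptions for illustration, not part of Snowplow.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def tag_landing_url(url, **utm):
    """Append utm_* parameters to a landing page URL, keeping any existing query."""
    parts = urlsplit(url)
    # quote_via=quote encodes spaces as %20, matching the link shown earlier.
    extra = urlencode({f"utm_{key}": value for key, value in utm.items()}, quote_via=quote)
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(tag_landing_url(
    "https://www.properweb.ca/domain-names/",
    campaign="September sale",
    medium="cpc",
    source="adwords",  # hypothetical value, chosen for the example
))
```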
Enabling campaign attribution enrichment
As with the Referer Parser Enrichment, we add a campaign_attribution.json configuration file to the directory holding all our configurable enrichments. Doing so, with enabled set to true, enables the Campaign Attribution Enrichment.
Below is an example:
{
    "schema": "iglu:com.snowplowanalytics.snowplow/campaign_attribution/jsonschema/1-0-1",
    "data": {
        "name": "campaign_attribution",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "mapping": "static",
            "fields": {
                "mktMedium": ["utm_medium", "medium"],
                "mktSource": ["utm_source", "source"],
                "mktTerm": ["utm_term", "legacy_term"],
                "mktContent": ["utm_content"],
                "mktCampaign": ["utm_campaign", "cid", "legacy_campaign"],
                "mktClickId": {
                    "customclid": "My Network"
                }
            }
        }
    }
}
Note that the parameters included in the page URL can have arbitrary names. That is, you can combine the campaign attribution parameters used by different analytics platforms or campaign management tools.
Thus, in the example above, the marketing campaign could be inferred from any of three parameters in the page querystring: utm_campaign, cid, or legacy_campaign. If more than one is encountered, the first one takes precedence.
Therefore, campaign_attribution.json can be viewed as a mapping between the parameters submitted in the querystring and the corresponding columns in the atomic.events table. Specifically, the relationship is as follows:
- mkt_medium ← mktMedium
- mkt_source ← mktSource
- mkt_term ← mktTerm
- mkt_content ← mktContent
- mkt_campaign ← mktCampaign
- mkt_clickid (key) & mkt_network (value) ← mktClickId
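To illustrate how such a mapping could be applied, here is a minimal sketch that mirrors the fields of the example configuration above. It is not the enrichment's real code: the function name attribute_campaign is hypothetical, and we assume that “the first one takes precedence” means the first parameter in the order listed in the configuration.

```python
from urllib.parse import urlsplit, parse_qs

# Mirrors the "fields" section of the example configuration.
FIELDS = {
    "mkt_medium": ["utm_medium", "medium"],
    "mkt_source": ["utm_source", "source"],
    "mkt_term": ["utm_term", "legacy_term"],
    "mkt_content": ["utm_content"],
    "mkt_campaign": ["utm_campaign", "cid", "legacy_campaign"],
}

# Built-in click ID parameters named in the text, plus the custom one from the example.
CLICK_IDS = {
    "gclid": "Google",
    "msclkid": "Microsoft",
    "dclid": "DoubleClick",
    "customclid": "My Network",
}


def attribute_campaign(page_url):
    """Map querystring parameters of a page URL onto mkt_* column values."""
    query = parse_qs(urlsplit(page_url).query)
    row = {}
    for column, names in FIELDS.items():
        # Assumption: the first configured parameter found in the querystring wins.
        row[column] = next((query[name][0] for name in names if name in query), None)
    click = next(
        ((query[param][0], network) for param, network in CLICK_IDS.items() if param in query),
        (None, None),
    )
    row["mkt_clickid"], row["mkt_network"] = click
    return row


print(attribute_campaign(
    "https://www.properweb.ca/domain-names/?cid=old&utm_campaign=September%20sale&customclid=abc123"
))
```

In this usage example both cid and utm_campaign are present, and utm_campaign wins because it is listed first in the configuration.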
Conclusion
We expose both page_url and page_referrer. The data in the mkt_ columns reflects tagged (paid) campaigns as opposed to organic/search activity; it answers the question of which marketing campaigns the traffic should be attributed to. The data in the refr_ columns, on the other hand, indicates where the traffic comes from. Combining the analysis of data in both the mkt_ and refr_ columns:
- leads to more intelligent and robust inferences about where your traffic comes from
- identifies surprising results related to the placement of your paid campaigns, which may have significant implications for your overall marketing strategy
- makes it possible to identify and manage errors that are invariably introduced in the data
Having said that, it is up to you how to combine the mkt_ and refr_ fields. This differs from, for example, the Google Analytics approach, which combines them directly, setting the value of the medium, source, term, etc. based on the utm_ parameters if available, and the refr_ parameters if not.
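If you do want a Google Analytics-style combined view, one possible approach is a simple fallback applied to each event. The sketch below assumes an event is available as a dictionary keyed by column name and applies the fallback field by field; both the function name combine_like_ga and the per-field rule are our assumptions, not a description of how Google Analytics works internally.

```python
# Pairs of (campaign column, referer column) that describe the same dimension.
PAIRS = {
    "medium": ("mkt_medium", "refr_medium"),
    "source": ("mkt_source", "refr_source"),
    "term": ("mkt_term", "refr_term"),
}


def combine_like_ga(event):
    """Prefer the mkt_* value for each dimension, falling back to refr_*."""
    return {
        name: event.get(mkt_column) or event.get(refr_column)
        for name, (mkt_column, refr_column) in PAIRS.items()
    }


# Hypothetical event: a tagged medium, with the source known only from the referer.
event = {"mkt_medium": "cpc", "refr_medium": "search", "refr_source": "Google"}
print(combine_like_ga(event))
```

Keeping the original mkt_ and refr_ columns alongside any combined view preserves the ability to spot disagreements between the two.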