We sometimes get the following question:
What are all possible values of the refr_medium field?
The possible values are:
NULL
internal
social
search
email
unknown
It’s NULL when the page view has no referrer.
It’s set to internal when the page URL and referrer URL have the same host, or when the referring domain is configured as internal: https://github.com/snowplow/snowplow/blob/master/3-enrich/config/enrichments/referer_parser.json
The other values are parsed according to the data in our referer-parsing project: https://github.com/snowplow/referer-parser/blob/master/resources/referers.yml
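The rules above can be sketched roughly as follows. This is only an illustration, not the enrichment's actual code: the function name, INTERNAL_DOMAINS and KNOWN_REFERERS are hypothetical stand-ins for the enrichment configuration and the referers.yml data, and matching is simplified to exact hostnames.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical stand-ins: domains configured as internal, and a tiny
# hostname -> medium table in place of the real referers.yml data.
INTERNAL_DOMAINS = {"shop.example.com"}
KNOWN_REFERERS = {
    "www.google.com": "search",
    "www.facebook.com": "social",
    "mail.google.com": "email",
}


def refr_medium(page_url: str, referrer_url: Optional[str]) -> Optional[str]:
    """Approximate the refr_medium rules described in this post."""
    if not referrer_url:
        return None  # no referrer: the field is NULL

    page_host = urlsplit(page_url).hostname
    refr_host = urlsplit(referrer_url).hostname

    # Same host as the page, or a domain configured as internal.
    if refr_host and (refr_host == page_host or refr_host in INTERNAL_DOMAINS):
        return "internal"

    # Anything not found in the lookup table falls back to "unknown"
    # (assumption: this also covers referrers with no parseable host).
    return KNOWN_REFERERS.get(refr_host, "unknown")
```

For example, refr_medium("https://www.example.com/a", "https://www.google.com/search?q=x") returns "search" with the table above, while a referrer host that is not in the table returns "unknown".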
To find out which referrer mediums are most common, run:
SELECT
refr_medium,
count(*)
FROM atomic.events
GROUP BY 1
ORDER BY 2 DESC
mike
July 4, 2016, 11:18pm
2
If something is missing from the referrers YAML file, you can edit it yourself; it uses a relatively simple indented, hierarchical format: https://github.com/snowplow/referer-parser/blob/master/resources/referers.yml. If there’s an additional referrer that you think others would benefit from having defined, please submit a pull request.
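To give a feel for that hierarchical format, here is a minimal sketch of how such a file could be flattened into a domain lookup once parsed. It assumes the layout is medium, then source name, then a list of domains, which is my reading of the file rather than something stated in this thread; the entries and the build_lookup helper are made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical excerpt, shaped like the parsed YAML (assumed layout:
# medium -> source name -> {"domains": [...]}). These entries are
# invented and do not come from the real referers.yml.
REFERERS = {
    "search": {
        "Example Search": {
            "parameters": ["q"],
            "domains": ["search.example.com", "www.search.example.com"],
        },
    },
    "paid": {
        "Example Ads": {"domains": ["ads.example.net"]},
    },
}


def build_lookup(referers: dict) -> Dict[str, Tuple[str, str]]:
    """Flatten medium -> source -> domains into domain -> (medium, source)."""
    lookup: Dict[str, Tuple[str, str]] = {}
    for medium, sources in referers.items():
        for source, spec in sources.items():
            for domain in spec.get("domains", []):
                lookup[domain] = (medium, source)
    return lookup


lookup = build_lookup(REFERERS)
```

Under that assumed layout, adding a referrer means adding a source with its domains under the right medium, and a domain that appears nowhere in the file simply has no entry in the lookup.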
# #######################################################################################################
#
# ALL SUPPORTED REFERERS
#
# Broken down into:
#
# 1. Medium-unknown providers
# 2. Email providers
# 3. Social providers
# 4. Search providers
# 5. Paid media
# #######################################################################################################
#
# MEDIUM-UNKNOWN PROVIDERS
#
# We know the source, but not the medium.
# This section is useful for reducing false positives in the other sections
I checked the yml file above, and the following domains are classified with the ‘paid’ medium:
“paid.outbrain.com”
“trc.taboola.com”
But in my records, although the source is identified correctly (Outbrain and Taboola), the medium is identified as ‘unknown’ instead of ‘paid’.
Is this normal behavior?
Thank you.
Snowplow enrichment does not yet use the latest version of referers.yml. It uses an older version of the file, from before paid media were added.
Since the paid entries for those sites aren’t in that older yaml, they are just grouped into unknown.
ihor
November 22, 2018, 10:34pm
5
I’ll raise an internal ticket to check if/why the updated file is not hosted.
Cheers @ihor - I believe they’re looking at decoupling the yaml source from the enrichment pipeline as part of the next release:
This will mean anyone can use whatever version of the referer yaml they like (e.g. including the latest version or entirely custom versions).
Thanks for the explanation.
ihor
November 23, 2018, 9:18pm
10
@robkingston, yes, you are absolutely right.