We all want our websites to be fast, especially now that mobile has become the dominant platform. Research by Google has shown that speed does indeed matter. When pages take a long time to load, users will get frustrated and leave. However, measuring and troubleshooting page load times is often easier said than done. The application design and browser version matter, but so do things that are harder to control, such as unexpected server load, poor network conditions, and unique client-side environments.
With Snowplow, it’s possible to measure how a website performs in the real world. Our Javascript tracker uses the Navigation Timing API to capture 23 attributes that break the loading process down into a set of milestones. This data is stored in the Performance Timing context, which gets sent with all events that happen on the page.
In this post, I’ll explain how to use Navigation Timing data.
Understanding the Navigation Timing API
The Navigation Timing API is supported by all major browsers and records the time when certain milestones in the navigation and load process occur. The various milestones are illustrated in this graph:
Each attribute captures when a navigation milestone (e.g. when was the page requested?) or page load milestone (e.g. when did the DOM start loading?) occurred. Note that:
- time is measured in milliseconds since midnight of 1 January 1970 (UTC);
- attributes are set to zero until the corresponding milestone has occurred;
- browsers might do additional internal processing between milestones.
Enabling the Performance Timing context
The Javascript tracker has the option to store Navigation Timing data in a predefined context. You will need to enable the Performance Timing context, and we also recommend enabling the Web Page context:
'contexts': {
'webPage': true,
'performanceTiming': true
}
Make sure to also create the corresponding tables in Redshift.
If the Performance Timing context is enabled, the JavaScript Tracker will create a context JSON from the window.performance.timing
object, along with the Chrome firstPaintTime
field (renamed to chromeFirstPaint
) if it exists.
The redirectStart
, redirectEnd
, and secureConnectionStart
attributes are set to zero if there is no redirect, or if a secure connection is not requested. Moreover, if you fire a page view event as soon as the page loads, the domComplete
, loadEventStart
, loadEventEnd
, and chromeFirstPaint
attributes might still be set to zero. This is because those properties are not known until all scripts are done executing, including sp.js
. To circumvent this limitation, one could wrap Snowplow (and other) code in a setTimeout
call:
setTimeout(function () {
// Load Snowplow and call tracking methods here
}, 0);
This will, however, also reduce the number of events that are sent in. If a visitor leaves a page before it is done loading, then no events will have been sent and the visit will remain unrecorded.
Instead, we can use page pings events to complete the picture. The initial page view event might be missing some attributes, if the page was still loading, but the Performance Timing context is updated each time a page ping (or heartbeat) is sent. If a user remains on the page long enough for a page ping to be sent, we will receive the updated attributes. If not, we will still be able to infer where in the loading process the user left.
Sending page ping events gives us more data to work with, but it also increases overall event volumes and the number of requests that are made to the collector. It’s a trade-off, one that requires careful consideration. If measuring page performance is an important goal, we recommend these settings:
snowplow_name_here('enableActivityTracking', 5, 10);
The first page ping is sent 5 seconds after the page view event. Subsequent page pings are sent in 10 second intervals. If measuring page performance is less important, we recommend increasing these intervals.
Aggregating the data in Redshift
Let’s start with joining the relevant tables:
WITH basic AS (
SELECT
b.id, -- page view UUID requires the webPage context
a.derived_tstamp, -- requires JS tracker 2.6.0 or later
a.page_urlhost,
a.page_urlpath,
a.event,
-- dimensions
a.br_name,
a.dvce_ismobile,
-- performance timing
c.navigation_start,
c.redirect_start,
c.redirect_end,
c.fetch_start,
c.domain_lookup_start,
c.domain_lookup_end,
c.secure_connection_start,
c.connect_start,
c.connect_end,
c.request_start,
c.response_start,
c.response_end,
c.unload_event_start,
c.unload_event_end,
c.dom_loading,
c.dom_interactive,
c.dom_content_loaded_event_start,
c.dom_content_loaded_event_end,
c.dom_complete,
c.load_event_start,
c.load_event_end
FROM atomic.events AS a
INNER JOIN atomic.com_snowplowanalytics_snowplow_web_page_1 AS b
ON a.event_id = b.root_id
AND a.collector_tstamp = b.root_tstamp
INNER JOIN atomic.org_w3_performance_timing_1 AS c
ON a.event_id = c.root_id
AND a.collector_tstamp = c.root_tstamp
WHERE a.event IN ('page_view','page_ping')
AND a.br_type IN ('Browser', 'Browser (mobile)') -- exclude bots
-- remove unexpected values (affects about 1% of rows)
AND c.navigation_start IS NOT NULL AND c.navigation_start > 0
AND c.redirect_start IS NOT NULL -- zero is OK
AND c.redirect_end IS NOT NULL -- zero is OK
AND c.fetch_start IS NOT NULL AND c.fetch_start > 0
AND c.domain_lookup_start IS NOT NULL AND c.domain_lookup_start > 0
AND c.domain_lookup_end IS NOT NULL AND c.domain_lookup_end > 0
AND c.secure_connection_start IS NOT NULL AND c.secure_connection_start > 0
-- connect_start is either 0 or NULL
AND c.connect_end IS NOT NULL AND c.connect_end > 0
AND c.request_start IS NOT NULL AND c.request_start > 0
AND c.response_start IS NOT NULL AND c.response_start > 0
AND c.response_end IS NOT NULL AND c.response_end > 0 AND DATEDIFF(d, a.derived_tstamp, (TIMESTAMP 'epoch' + c.response_end/1000 * INTERVAL '1 second ')) < 365
AND c.unload_event_start IS NOT NULL AND DATEDIFF(d, a.derived_tstamp, (TIMESTAMP 'epoch' + c.unload_event_start/1000 * INTERVAL '1 second ')) < 365 -- zero is OK
AND c.unload_event_end IS NOT NULL AND DATEDIFF(d, a.derived_tstamp, (TIMESTAMP 'epoch' + c.unload_event_end/1000 * INTERVAL '1 second ')) < 365 -- zero is OK
AND c.dom_loading IS NOT NULL AND c.dom_loading > 0
AND c.dom_interactive IS NOT NULL AND c.dom_interactive > 0
AND c.dom_content_loaded_event_start IS NOT NULL AND c.dom_content_loaded_event_start > 0
AND c.dom_content_loaded_event_end IS NOT NULL AND c.dom_content_loaded_event_end > 0
AND c.dom_complete IS NOT NULL -- zero is OK
AND c.load_event_start IS NOT NULL -- zero is OK
AND c.load_event_end IS NOT NULL -- zero is OK
ORDER BY 1,2)
It’s recommended to add more dimensions. Examples include the ISP, location, and various browser and device related dimensions. For all available dimensions and measures, check out our canonical event model.
The next step is aggregating page view and page ping events (both are micro events) into page views (a macro event). This distinction is explained in more detail in this blogpost on event data modeling.
DROP TABLE IF EXISTS derived.performance_timing;
CREATE TABLE derived.performance_timing
DISTKEY(id)
SORTKEY(tstamp)
AS (
WITH basic AS (...)
SELECT
id,
MIN(derived_tstamp) AS tstamp,
page_urlhost,
page_urlpath,
SUM(CASE WHEN event = 'page_view' THEN 1 ELSE 0 END) AS pv_count,
SUM(CASE WHEN event = 'page_ping' THEN 1 ELSE 0 END) AS pp_count,
CASE
WHEN DATEDIFF(s, MIN(derived_tstamp), MAX(derived_tstamp)) < 10
THEN ROUND(DATEDIFF(s, MIN(derived_tstamp), MAX(derived_tstamp))/5)*5
ELSE ROUND(DATEDIFF(s, MIN(derived_tstamp), MAX(derived_tstamp))/10)*10
END AS time_on_page,
br_name,
dvce_ismobile,
-- select the first non-zero value
MIN(NULLIF(navigation_start, 0)) AS navigation_start,
MIN(NULLIF(redirect_start, 0)) AS redirect_start,
MIN(NULLIF(redirect_end, 0)) AS redirect_end,
MIN(NULLIF(fetch_start, 0)) AS fetch_start,
MIN(NULLIF(domain_lookup_start, 0)) AS domain_lookup_start,
MIN(NULLIF(domain_lookup_end, 0)) AS domain_lookup_end,
MIN(NULLIF(secure_connection_start, 0)) AS secure_connection_start,
MIN(NULLIF(connect_start, 0)) AS connect_start,
MIN(NULLIF(connect_end, 0)) AS connect_end,
MIN(NULLIF(request_start, 0)) AS request_start,
MIN(NULLIF(response_start, 0)) AS response_start,
MIN(NULLIF(response_end, 0)) AS response_end,
MIN(NULLIF(unload_event_start, 0)) AS unload_event_start,
MIN(NULLIF(unload_event_end, 0)) AS unload_event_end,
MIN(NULLIF(dom_loading, 0)) AS dom_loading,
MIN(NULLIF(dom_interactive, 0)) AS dom_interactive,
MIN(NULLIF(dom_content_loaded_event_start, 0)) AS dom_content_loaded_event_start,
MIN(NULLIF(dom_content_loaded_event_end, 0)) AS dom_content_loaded_event_end,
MIN(NULLIF(dom_complete, 0)) AS dom_complete,
MIN(NULLIF(load_event_start, 0)) AS load_event_start,
MIN(NULLIF(load_event_end, 0)) AS load_event_end
FROM basic
GROUP BY 1,3,4,8,9
ORDER BY 2
);
Note that:
-
unload_event_start
often occurs beforeresponse_end
(in ~ 30% of page views) -
dom_loading
sometimes starts beforeunload_event_end
(in ~ 10% of page views)
It’s also not guaranteed that other milestone timestamps will be in the correct order. The tracker captures the data it gets from the browser, but, in rare cases, a later milestone will have an earlier timestamp. This affects few page views (< 1%), so it’s enough to just be aware that this can happen.
The estimates given before are all for the Snowplow website.
Exploring the data
Let’s run some queries!
Percentage of page views that completed loading
SELECT
ROUND(SUM(CASE WHEN dom_complete > 0 THEN 1 ELSE 0 END)/COUNT(*)::FLOAT, 4) AS all,
ROUND(SUM(CASE WHEN dom_complete > 0 AND time_on_page >= 5 THEN 1 ELSE 0 END)/SUM(CASE WHEN time_on_page >= 5 THEN 1 ELSE 0 END)::FLOAT, 4) AS at_least_5s,
ROUND(SUM(CASE WHEN dom_complete > 0 AND time_on_page >= 10 THEN 1 ELSE 0 END)/SUM(CASE WHEN time_on_page >= 10 THEN 1 ELSE 0 END)::FLOAT, 4) AS at_least_10s
FROM derived.performance_timing
For the Snowplow website, this returns the following results:
- all pages: 70.62%
- at least 5 seconds: 99.25%
- at least 10 seconds: 99.36%
Looking at all page views, we find that the DOM completed loading in 70% of page views. If we restrict to page views where at least one page ping was sent (i.e. the user was on the page for at least 5 seconds), this number goes up to 99%.
An alternative would be to use load_event_end
rather than dom_complete
. This should return similar results.
Network latency
SELECT
response_end-fetch_start AS milliseconds,
SUM(CASE WHEN dvce_ismobile THEN 1 ELSE 0 END) AS mobile,
SUM(CASE WHEN dvce_ismobile THEN 0 ELSE 1 END) AS not_mobile,
COUNT(*) AS total
FROM derived.performance_timing
GROUP BY 1
HAVING response_end-fetch_start >= 0
ORDER BY 1
LIMIT 1000
Pages that take the longest to load (on average)
SELECT
page_urlpath,
AVG(load_event_end-response_end) AS avg_load_time_ms,
COUNT(*)
FROM derived.performance_timing
GROUP BY 1
ORDER BY 2 DESC