Page_id crosses sessions and over inflates time

Colm · July 22, 2021, 10:33am

I don’t think I have much to add to what Mike and Paul have mentioned re: the tracker, so I’ll focus on a practical approach to your modeling.

I can see two ways to work around the problem you’ve outlined:

Adjust your query to group by session_id when you perform your DATEDIFF for engaged time, then later make sure the correct row is used when you join it to your final table (I would probably perform this aggregation separately to others due to increased risk of duplicates). Note that I would name this field absolute_time - since the engaged time metric only registers time spent active on the page (We have both absolute time and engaged time in the latest models). This will make sure that ‘stray’ pings will be disregarded.
Adjust the ping aggregation along the following lines:

Page Ping callback updates an object which counts distinct pings triggered
Global context (or manually tracked context) is attached to all relevant events with the latest count of pings. This would mean that each event has a context which tells you the most up to date ping count when that event happened. Let’s call that ping_count
A similar query is used in modeling, except rather than difference between timestamps, the metric you now use is heartbeat*MAX(ping_count) (assuming that heartbeat == minimumVisitLength)

This way, you can count stray pings on the original page view if you like, or you can combine both.

At least in my mind this works around the issue you’re having - wdyt?

Topic		Replies	Views
Page pings missing For data modelers & consumers	4	1145	September 6, 2019
Session ID not resetting Tracking SDKs	3	2191	January 24, 2018
Concurrent Web sessions for the same user For data modelers & consumers	6	1718	December 23, 2021
Relationship between session, "pageview ID" and page_ping event For data modelers & consumers	1	1284	December 25, 2019
Unique Pageview in a Session & Engaged Pageviews? For data modelers & consumers	2	1629	February 21, 2018