Page_id crosses sessions and over inflates time

I don’t think I have much to add to what Mike and Paul have mentioned re: the tracker, so I’ll focus on a practical approach to your modeling.

I can see two ways to work around the problem you’ve outlined:

  1. Adjust your query to group by session_id when you perform your DATEDIFF for engaged time, then later make sure the correct row is used when you join it to your final table (I would probably perform this aggregation separately to others due to increased risk of duplicates). Note that I would name this field absolute_time - since the engaged time metric only registers time spent active on the page (We have both absolute time and engaged time in the latest models). This will make sure that ‘stray’ pings will be disregarded.

  2. Adjust the ping aggregation along the following lines:

  • Page Ping callback updates an object which counts distinct pings triggered
  • Global context (or manually tracked context) is attached to all relevant events with the latest count of pings. This would mean that each event has a context which tells you the most up to date ping count when that event happened. Let’s call that ping_count
  • A similar query is used in modeling, except rather than difference between timestamps, the metric you now use is heartbeat*MAX(ping_count) (assuming that heartbeat == minimumVisitLength)

This way, you can count stray pings on the original page view if you like, or you can combine both.

At least in my mind this works around the issue you’re having - wdyt?