Hello! At my company, we use the dbt Labs Snowplow sessionization models to combine the page_view and page_ping events triggered on our website into a notion of web sessions.
At the heart of these models lie the domain_userid and domain_sessionid fields.
For a small subset of records (~2-3% of all sessions), I see the same domain_userid generating concurrent/overlapping Web sessions (i.e. multiple values of domain_sessionid with overlapping start/stop times for the same domain_userid).
Is this to be expected? Are there known scenarios where the same user can generate two overlapping sessions?
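For anyone wanting to reproduce the overlap check, here is a minimal Python sketch of how overlapping sessions per domain_userid could be detected. It assumes the sessions have already been exported as (domain_userid, domain_sessionid, start, end) tuples; the function name and the sample rows are hypothetical and only for illustration.

```python
from collections import defaultdict
from datetime import datetime


def find_overlapping_sessions(sessions):
    """Return (domain_userid, session_a, session_b) for every pair of
    sessions of the same user whose time intervals overlap.

    `sessions` is an iterable of (domain_userid, domain_sessionid, start, end).
    """
    by_user = defaultdict(list)
    for user, session_id, start, end in sessions:
        by_user[user].append((start, end, session_id))

    overlaps = []
    for user, rows in by_user.items():
        rows.sort()  # order each user's sessions by start time
        for i, (start_a, end_a, sid_a) in enumerate(rows):
            for start_b, end_b, sid_b in rows[i + 1:]:
                if start_b >= end_a:
                    # Sorted by start, so no later session can overlap session a.
                    break
                overlaps.append((user, sid_a, sid_b))
    return overlaps


# Hypothetical sample data: s1 runs for over a day and overlaps s2.
sessions = [
    ("u1", "s1", datetime(2021, 1, 1, 9, 0), datetime(2021, 1, 2, 10, 0)),
    ("u1", "s2", datetime(2021, 1, 2, 9, 30), datetime(2021, 1, 2, 9, 45)),
    ("u1", "s3", datetime(2021, 1, 3, 8, 0), datetime(2021, 1, 3, 8, 5)),
    ("u2", "s4", datetime(2021, 1, 1, 9, 0), datetime(2021, 1, 1, 9, 20)),
]
overlapping = find_overlapping_sessions(sessions)
print(overlapping)
```

Sessions that merely touch (one ends exactly when the next starts) are not counted as overlapping here; that boundary choice is an assumption.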
This isn’t to be expected but I have seen it happen in the past. One driver for this can be ‘stray page pings’.
Such pings are generated a long time after the initial page view. The theory is that a user opens a page (day 0), then leaves the tab dormant for an extended period of time. When the tab is viewed again another page ping is fired (day 1). This page ping has the same page_view_id as the original page view but a different session_id, due to the large period of inactivity.
However, because the model aggregates pings on page_view_id and then joins this back to the original page view, the original page view, and therefore the session, appears to have lasted much longer than it actually did. If the user accessed your site via a second tab on day 1, the day 0 session could then appear to overlap with this new session.
As I said, I have seen this in the past when modelling Snowplow data. I am not that familiar with the dbt-labs snowplow package, but from initial inspection this scenario seems possible given the implementation. One way to check whether this is the root cause would be to see if the original overlapping session contains a page view with a particularly large delta between max_tstamp and min_tstamp but a small time_engaged_in_s (due to the large period of inactivity), using the data in this model.
Hope that makes sense, and let me know how you get on. It could well be down to something else, but I think this is a good first check.
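As a rough illustration of that check, here is a minimal Python sketch that flags page views whose span is long but whose engaged time is small. It assumes each page view row has been loaded as a dict with page_view_id, min_tstamp, max_tstamp and time_engaged_in_s; the function name and both thresholds are hypothetical choices, not values from the package.

```python
from datetime import datetime


def flag_stray_ping_candidates(page_views, min_span_s=6 * 3600, max_engaged_ratio=0.05):
    """Return page_view_ids whose wall-clock span is long while the
    engaged time is only a small fraction of that span.

    Both thresholds are arbitrary assumptions and should be tuned.
    """
    flagged = []
    for pv in page_views:
        span_s = (pv["max_tstamp"] - pv["min_tstamp"]).total_seconds()
        if span_s >= min_span_s and pv["time_engaged_in_s"] <= max_engaged_ratio * span_s:
            flagged.append(pv["page_view_id"])
    return flagged


# Hypothetical rows: "a" spans 25 hours with 40 s of engagement, "b" is a normal view.
page_views = [
    {
        "page_view_id": "a",
        "min_tstamp": datetime(2021, 1, 1, 9, 0),
        "max_tstamp": datetime(2021, 1, 2, 10, 0),
        "time_engaged_in_s": 40,
    },
    {
        "page_view_id": "b",
        "min_tstamp": datetime(2021, 1, 2, 9, 30),
        "max_tstamp": datetime(2021, 1, 2, 9, 40),
        "time_engaged_in_s": 300,
    },
]
candidates = flag_stray_ping_candidates(page_views)
print(candidates)
```

If the long-running sessions consistently contain a flagged page view, that would support the stray page ping theory.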
I took a closer look at the overlapping sessions for a few different domain_userid values and they tend to involve one long-running session (sometimes spanning a few days to a week) and multiple shorter sessions that do not overlap with one another. This seems to me to be consistent with the behavior of leaving a tab open in the background while browsing more actively on the same website in foreground tabs.
Thank you! I’ll have to figure out how/if we want to address this in data modeling.
Hmm, well actually, I think I am confused once again.
Based on this documentation, I would think that the JavaScript tracker should generate a new domain_sessionid when the user re-focuses on a tab that has been in the background for over 30 minutes (and they have not generated any other activity on my website in other tabs):
"Whenever an event is fired, the session cookie is set to expire in 30 minutes. (This value can be altered using setSessionCookieTimeout)
If no session cookie is already present when an event fires, the tracker treats this as an indication that long enough has passed since the user last visited that this session should be treated as a new session rather than a continuation of the previous session."
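To make the quoted rule concrete, here is a small Python simulation of it. This is a simplified model of the behaviour described in the documentation excerpt, not the tracker's actual code; the function name and the integer session counter are hypothetical.

```python
from datetime import datetime, timedelta


def assign_sessions(event_times, timeout=timedelta(minutes=30)):
    """Assign a session index to each event using the quoted rule:
    every event pushes the cookie expiry to event time + timeout, and an
    event that fires with no live cookie starts a new session.
    """
    session_index = 0
    cookie_expires = None  # no session cookie before the first event
    assigned = []
    for t in sorted(event_times):
        if cookie_expires is None or t >= cookie_expires:
            session_index += 1  # cookie absent or expired: new session
        cookie_expires = t + timeout  # each event resets the expiry
        assigned.append((t, session_index))
    return assigned


# A page view, a ping ten minutes later, then a ping from the same tab a day later.
events = [
    datetime(2021, 1, 1, 9, 0),
    datetime(2021, 1, 1, 9, 10),
    datetime(2021, 1, 2, 10, 0),
]
session_ids = [session for _, session in assign_sessions(events)]
print(session_ids)
```

Under this model the late ping does land in a new session, which matches the expectation in the question.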
@Will In the situation you described, shouldn’t the most recent page_ping for a long-dormant tab have a new domain_sessionid?
So yes, you are correct: the most recent page ping will have a different domain_sessionid from the original page view. An example of what this may look like:
| page_view_id | event_name | session_id | tstamp     |
|--------------|------------|------------|------------|
| 1            | page_view  | x          | 2021-01-01 |
| 1            | page_ping  | y          | 2021-01-02 |
If you imagine running this data through the snowplow_web_events_time model, the CTE here will aggregate the page view and ping under the same page_view_id, irrespective of the domain_sessionid.
This table then gets joined back on page_view_id to the original page view, with session_id = x, in the page views model to get the start and end tstamp of the page view. This means the original page view, and therefore the session, appears very long despite the fact that the subsequent page ping has a different session_id. Hope that makes sense.
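The aggregate-then-join effect can be reproduced with the two example events above. The Python sketch below only imitates the logic as described in this thread; it is not the package's SQL, and the function names are hypothetical.

```python
from datetime import date

# The two example events: same page_view_id, different session_id.
events = [
    {"page_view_id": 1, "event_name": "page_view", "session_id": "x", "tstamp": date(2021, 1, 1)},
    {"page_view_id": 1, "event_name": "page_ping", "session_id": "y", "tstamp": date(2021, 1, 2)},
]


def page_view_bounds(events):
    """Aggregate min/max tstamp per page_view_id, ignoring session_id."""
    bounds = {}
    for e in events:
        lo, hi = bounds.get(e["page_view_id"], (e["tstamp"], e["tstamp"]))
        bounds[e["page_view_id"]] = (min(lo, e["tstamp"]), max(hi, e["tstamp"]))
    return bounds


def build_page_views(events):
    """Join the aggregated bounds back onto the original page_view events."""
    bounds = page_view_bounds(events)
    rows = []
    for e in events:
        if e["event_name"] != "page_view":
            continue
        lo, hi = bounds[e["page_view_id"]]
        rows.append({
            "page_view_id": e["page_view_id"],
            "session_id": e["session_id"],  # session of the original page view
            "min_tstamp": lo,
            "max_tstamp": hi,  # stretched by the ping from session y
        })
    return rows


page_views = build_page_views(events)
print(page_views)
```

The resulting row keeps session_id x but ends on 2021-01-02, so session x appears to span the day on which session y actually took place.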
There is a more detailed post about this behaviour here. It might give you some ideas on how to handle this situation in the model.