Modelling page view events as a graph – Snowplow

antman · August 13, 2018, 7:54pm

In the previous post in this series we started exploring options for modelling event data as a graph in general. We looked at three ways of modelling atomic event data:

This is a companion discussion topic for the original entry at https://snowplowanalytics.com/blog/2018/08/13/modelling-page-view-events-as-a-graph/

ernest · August 14, 2018, 7:37am

At Dripit we started out as a behavioral analytics company. Our first hypothesis was that there must be patterns which can be picked up in visitor data. And of course, we thought that a graph representation could be an interesting beginning. The result was a really messy and sloooow representation. When we did some data preparation and actually picked out sequential milestones, we were able to build a model which could predict, in real time, the likelihood of conversion and the context of a visit. At the time there were just a couple of graph databases and we ended up using HBase/Redis to store behavioral data.

In conclusion: in our case it looked like a graph would be the perfect solution, but it was way easier to use a simpler data model and NoSQL databases to solve our problem. It was a good engineering exercise, nevertheless!

dilyan · August 14, 2018, 3:38pm

That is a great observation @ernest! And it’s something we’ve been thinking about as well.

When you say you end up with a ‘messy’ graph, do you mean aesthetically or does it have performance implications as well? In the experiment described in this post, I could very quickly see that extreme denormalisation – while ensuring you cater for the large majority of use cases – results in a graph whose visual representation is unintelligible. That is why there are not a lot of pictures in this post; and in the one that I included, I had to dramatically cut down the number of represented nodes.

But I wonder: is that messiness superficial or does it have implications for the end analysis?

We’ve definitely considered more narrow use cases, with prepped data; and future posts will expand on those.

ernest · August 15, 2018, 8:27am

Regarding the messy graph: in our heads we had the idea that there would be some sort of direct graph of how people move towards a “conversion” event, something like a Sankey diagram. But in reality there were very few overlapping tracks. There are tons of unique ways people get to that one event. This has implications for both performance and the value of the analysis. You can see what it looks like in GA path analysis. At first you are pumped (oh boy, oh boy, path analysis). Then you see it and realise that there are people who exit a page for pages where other people have come from.

We had to come up with a meta journey and aggregate the nodes. For example, rather than considering each unique page, we categorized them into Product pages (with parameters like price and time on page), Category pages and Info pages. Now the graph actually started to look like something, but at this point we also saw that a flat representation was possible, which is more suitable for predictive models.
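To make the aggregation step concrete, here is a minimal sketch of the idea, not Dripit's actual implementation. The URL prefixes, the category labels beyond those named above, the helper names and the sample visits are all made up for illustration. It maps each page to a coarse category, collapses consecutive repeats into a meta journey, and also produces a flat per-visit count of page types.

```python
from collections import Counter
from itertools import groupby

# Hypothetical URL-prefix rules; a real site would need its own mapping.
CATEGORY_RULES = [
    ("/product/", "Product"),
    ("/category/", "Category"),
    ("/checkout", "Conversion"),
]
LABELS = ("Product", "Category", "Info", "Conversion")


def categorise(path):
    """Map a page path to a coarse page type, defaulting to 'Info'."""
    for prefix, label in CATEGORY_RULES:
        if path.startswith(prefix):
            return label
    return "Info"


def meta_journey(pages):
    """Replace each page with its category and collapse consecutive repeats."""
    return tuple(label for label, _ in groupby(categorise(p) for p in pages))


def flat_features(pages):
    """Flat per-visit representation: number of pages seen per category."""
    counts = Counter(categorise(p) for p in pages)
    return {label: counts.get(label, 0) for label in LABELS}


# Made-up visits, each an ordered list of page paths.
visits = [
    ["/category/shoes", "/product/101", "/product/102", "/checkout"],
    ["/category/hats", "/product/205", "/checkout"],
    ["/about", "/category/shoes", "/product/101"],
]

raw_paths = {tuple(v) for v in visits}
meta_paths = {meta_journey(v) for v in visits}

print(len(raw_paths), "unique raw paths")
print(len(meta_paths), "unique meta journeys")
print(flat_features(visits[0]))
```

In this toy data every raw path is unique, while the first two visits share the same meta journey (Category, Product, Conversion). The flat dictionary is the kind of row that could be fed to a predictive model without any graph store.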
