Visualising Snowplow data

We have a lot of data (page views, custom events) and we use Periscope as our main BI tool.
But PMs and designers are now used to having Amplitude, Mixpanel or other product analytics tools that give them self-serve capabilities and a fast exploration UI. I am curious what other teams do and which tools they use to visualise their data. The problems we would like to solve are:

Funnel visualisation, retention cohorts, user paths, and a basic time-series chart per event, with the ability to segment users based on user properties…

I know there was a topic about Indicative, but I wonder whether other data teams have experience to share, and what the Snowplow team, who have obviously heard of these problems, could recommend or share!



I guess your organization’s problem is that different teams have their preferred BI/analytics/data exploration tools. While Periscope may serve other departments’ purposes, it may not satisfy your PMs who are more tech-savvy and would want to dig deeper into the data instead of looking at just the aggregated report.

We are also a product company, and the most important things for us are:

  1. The PMs have access to the correct data from a common source of truth (instead of using their own dedicated tool and exploring unverified data).
  2. The PMs have a convenient means to explore that data by themselves (instead of having to rely on the data analytics team all the time).

In many cases, if you enforce (1), your PMs won’t be happy because they cannot freely explore data by themselves. If you let them do (2), you face data accuracy issues.

To solve those problems we use Holistics on top of BigQuery and Postgres for all internal reports and data self-service. (Imagine it as dbt + a nice GUI + data exploration layer for business users)

Holistics has a really useful data modeling layer and a data exploration layer built right on top of it. We can create some models as building blocks and combine them in different ways to serve different needs.

For example, we aggregate traffic to (session - page) level, (session) level, (page) level, (visitor) level… and mix them to create datasets for different purposes like:

  • Sales funnel analysis dataset = session model + session-page model + demo booking model
  • User path = session model + session-page model + page model + visitor model
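To make the layering concrete, here is a minimal Python sketch of the idea, not actual Holistics code. It assumes page views have already been reduced to rows with hypothetical fields `session_id`, `visitor_id`, `page` and `ts`, and that a made-up demo booking model is just a set of session IDs. It rolls events up to (session - page) and (session) level and then combines them into a simple funnel count.

```python
from collections import defaultdict

# Hypothetical page-view rows; field names are assumptions, not Snowplow's schema.
page_views = [
    {"session_id": "s1", "visitor_id": "v1", "page": "/home", "ts": 1},
    {"session_id": "s1", "visitor_id": "v1", "page": "/pricing", "ts": 2},
    {"session_id": "s1", "visitor_id": "v1", "page": "/pricing", "ts": 3},
    {"session_id": "s2", "visitor_id": "v2", "page": "/home", "ts": 4},
    {"session_id": "s3", "visitor_id": "v1", "page": "/pricing", "ts": 5},
]
# Hypothetical demo booking model: sessions in which a demo was booked.
demo_bookings = {"s1"}


def session_page_model(rows):
    """Aggregate page views to one entry per (session, page)."""
    counts = defaultdict(int)
    for r in rows:
        counts[(r["session_id"], r["page"])] += 1
    return dict(counts)


def session_model(rows):
    """Aggregate page views to one entry per session."""
    sessions = {}
    for r in rows:
        s = sessions.setdefault(
            r["session_id"],
            {"visitor_id": r["visitor_id"], "page_views": 0, "first_ts": r["ts"]},
        )
        s["page_views"] += 1
        s["first_ts"] = min(s["first_ts"], r["ts"])
    return sessions


def sales_funnel(rows, bookings, step_page="/pricing"):
    """Count sessions per funnel step: any visit, saw step_page, booked a demo."""
    sessions = session_model(rows)
    session_pages = session_page_model(rows)
    saw_step = {sid for (sid, page) in session_pages if page == step_page}
    booked = saw_step & bookings
    return {
        "sessions": len(sessions),
        "viewed_step_page": len(saw_step),
        "booked_demo": len(booked),
    }


funnel = sales_funnel(page_views, demo_bookings)
```

In a warehouse setup the same roll-ups would more likely be SQL models; the point is only that each aggregation level is a reusable building block.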

Same thing with user behavior on our products. We build models for major features, and either throw them together into a master dataset, or break them into feature groups so the PMs have an easier time getting their relevant data.

It’s important that we model our data carefully and expose only relevant dimensions and measures. The amount of data you get from Snowplow is enormous in terms of volume and dimensions, so we have to balance flexibility in exploration against performance.


I wish someone built a self-hosted Amplitude-like UI for snowplow or any self-hosted event data.

PMs would love the ability to easily compare cohort retention, build funnels, account/user level segmentations. All without the need to build SQL queries or choose measures/dimensions in BI tools. I’m also looking for a tool like that and couldn’t find any good ones so far.
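For reference, the cohort retention such a tool would compute can be sketched in a few lines of plain Python. The event shape (`user_id`, week number) and the weekly granularity are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical events: (user_id, week number in which the user was active).
events = [
    ("u1", 0), ("u1", 1), ("u1", 2),
    ("u2", 0), ("u2", 2),
    ("u3", 1), ("u3", 2),
]


def cohort_retention(rows):
    """Return {cohort_week: {weeks_since_first_activity: retained_fraction}}."""
    # A user's cohort is the first week in which they were active.
    first_week = {}
    for user, week in rows:
        first_week[user] = min(week, first_week.get(user, week))

    # Distinct users active at each offset from their cohort week.
    active = defaultdict(lambda: defaultdict(set))
    for user, week in rows:
        cohort = first_week[user]
        active[cohort][week - cohort].add(user)

    retention = {}
    for cohort, offsets in active.items():
        size = len(offsets[0])  # every user is active at offset 0 by definition
        retention[cohort] = {
            offset: len(users) / size for offset, users in sorted(offsets.items())
        }
    return retention


retention = cohort_retention(events)
```

The hard part a product analytics UI solves is not this arithmetic but letting PMs pick the events, the cohort definition and the segments without writing it themselves.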

There seem to be two popular options right now:

  1. Go @hoanghapham’s way of preparing tables for consumption in BI tools.
  2. Anonymize event data completely and send it to cloud tools. But you can lose all the segmentation benefits, if you don’t send all the user properties.
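A rough Python sketch of a middle ground for option 2, under stated assumptions: it pseudonymizes rather than fully anonymizes, replacing the user ID with a keyed hash and forwarding only an allowlist of properties so that some segmentation survives. The event shape, the key and the allowlist are all hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key; keep it out of source control
ALLOWED_PROPERTIES = {"plan", "country"}  # hypothetical allowlist of segmentation properties


def pseudonymize(event, key=SECRET_KEY, allowed=ALLOWED_PROPERTIES):
    """Replace user_id with a keyed hash and drop properties not on the allowlist."""
    token = hmac.new(key, event["user_id"].encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "user_token": token,
        "event_name": event["event_name"],
        "properties": {k: v for k, v in event["properties"].items() if k in allowed},
    }


event = {
    "user_id": "user-123",
    "event_name": "page_view",
    "properties": {"plan": "pro", "country": "FR", "email": "someone@example.com"},
}
safe_event = pseudonymize(event)
```

The same user always maps to the same token, so funnels and retention still work in the cloud tool, but whether this is acceptable depends on your own privacy requirements.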

I would be quite interested to see your page-level datasets, for example. I see Holistics having the same problem as Periscope: it is great for defining models for key reporting, but I think it limits the freedom of PMs or puts a lot of burden on the data team. But I might be wrong!

Do you think we could have a call so that you can show me a few examples of what you did?

We can certainly do that @mpeychet. To make things clear I am a data analyst at Holistics.

We are trying to improve collaboration between the data team and other departments, so I’m curious about how other product and data analytics teams cooperate. It would be great if you can share some of your experiences - we may come across some problems that we have never thought of before!

If you are interested, let’s connect on LinkedIn and we can arrange a call.


Ha Pham