Visualization tools and approaches

What tools are people using to visualize and distribute their Snowplow data? What do you like about the tool you use and what do you wish it did better?

Internally at Snowplow we’ve recently moved from Redash to Preset (the commercial version of Superset).

I know @Matti is using Superset very extensively. Are you able to share some of the tricks you’ve learned for working with Snowplow data?

Hi Simon and everybody else!

Indeed, we are using Superset to visualize Snowplow data for roughly 500 users (most of whom are active).
In general, I believe it works great, but there are some areas where Superset does not quite match Web Analytics requirements:

  • You need some SQL knowledge to create charts that go beyond the basics (it is debatable whether this is a good or a bad thing)
  • Some visualizations simply don’t exist or do not work well, e.g.:
    – Event Flow Charts
    – Funnel Charts
    Event Flow charts in particular are probably hard to integrate, since I feel they would be tricky to build on the standard Snowplow data model (but I am no expert in this)
  • Segmentation, in the sense of saveable sub-queries, doesn’t exist. If you know how to do it (SQL needed again), you can achieve segmentation, but it is not as easy as in Web Analytics tools.
  • Since we have very large amounts of data (billions of rows), stability becomes an issue: Superset needs to keep the connections to the database open to receive the data, and if the queries take too long, this impacts overall performance. We are continuously tweaking the settings.
  • Emailing of Dashboard and Charts: Works well with some adjustments, but is not as nice as we would like to have it.
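
To give an idea of the kind of SQL the funnel and segmentation points refer to, here is a minimal sketch, not our production setup. It runs against a toy in-memory SQLite table rather than a real warehouse. The column names follow the Snowplow canonical event fields (`domain_sessionid`, `event_name`, `derived_tstamp`, `geo_country`), but the table name `events`, the three step names, and the country "segment" are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical funnel steps; real event names depend on your tracking setup.
STEPS = ("page_view", "add_to_cart", "purchase")

FUNNEL_SQL = """
WITH segment AS (
    -- The "segment" is just a named sub-query, here sessions from one country
    SELECT DISTINCT domain_sessionid
    FROM events
    WHERE geo_country = :country
),
step_times AS (
    -- First time each session in the segment reached each step
    SELECT e.domain_sessionid,
           MIN(CASE WHEN e.event_name = :s1 THEN e.derived_tstamp END) AS t1,
           MIN(CASE WHEN e.event_name = :s2 THEN e.derived_tstamp END) AS t2,
           MIN(CASE WHEN e.event_name = :s3 THEN e.derived_tstamp END) AS t3
    FROM events e
    JOIN segment s ON s.domain_sessionid = e.domain_sessionid
    GROUP BY e.domain_sessionid
)
-- Simplification: only the first occurrence of each step is compared
SELECT COUNT(t1)                                         AS step1,
       COUNT(CASE WHEN t2 > t1 THEN 1 END)               AS step2,
       COUNT(CASE WHEN t2 > t1 AND t3 > t2 THEN 1 END)   AS step3
FROM step_times
"""


def build_demo_db():
    """Create a toy events table with four sessions (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (domain_sessionid TEXT, geo_country TEXT, "
        "event_name TEXT, derived_tstamp TEXT)"
    )
    rows = [
        ("s1", "DE", "page_view", "2024-01-01T10:00:00"),
        ("s1", "DE", "add_to_cart", "2024-01-01T10:01:00"),
        ("s1", "DE", "purchase", "2024-01-01T10:02:00"),
        ("s2", "DE", "page_view", "2024-01-01T10:00:00"),
        ("s2", "DE", "add_to_cart", "2024-01-01T10:05:00"),
        ("s3", "DE", "page_view", "2024-01-01T11:00:00"),
        ("s4", "FR", "page_view", "2024-01-01T09:00:00"),
        ("s4", "FR", "add_to_cart", "2024-01-01T09:01:00"),
        ("s4", "FR", "purchase", "2024-01-01T09:02:00"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return conn


def funnel_counts(conn, country):
    """Return the number of sessions reaching each funnel step, in order."""
    params = {"country": country, "s1": STEPS[0], "s2": STEPS[1], "s3": STEPS[2]}
    return conn.execute(FUNNEL_SQL, params).fetchone()


if __name__ == "__main__":
    print(funnel_counts(build_demo_db(), "DE"))
```

In a BI tool the result would typically feed a bar chart with one bar per step. The point is mainly that both the segment and the step ordering have to be spelled out by hand in SQL.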

Other shortcomings exist, like (still) no PDF export option, some general bugs etc.

Now, I don’t want to sound too negative, so here are some positive aspects:

  • It is completely open-source with a very active community
  • We don’t need to worry about licensing costs
  • Superset creates really nice charts
  • Since calculation is mostly passed to the underlying database, there are no memory problems on the client or similar

I would be very interested to hear about experiences with other tools and their pros and cons. Also, I’d be glad to discuss Superset and/or answer questions. :slight_smile:

People have played with Snowplow data in neo4j in the past, which seems like the right approach for resolving the links between events and making them queryable in a sustainable way. This was explored at Snowplow here and more recently by a neo4j engineer.
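
To make "links between events" concrete, here is a rough plain-Python sketch of the underlying idea. It is not the neo4j approach itself: it only orders events within each session and counts the transitions from one event to the next, which are the edges a graph database or an event flow chart would work with. The input shape (session id, timestamp, event name) and the sample event names are assumptions for illustration.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def transition_counts(events):
    """Count event-to-event transitions within each session.

    `events` is an iterable of (session_id, timestamp, event_name) tuples.
    Returns a Counter keyed by (from_event, to_event).
    """
    # Sort by session first, then by time, so groupby sees each session once
    ordered = sorted(events, key=itemgetter(0, 1))
    edges = Counter()
    for _, session_events in groupby(ordered, key=itemgetter(0)):
        names = [name for _, _, name in session_events]
        # Pair each event with the one that follows it in the same session
        edges.update(zip(names, names[1:]))
    return edges


if __name__ == "__main__":
    sample = [
        ("a", 1, "page_view"),
        ("a", 2, "add_to_cart"),
        ("a", 3, "purchase"),
        ("b", 1, "page_view"),
        ("b", 2, "add_to_cart"),
    ]
    for (src, dst), n in transition_counts(sample).items():
        print(f"{src} -> {dst}: {n}")
```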

But the visualisation side is equally hard.

Would love to hear how people have tackled this.