Clickstream data warehousing guide

I wrote a sizeable piece on clickstream data use cases, ownership, and the available options - paid, free, and open source. Any feedback would be very welcome.

A guide to data warehousing clickstream data


Wow, that’s a big post @evaldas - it’s past 23:00 here, so I’m going to have to read it after some shut-eye. For now, I skipped to the experiments side of things because that’s our specialty.

It frustrates me that SaaS testing tools (Optimizely, VWO etc) hide so much useful data from analysts for the sake of simplifying their products. But when you track experiments into your own DW, you unlock lots of benefits:

  • Assigning treatments to different units (do we only want to split traffic at the user level? How about splitting at the product or session level?) - you only get that freedom in tools like Snowplow; see the sketch after this list
  • Total flexibility with the metrics you can measure & reports you can generate (your job is no longer defined by the capabilities of your SaaS testing vendor)
  • Data in your own DW may be more trustworthy than the figures out of a SaaS product
  • Better support for filtering out bots and other confounding data in your experiment
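
To make that first point concrete, here’s a rough sketch of deterministic, hash-based assignment where the unit is just a parameter - swap in a user id, session id, or product id and you’ve changed what you split on. It’s plain Java with no vendor SDK; the class name and salt are mine, and this is roughly the trick PlanOut-style tools use under the hood, not any particular library’s API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch of deterministic assignment, independent of any vendor SDK.
// The unit id can be a user id, session id, or product id - whatever you want
// to split on. The same input always yields the same variant, so assignments
// can be recomputed and audited in the warehouse later.
public final class Assigner {

    /** Hash "experimentSalt:unitId" into [0, 1) and pick a variant by weight. */
    public static String assign(String experimentSalt, String unitId,
                                String[] variants, double[] weights) {
        double point = hashToUnitInterval(experimentSalt + ":" + unitId);
        double cumulative = 0.0;
        for (int i = 0; i < variants.length; i++) {
            cumulative += weights[i];
            if (point < cumulative) {
                return variants[i];
            }
        }
        return variants[variants.length - 1]; // guard against rounding error
    }

    private static double hashToUnitInterval(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            long value = 0L;
            for (int i = 0; i < 7; i++) { // first 56 bits as a non-negative long
                value = (value << 8) | (digest[i] & 0xFF);
            }
            return (double) value / (double) (1L << 56);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is always present
        }
    }

    public static void main(String[] args) {
        // Split at the product level instead of the user level:
        System.out.println(assign("checkout_badge_test", "product-8841",
                new String[]{"control", "treatment"}, new double[]{0.5, 0.5}));
    }
}
```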

Companies invest tons of money into test development and tools - why not analyse these rich experiments with the most complete dataset available?

@robkingston thanks for your great feedback. I share your thoughts on A/B experiments. I tried VWO in the past; if I remember correctly, it has a fairly comprehensive UI and feature set, but, as you said, the data was locked in their platform. As a workaround, I initially added an extra context that mapped to VWO’s JS experiment variable data. Eventually we stopped using it and stuck to tracking experiments ourselves, since that gives all the flexibility you mentioned - especially joining to other datasets like sales, which as far as I know is hard to do in VWO unless you send them as a separate metric.
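
In case it helps anyone reading along, here’s a sketch of what “tracking experiments ourselves” can look like: a self-describing context attached to every tracked event, so each event carries its assignments into the warehouse. The iglu schema URI and field names are made up for illustration - you’d define and host your own:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an experiment context attached to every tracked event. Snowplow
// custom contexts are self-describing JSON: a schema URI plus a data payload.
// The iglu URI and field names below are made up for illustration - you would
// define and host your own schema.
public final class ExperimentContext {

    public static Map<String, Object> build(String experiment, String variant,
                                            String unitType, String unitId) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("experiment", experiment);
        data.put("variant", variant);
        data.put("unit_type", unitType); // "user", "session", "product", ...
        data.put("unit_id", unitId);

        Map<String, Object> context = new LinkedHashMap<>();
        context.put("schema", "iglu:com.acme/experiment/jsonschema/1-0-0");
        context.put("data", data);
        return context;
    }

    public static void main(String[] args) {
        System.out.println(build("checkout_badge_test", "treatment",
                "product", "product-8841"));
    }
}
```

Once these land in the warehouse as their own table, joining assignments to sales (or anything else) is a plain SQL join rather than a vendor feature.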

Also, with SaaS tools it’s harder to test something that’s not in the UI, as it requires calling their API on the backend to know which variation should be served. We have avoided that by combining this lib - https://github.com/Glassdoor/planout4j - with the Snowplow tracker. This minimizes the performance penalty on each call.
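
To make the performance point concrete: assignment is a pure function of the experiment config and the unit id, so you can load the definitions once at startup and evaluate them in-process on every request, with no network hop. The names below are hypothetical stand-ins (planout4j’s actual classes differ), and the sketch reuses the `Assigner` from earlier in the thread:

```java
import java.util.Map;

// Sketch: load experiment definitions once at startup, then assign in-process
// on every request. "Experiment" and "InProcessAssignment" are hypothetical
// stand-ins (planout4j's actual classes differ); the point is that nothing on
// the request path makes a network call. Reuses Assigner from the earlier sketch.
public final class InProcessAssignment {

    record Experiment(String salt, String[] variants, double[] weights) {}

    private final Map<String, Experiment> definitions;

    public InProcessAssignment(Map<String, Experiment> definitions) {
        this.definitions = definitions; // loaded once, e.g. from config at startup
    }

    /** Pure in-memory lookup plus a hash - cheap enough to run per request. */
    public String variantFor(String experimentName, String unitId) {
        Experiment exp = definitions.get(experimentName);
        return Assigner.assign(exp.salt(), unitId, exp.variants(), exp.weights());
    }
}
```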

When you’re splitting traffic by something other than the user, do you do that in the split logic, or do you manage to infer it somehow when analysing in the DW? I guess you would normally need to define the experiment with a different split rule.

Really great write-up @evaldas, thanks for sharing!


Splitting traffic at anything other than the user level really needs to be done at the point of assignment.

True, you can always assign at the user level and analyse your data at the session level, but then you’ve added session counts as a confounding factor - heavy users contribute more sessions, so your session-level observations are no longer independent of the user-level assignment.

Thanks for the tip about Planout - looks awesome! Is this what you use for app testing or server-side testing?

Right, I guess it depends on the test. User-level splitting is so predominant in experimentation that I’d actually never even thought about alternatives.

No problem - yes, I have been using it for running tests server-side. One useful thing is that it supports its own scripting language, which can be used to update tests at runtime without code changes. I have made some updates to make it work with Snowplow, but haven’t had the time to open-source them as a module project; to be frank, it’s fairly easy to extend anyway.