Snowplow vs Google Analytics

In my most recent post on the Snowplow blog, I discussed the importance of data ownership, specifically some of the limitations companies can face when using third-party analytics platforms. Naturally, Google Analytics entered the conversation.

Recently on Twitter, Al Wightman raised very valid points in a series of Tweets about the blog post. Al said:

Agree with virtually everything in your latest article. However ironically to use your phrase, the article is “taking certain liberties” with the way GA is described. Happy to admit GA is not perfect but every analytics tool, not just GA, has to take decisions on the dirty bucket that is Direct. Granular hit-level data approaches like Snowplow, which I am a fan of, certainly give the user much more control. But attribution and data schema decisions still need to be made at some point even in a granular hit-level tool/database. Equally GA certainly has sampling applied for certain custom reports. However all standard GA reports & some custom reports have no sampling whatever the data size or reporting period. Understand the desire to present a unique proposition, doesn’t mean we shouldn’t be more accurate in how we talk about GA/data tools

Our primary objective at Snowplow when we post blog articles like these is to help educate our audience, but coming in a close second is the desire to spark conversation within the community: conversations exactly like the one Al began when he reached out to us. Because Al’s response to my post was so thoughtful, I wanted to respond in kind.

I want to be clear that we’re big fans of Google Analytics; the guys at Google have created a great platform that’s democratized web analytics in a way that likely could not have been achieved by another tool. Seeing as they serve something like 94% of the web, it’s no surprise that they sample data, given the extremely high volume of websites on the Google Analytics platform. It was my understanding that all reports were subject to sampling, with the exception of premium users who specifically request non-sampled data. So, Al, thank you for bringing my misconception to my attention. I rely on conversations with experts in the analytics community, like yourself, to continue to learn. If you have any more input on which reports are sampled vs. non-sampled, I would love to discuss that further.

Al also made an excellent point about attribution and data schema decisions. My post didn’t intend to imply that at Snowplow we don’t support attribution models or data schemas; in fact, it’s quite the opposite. We believe that attribution and schema decisions are incredibly important, but where we differ from Google is that we believe they should be owned by the company that owns the data, not the analytics platform.

For example, if you take an online bank using Snowplow, its schemas will cover new accounts created, checking versus savings accounts, or credit cards opened (to name a few examples). Compare this with a dating site, say, whose schemas focus on number of matches, likelihood of falling in love, and interests and hobbies. We have our users make decisions on setting up their data schemas early in the process; we believe companies should have schemas that specifically reflect their data and their business.
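To make the contrast concrete, here is a minimal sketch of what two business-specific schemas could look like. It is purely illustrative: the field names, the dict-of-types format, and the `validate` helper are all assumptions made up for this example, not Snowplow’s actual schema format or API.

```python
# Hypothetical schemas: each maps a field name to the Python type we expect.
# Field names are invented for illustration only.
BANK_ACCOUNT_OPENED = {
    "customer_id": str,
    "account_type": str,      # e.g. "checking", "savings", "credit_card"
    "opening_balance": float,
}

DATING_MATCH_MADE = {
    "user_id": str,
    "matched_user_id": str,
    "shared_interests": list,
}


def validate(event: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the event fits the schema."""
    problems = []
    for field, expected_type in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    for field in event:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


bank_event = {
    "customer_id": "c-1",
    "account_type": "savings",
    "opening_balance": 250.0,
}

# The bank's event fits the bank's schema but not the dating site's.
bank_problems = validate(bank_event, BANK_ACCOUNT_OPENED)
dating_problems = validate(bank_event, DATING_MATCH_MADE)
```

The point of the sketch is only that the same event is valid under one business’s schema and meaningless under another’s, which is why the company that owns the data is best placed to define it.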

Regarding attribution models, we’ve found that our users who take the most intelligent approach to using our data, and in return get the most value from it, have multiple attribution models. They will look at their data from multiple perspectives and cross-reference the output from several models as a means to validate their overall insights. If only one model claims that Facebook is the most valuable source of new leads, there’s reason to be skeptical, but if all of their attribution models show that Facebook is not generating a significant amount of new traffic, then it’s more likely that the channel needs to be reevaluated.
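As a rough sketch of that cross-referencing, the snippet below runs three common attribution models (first-touch, last-touch, and linear) over the same conversion paths and reports which channel each model ranks highest. The sample paths, channel names, and function names are assumptions invented for this example; they are not taken from any real dataset or from Snowplow’s tooling.

```python
from collections import defaultdict


def first_touch(path):
    """All credit goes to the first channel in the path."""
    return {path[0]: 1.0}


def last_touch(path):
    """All credit goes to the last channel before conversion."""
    return {path[-1]: 1.0}


def linear(path):
    """Credit is split evenly across every touch in the path."""
    share = 1.0 / len(path)
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += share
    return dict(credit)


MODELS = {"first_touch": first_touch, "last_touch": last_touch, "linear": linear}


def attribute(paths, model):
    """Sum the credit each channel receives across all conversion paths."""
    totals = defaultdict(float)
    for path in paths:
        for channel, credit in model(path).items():
            totals[channel] += credit
    return dict(totals)


def top_channel_by_model(paths):
    """Return the highest-credited channel under each model."""
    result = {}
    for name, model in MODELS.items():
        totals = attribute(paths, model)
        result[name] = max(totals, key=totals.get)
    return result


# Hypothetical conversion paths: ordered channel touches ending in a conversion.
paths = [
    ["facebook", "email", "search"],
    ["search", "email"],
    ["facebook", "search"],
]

top = top_channel_by_model(paths)
models_agree = len(set(top.values())) == 1
```

With these made-up paths, only the first-touch model ranks Facebook highest while the other two favor search, so `models_agree` is false. That is the situation described above where a single model’s verdict deserves some skepticism.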

My intention here is to clarify Snowplow’s position on attribution modeling and data schema, as well as address where Al pointed out my misconceptions around Google Analytics. This has opened up a rich topic for conversation, so I hope that we can all learn something and continue to discuss this further.