I have a question about Snowplow’s ability to capture all possible interaction events automatically.
Context:
I am currently using the Snowplow JavaScript Tracker with a GTM container. My usual method for tracking new events is to set up a tag in GTM that fires on a pre-defined trigger, and set up the Snowplow tracker to track that particular event accordingly (it can be a page view, button click, link click, etc.).
Problem:
When tracking events this way, I realize later down the road that I might need to answer new business questions with event data I didn’t track previously. This requires me to define and set up the tracker again and wait for the data to be populated. Not only is this quite a manual process (especially when the platform is used by many people), it also incurs a time delay, since we need to wait for data collection.
My Suggested Solution:
I am thinking that maybe we can just capture all possible user interaction events in the first place, so that we can revisit that raw event data retroactively. This may bloat the volume of raw data, but we gain a time advantage when we want to look at event data we didn’t know we needed. The closest similar solution is Heap’s autocapture.
Question:
Can snowplow do this?
If so, what are the suggested ways to do this? Should we modify the JavaScript Tracker? Should we do some special setup in GTM?
You can use the automatic link-click-tracking feature to track a lot of the events that are happening on your page.
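As a sketch of what enabling this looks like with the standard JS tracker snippet (assuming the tracker is loaded as `window.snowplow`; the `'no-track'` filter class is an invented example, and option names differ between tracker versions, e.g. `blacklist` in v2 vs `denylist` in v3):

```javascript
// Guarded so this is a no-op outside the browser.
if (typeof window !== 'undefined' && typeof window.snowplow === 'function') {
  // Track clicks on all links except those carrying the (hypothetical)
  // 'no-track' class. The second flag enables pseudo-click tracking for
  // better cross-browser reliability; the third also captures link content.
  window.snowplow('enableLinkClickTracking', { blacklist: ['no-track'] }, true, true);

  // If links are injected later (e.g. by a single-page app), refresh the
  // listeners so the new links are picked up too:
  window.snowplow('refreshLinkClickTracking');
}
```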
It’s not as convenient as Heap, though; you will probably find it is sometimes difficult to work out exactly which element an event belongs to, and to fix that in data modeling.
The link-click-tracking there is able to capture all elements that have a URL link in them, I think. But what about an AJAX or React button click? Basically, an interactive element that does not direct the user to a new page but instead changes the current page. This could be a button click that displays a drop-down list, for example.
You could probably use a custom event in GTM that triggers on all clicks except clicks on the webpage background. You just have to think about how to populate your event information (CSS classes, HTML element, …). It will require custom JavaScript to find meaningful descriptions, for example using the parent element when there is no class on the clicked element, and so on. Also, a single clicked element is often made up of multiple DOM elements (li.menu > span.button > a.blue). Depending on where exactly the user clicks on a single button, the fields will be populated differently, which becomes a pain to dissect.
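A minimal sketch of that "find a meaningful description" logic: walk up from the clicked node until we reach an element that carries an id or a class, so that clicks anywhere inside li.menu > span.button > a.blue resolve to a stable label. The helper only reads plain properties, so it works on mock objects as well as real DOM nodes (in the browser you would pass `event.target`):

```javascript
// Walk up the element tree and return the first meaningful label:
// '#id' if the element has an id, 'tag.class1.class2' if it has classes,
// otherwise keep climbing to the parent. 'unknown' if nothing is labelled.
function describeClick(el) {
  var node = el;
  while (node) {
    var label = node.id
      ? '#' + node.id
      : node.className
        ? node.tagName.toLowerCase() + '.' + String(node.className).trim().split(/\s+/).join('.')
        : null;
    if (label) return label;
    node = node.parentElement; // unlabelled node: fall back to its parent
  }
  return 'unknown';
}
```

For example, a click on a bare `<a>` inside `<span class="button">` would be described as `span.button`, so both land on the same label regardless of where exactly the user clicked.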
Do you have any information in your application you could use, such as data attributes in your DOM, that would help with the naming of click elements?
So yes, technically this is something the Snowplow tracker is capable of, but it isn’t something we’ve built into the tracker natively. The ideas @volderette has shared are useful concepts for thinking about how to solve this, particularly adding common classes to everything you want tracked and then hooking up event listeners to all the elements with that class (I’m sure there are plenty of Stack Overflow articles on how to achieve this). I would avoid modifying the tracker, as that makes it harder to stay up to date with the latest features, but solving this via GTM triggers should definitely be possible.
However, that’s still not quite the entirely automatic idea you first mentioned. When it comes to automatically tracking everything, I don’t really feel this is a great idea. Being deliberate with your data collection choices is important to ensure you are working with high-quality, consistent data downstream. Collecting everything makes the engineer’s life easy but makes the analyst’s job much harder. More data doesn’t mean better data.
That said, the idea of adding CSS classes to the buttons and other elements you wish to track clicks on is deliberate, since you have to add the class. So I think that’s a reasonable way of solving this: you don’t end up editing your tracker configuration all the time, you just have to ensure you add the right CSS class to the button, and that will automatically fire the custom event.
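One way to sketch this opt-in approach is a single delegated click listener that only fires for elements carrying a marker class (the class name `sp-track` and the `data-name` attribute below are invented examples, not anything built into Snowplow). The class check lives in a plain helper so it can run against mock objects too:

```javascript
// Return the nearest ancestor (or the element itself) carrying the
// opt-in 'sp-track' class, or null if the click wasn't on a tracked element.
function findTracked(el) {
  var node = el;
  while (node) {
    var classes = String(node.className || '').split(/\s+/);
    if (classes.indexOf('sp-track') !== -1) return node;
    node = node.parentElement; // a click on a child element still counts
  }
  return null;
}

// Browser wiring (no-op outside the browser): one listener for the whole
// page, firing a structured event only for opted-in elements.
if (typeof document !== 'undefined') {
  document.addEventListener('click', function (event) {
    var tracked = findTracked(event.target);
    if (tracked && typeof window.snowplow === 'function') {
      // category / action / label
      window.snowplow('trackStructEvent', 'ui', 'click',
        tracked.getAttribute('data-name') || tracked.className);
    }
  });
}
```

The nice property is exactly what’s described above: adding `class="sp-track"` to a new button is the only step needed to start tracking it, with no tracker configuration change.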
Going beyond that, even if you’re trying to prepare for future use cases you don’t know about yet, it’s unlikely that collecting everything will help you answer those questions. What you’ll end up collecting will be generic, catch-all tracking (and it may even be partly that way with the CSS classes suggestion too). It won’t be well-designed, thought-out tracking with custom entities or properties for different buttons and sections of your website. Ultimately, spending the time to think about tracking design leads to a high-quality data asset that is trustworthy and well poised to answer your questions.
I am not sure it is possible to track everything, as even the meaning of “all possible user interaction events” is different for me now versus a year ago. For example, today I might want to know whether a person saw specific page element(s), while a year ago I was not even thinking about it (and it was not important to the business).
In my personal experience, the best course of action is to spend time writing a measurement plan that closely aligns with the goals of your business, and implement tracking based on it. I’ve found that if you spend time writing it out in advance, you can quite often answer a lot (though not all) of the questions you’ll get thrown at you down the road. The appetite for data, and the insights it can give, changes over time in organizations, so you cannot predict how it will turn out; you can only plan for what makes the most sense now based on known business goals.
On the technical side, I prefer implementing additional tracking via self-describing events by writing additional logic in a JS wrapper or GTM. I usually ask developers to add data attributes to key elements where possible, and then use event listeners to easily track interactions with those key items.
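A small sketch of that pattern: read the data attributes off a key element (e.g. `<button data-track-name="signup" data-track-zone="header">`) and turn them into a self-describing event payload. The schema URI and attribute names below are invented examples, not a real Iglu schema:

```javascript
// Build a self-describing event payload from an element's dataset.
// 'dataset' is the camelCased view of data-* attributes, so
// data-track-name becomes dataset.trackName.
function buildClickEvent(dataset) {
  return {
    schema: 'iglu:com.example/ui_click/jsonschema/1-0-0', // example schema URI
    data: {
      name: dataset.trackName || 'unknown',
      zone: dataset.trackZone || null
    }
  };
}

// In the browser you would wire this to an event listener, e.g.:
// el.addEventListener('click', function () {
//   window.snowplow('trackSelfDescribingEvent', buildClickEvent(el.dataset));
// });
```

Keeping the payload construction in a pure function like this also makes the tracking logic easy to unit-test, independently of GTM and the tracker.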
Personally, I would be opposed to adding more interaction tracking to Snowplow core, as it bloats the codebase and has the potential to impact both tracking performance and user experience. I think Snowplow is great, but to me it is already bloated with features we do not use, so we actually have to manually cut them out every time there is an update. Ideally, additional interaction tracking would be added as optional plugins (which, by the look of it, is something the Snowplow team is working on).
Off topic, but always worth mentioning when I get the opportunity: we’re working on improving this in v3, with a more “pluggable” architecture that allows you to pick the bits you want. I’m on a mission to decrease that sp.js file size!
Honestly, you’d be better served defining what you want tracked and having a developer feed this into your tag management solution’s data layer. If you go for a “catch all” scenario, you’ll spend more time trying to figure out what you’re looking at in the long run. You also run the risk of other people looking at the data and getting frustrated by the lack of understanding and/or answers, which may turn into a broad-stroke stance that the overall data is undependable or nonsensical.
The best and most underestimated exercise you can do before embarking is to sit down and write out a tagging specification with input from stakeholders. This can then be used by developers and as a reference guide, provided it has excellent version control. In doing this, everyone can understand exactly what is being delivered and what is not. At the end of the day you’ll never track everything you want on day 1; it’s an iterative process that develops with business needs and digital creative.
I have some strong (and probably not entirely correct) opinions about tracking everything versus tracking based on an explicitly defined plan. I think it’s important to maintain a balance and assess what you are trying to achieve.
In previous roles I’ve found that the track everything philosophy leads to finding a needle in a haystack, with the caveat that you’ve now made the haystack larger - the needle is still the same size. You can collect a lot of user interactions but that doesn’t necessarily mean the data is well structured, or that it follows a sound methodology. It’s the equivalent of trawler fishing - you’re going to catch a lot of fish but don’t be surprised when you figure out there’s a bunch of sharks and stingrays that have managed to get in.
Not everyone is guilty of this - but certainly in some organisations collecting everything increases the tendency to perform post-hoc analysis. Sometimes this is a valid approach - if done correctly, but often it’s not. A well designed experiment (or collection methodology) sets this during the experimental design phase. Having additional data collected after the fact to try and justify (or invalidate) the findings of an analysis is much easier to accidentally do if you have the data points just sitting there waiting.
Sending data that you may or may not use has multiple costs. It doesn’t just cost the user in terms of user experience (longer JS compilation, lots of DOM listeners) but also raises the possibility of increased (or larger) network requests degrading the performance for a user. Bandwidth and disk space aren’t free - so there’s a very real environment (carbon) impact on generating, transmitting and processing data that may or may not be used.
In an environment that is becoming increasingly privacy focused (and rightly so) the idea of collecting additional data for the sake of it without having clear reasons for processing (as in GDPR, CCPA etc) doesn’t sit well with me. Ensuring that the end user consents, and has a clear understanding of what and how data is being used about them is critical to establishing trust.
As @kfitzpatrick has mentioned, being forced to write a plan is a fantastic exercise in identifying edge cases and supports an iterative rather than a catch all approach.
This is the solution I am considering at the moment. Some of the front-end elements have an id and some only have classes, so I have to make a plan to rename the corresponding click elements to events retroactively. I am not the one who writes the front-end code, so I have to liaise with the front-end engineer on this.
To @PaulBoocock, @mbondarenko, @kfitzpatrick, @mike: I really appreciate the inputs and opinions on the catch-all strategy, given that I am quite new to digital analytics too. I am writing up a plan to collect the data, but I do hope to catch all potentially interesting interactions and let users rename the captured interactions accordingly. With this input, I will probably narrow down the scope of the data collection process and go for a semi-catch-all strategy, i.e. catch all interactions that might be interesting (mostly clicks on buttons). This way I avoid casting a huge net that might catch things I don’t want.
The idea of using GTM and playing around with triggers to accomplish this seems feasible enough. I would also want to avoid meddling with the snowplow.js code. I was just asking whether that was a road I would have to go down to achieve my original intent of capturing everything.