Hello! I’m tracking e-commerce transactions with a script implemented on the ‘thank you’ page displayed after a purchase is made. On the same page there’s also a Google Analytics script running.
Consistently, the weekly sum of transactions reported by Snowplow is noticeably lower (by up to ~1/3) than the number reported by Google Analytics. How could this difference be explained? Does the order of the scripts matter? Any help would be deeply appreciated.
From experience, it’s likely to be one of two things - bad rows or tracking configuration.
I’d start with bad rows - where data doesn’t conform to the schema, it lands in the bad bucket rather than going through to the database. Here’s a post that should help with checking this. If something is being sent as the wrong type - like an SKU sent as an integer instead of a string - then this will happen.
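Purely as an illustration, here is a minimal sketch of the classic ecommerce calls in the Snowplow JavaScript tracker (v2), assuming the tracker has already been initialised on the `snowplow` queue; all values below are placeholders. Note that the order ID and SKU are passed as strings:

```javascript
// Register the transaction: order ID, affiliation, total, tax, shipping, city, state, country
window.snowplow('addTrans', 'T-1001', 'Web store', 49.99, 9.60, 4.95, 'Warsaw', '', 'PL');

// Register each line item: order ID, SKU, name, category, unit price, quantity.
// An SKU sent as an integer instead of a string is exactly the kind of type
// mismatch that produces bad rows.
window.snowplow('addItem', 'T-1001', 'SKU-123', 'Blue T-Shirt', 'Apparel', 19.99, 2);

// Send the transaction and item events to the collector
window.snowplow('trackTrans');
```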
If it’s not bad rows you should take a closer look at how the tracking is configured. The fact that some but not all the data is present is interesting - when the script fires is relevant insofar as it’s possible for the user to navigate away from the page before it does so. If you’re using a tag manager then I’d look into the triggers, and if it’s code directly on page I’d pay close attention to where on the page the script is fired.
I do see some traffic on the “enrichment-bad” Kinesis stream. How can I access it? From what I understand, it contains events that don’t fit in ES. How can I see the output of this stream?
From what I understand, it contains events that don’t fit in ES.
I’m not quite sure what you mean by this - if you set up an Elasticsearch index for the bad data you should be able to load it there. Could you be more specific?
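If you just want to peek at what’s sitting in the bad stream before wiring up an Elasticsearch index, something along these lines should work - a minimal sketch using the AWS SDK for JavaScript (v2), where the stream name, the region and reading only the first shard are all assumptions to adjust:

```javascript
// Sketch: read a few records straight off the enrichment-bad Kinesis stream
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis({ region: 'eu-west-1' }); // adjust to your region

async function peekBadStream(streamName) {
  // Pick a shard to read from (just the first one here, for simplicity)
  const { StreamDescription } = await kinesis.describeStream({ StreamName: streamName }).promise();
  const shardId = StreamDescription.Shards[0].ShardId;

  // Start from the oldest record still retained on the shard
  const { ShardIterator } = await kinesis.getShardIterator({
    StreamName: streamName,
    ShardId: shardId,
    ShardIteratorType: 'TRIM_HORIZON',
  }).promise();

  // Each record's Data is a buffer with the bad row payload
  // (typically JSON containing the original line and the validation errors)
  const { Records } = await kinesis.getRecords({ ShardIterator, Limit: 10 }).promise();
  Records.forEach((r) => console.log(r.Data.toString('utf8')));
}

peekBadStream('enrichment-bad').catch(console.error);
```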
Snowplow Mini is a small scale sandboxed Snowplow pipeline. It doesn’t handle volume, but it’s quick and easy to set up and gives you real-time feedback on your tracking setup.
I recommend spinning an instance up and testing your tracking there before taking it live to production - that saves you a lot of hassle.
It’s still a good idea to set up the bad rows sink too.
Hi, I think the problem is deeper than events being marked as bad data. When looking at the collector, we see that only 0.05% of events end up in “enrichment bad”. On the other hand we have a problem with tracking transactions, where the difference between Google Analytics and Snowplow is 10-15%. We’ve also seen a difference in page views - for some channels even 50%.
The JavaScript tracking script is embedded with Google Tag Manager. The trigger is set to Page View.
The question is: how can we check what causes the difference in the data?
In that case it’s almost certainly something to do with how you’ve set up tracking.
Unfortunately this is the kind of thing where it’s quite hard to help without taking a look directly. However I would consider the possibility that pages are being closed before your script fires.
Compare your GA triggers to Snowplow’s (e.g. is Snowplow firing after DOM load, or at a different point than GA), and compare page view events for the previous and current page.
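Purely to illustrate the timing point - if the tag were placed directly on the page rather than in GTM, delaying the Snowplow call until the DOM is ready would look roughly like this (a sketch, assuming the tracker is already initialised on the `snowplow` queue):

```javascript
// Fire the Snowplow page view only once the DOM has been parsed,
// which is roughly when a GTM "DOM Ready" trigger would fire
document.addEventListener('DOMContentLoaded', function () {
  window.snowplow('trackPageView');
});
```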
I would also consider other means of tracking transactions. In general the next landing page isn’t as reliable as setting up actual transaction tracking - lots of users will just close the browser after making a purchase.
Thanks Colm for your prompt answer. I also thought that pages being closed before the tracking script fires might be the cause. However, one thing might contradict this: on the “thank you” page GA reports almost exactly the same number of transactions as page views for this particular URL, so users stay long enough for the GA scripts to fire. At the very beginning we triggered the transaction script on an “All pages” trigger, but over the weekend I changed it to “some DOM elements are ready”. I have the impression that it’s much better - waiting for more data to confirm this.
In addition, I’ve made an experiment where I placed two Snowplow tags with a script tracking page views. The only difference was the APP ID. Over one day one script reported 7k page views while the other reported almost 60k. How is that possible?
If all you’ve changed is the app_id and you’re seeing that level of discrepancy then something is wrong with the implementation. As @Colm has mentioned above - there are a number of possible issues this could be, but without knowing and looking at the implementation directly (how GTM is set up, how GA is set up, how Snowplow is set up) it’s almost impossible to identify the exact reason for the discrepancy.
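For instance - and this is only a sketch with placeholder namespaces, collector endpoint and app IDs, not a diagnosis - if both GTM tags initialise trackers on the same `snowplow` queue, how the page views are dispatched matters: in the v2 JavaScript tracker an un-namespaced call fires on every tracker on the queue, while namespaced calls target a specific one:

```javascript
// Two named trackers sharing the same queue (all names/endpoints are placeholders)
window.snowplow('newTracker', 'spA', 'collector.example.com', { appId: 'app-a' });
window.snowplow('newTracker', 'spB', 'collector.example.com', { appId: 'app-b' });

// Without a namespace, this fires one page view on *both* trackers...
window.snowplow('trackPageView');

// ...while these target each tracker individually
window.snowplow('trackPageView:spA');
window.snowplow('trackPageView:spB');
```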
@mike @Colm I can provide the exact configuration of Snowplow on AWS, as I’ve built it with Terraform, or maybe you have an hour for consultancy where I could walk you through the configuration?
Setting up and managing Snowplow on your own isn’t easy and is probably always going to involve a lot of debugging and confusing issues. That’s why we offer a paid service to handle all of that for you - in case you’re interested, here are the details: https://snowplowanalytics.com/products/snowplow-insights/ .
Otherwise, I would go through the docs and try to unearth what might look fishy, and pay very close attention to the tracking implementation. If you have specific questions along the way feel free to open a thread and ask.
@Colm I know that you offer it. For now I need consultancy.
I don’t think our problem is with the collector - we’ve run stress tests (1k req/s for 10 minutes) and over 99.98% of events ended up in ES. The problem is rather with the tracking script itself, or with how Google Tag Manager fires it. Also, on one view I’m comparing the number of hits from the JS tracker with the tracking pixel.
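For reference, a bare-bones pixel-style hit (independent of sp.js) is just a GET request to the collector’s /i endpoint. The sketch below uses a placeholder collector host and app ID; the query parameters follow the Snowplow tracker protocol (e=pv for a page view, aid for the app ID):

```javascript
// Sketch: fire a minimal page view straight at the collector, bypassing the JS tracker.
// Useful as a baseline when comparing sp.js hit counts against a plain pixel.
var img = new Image(1, 1);
img.src = 'https://collector.example.com/i'
  + '?e=pv'                                             // event type: page view
  + '&aid=app-a'                                        // app ID (placeholder)
  + '&p=web'                                            // platform
  + '&url=' + encodeURIComponent(window.location.href)  // page URL
  + '&page=' + encodeURIComponent(document.title);      // page title
```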