Detecting abandoned shopping carts with Snowplow

What would be the recommended approach for tracking shopping cart abandonment with the JavaScript tracker?


Hi Magaton, can you provide a bit more information? Is it for your own shop? What platform are you using for the shop?


Hello, it is for our clients’ e-shops. We are now evaluating options for the clickstream part of our reco engine product.
The e-shops are very different (Magento, Hybris, WooCommerce, OFBiz), so we are looking for a very general and simple way to do that.

I’ve come across this series of blog posts, which opened up a completely new perspective for me.

Having page pings inserted as event nodes for the checkout page seems like a straightforward way to answer the question in the subject.

Before I do a POC with Snowplow, Kafka and Neo4j, can somebody tell me why this approach is not mainstream, and whether it is something that makes sense to the Snowplow dev community?

Hey @magaton - using a graph database for abandoned shopping cart detection is a really interesting idea.

However, given the fairly simple rules around defining and detecting abandoned carts, you can probably get away with something simpler. My Event Streams in Action book has an example abandoned shopping cart detector written for Kafka using Samza. It doesn’t process Snowplow events but you could certainly adapt it to work with Snowplow. The code is here:

Thanks @alex, I think I understand. I hope that by “interesting” you don’t mean a crazy idea :slight_smile:
I usually think in Cypher query terms, and once the data is in a graph everything is easy since you can ask anything, but getting there and keeping the graph to a reasonable size is a different topic.
Do you see any particular reason to use Samza instead of Kafka Streams?
We have Kafka, Neo4j and ES in our infrastructure, which is already hard to manage, so we are not really keen to add a new beast :slight_smile:

Hi @magaton - no, don’t worry, by “interesting” I just meant interesting :eyeglasses:, not crazy :scream:

This is the classic analytics-on-write versus analytics-on-read debate:

Analytics-on-read:

  • Do all the pathing in a relatively general way in Neo4j
  • Ask any question you like
  • Experiment with different abandoned cart definitions
  • Some challenges around latency and scalability

Analytics-on-write:

  • Decide on an abandoned cart definition
  • Write a Samza or Kafka Streams job to run the algorithm in-stream (an AWS user would use Lambda + DynamoDB)
  • Much less flexible than analytics-on-read
  • But low latency and super-scalable
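
To make the in-stream option a bit more concrete, here is a minimal Python sketch of the kind of state such a job would keep. It is not the code from the book and it does not read Snowplow events as-is: the event shape (`user_id`, `type`, `sku`, `quantity`, `timestamp`), the event type names and the 30-minute inactivity rule are all assumptions made for illustration. In a real Samza or Kafka Streams job, the dictionary would live in the framework's key-value store and expiry would be driven by a timer rather than by the next incoming event.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed definition: a cart is abandoned after 30 minutes without activity.
ABANDON_AFTER_SECONDS = 30 * 60


@dataclass
class CartState:
    items: Dict[str, int] = field(default_factory=dict)  # sku -> quantity
    last_activity: float = 0.0                            # event time, in seconds


class AbandonedCartDetector:
    """Keeps one open cart per user and flags carts that go quiet."""

    def __init__(self, timeout: float = ABANDON_AFTER_SECONDS):
        self.timeout = timeout
        self.carts: Dict[str, CartState] = {}

    def process(self, event: dict) -> List[dict]:
        """Consume one event and return the carts that became abandoned."""
        now = event["timestamp"]
        # Time only advances when events arrive (event-time driven).
        abandoned = self._expire(now)

        user, kind = event["user_id"], event["type"]
        if kind == "add_to_cart":
            cart = self.carts.setdefault(user, CartState())
            sku = event["sku"]
            cart.items[sku] = cart.items.get(sku, 0) + event.get("quantity", 1)
            cart.last_activity = now
        elif kind == "purchase":
            # A purchase closes the cart, so it can never be flagged.
            self.carts.pop(user, None)
        elif user in self.carts:
            # Any other event (page view, page ping) counts as activity.
            self.carts[user].last_activity = now
        return abandoned

    def _expire(self, now: float) -> List[dict]:
        due = [u for u, c in self.carts.items()
               if now - c.last_activity >= self.timeout]
        return [{"user_id": u, "items": self.carts.pop(u).items} for u in due]
```

Feeding it events in timestamp order, `process` returns an empty list most of the time and a list of `{"user_id", "items"}` records whenever one or more carts have passed the timeout. Changing the abandoned cart definition means changing this code and redeploying, which is the flexibility trade-off mentioned above.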

@magaton - would you be willing to discuss this by email? I think we have similar needs.
alexmc6 on github or alex.mclintock at gmail dot com


Have you got it working? Do you mind sharing?