Add/Remove From Cart Data Issue

Hi, I have the JavaScript tracker set up so that when a visitor adds or removes something from their cart, an event is sent from the tracker to my collector. In my network tab I can see the query parameters like so: http://chrisl.in/s/007b35.png
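
For reference, the call I’m making is more or less the stock trackAddToCart / trackRemoveFromCart call from the JavaScript tracker, something like this (simplified, with a placeholder product object):

    // Simplified illustration -- "product" is a placeholder object here.
    snowplow('trackAddToCart',
      product.id,        // sku (currently a number, e.g. 12345)
      product.name,      // name
      product.category,  // category
      product.price,     // unit price
      1,                 // quantity
      'USD'              // currency
    );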

I have a sneaking suspicion it has to do with the bad enrichment data files, where I see something like this:

"errors":[{"level":"error","message":"error: instance type (integer) does not match any allowed primitive type (allowed: ["string"])
    level: "error"
    schema: {"loadingURI":"#","pointer":"/properties/sku"}
    instance: {"pointer":"/sku"}
    domain: "validation"
    keyword: "type"
    found: "integer"
    expected: ["string"]
"}]

So I have two questions, then:

  1. For this data, is it possible to change my sku field so it accepts integers as opposed to strings? Otherwise, if I reprocess my data it will all still be integers, so it still wouldn’t be processed? Or can I change it so it accepts both integers and strings?
  2. What’s the best way to rerun all my bad enrichments? I’m assuming it will reprocess the bad data with the new configurations and therefore populate the add to cart and remove from cart tables? For my EmrEtlRunner, I have something like this running:

/home/ec2-user/snowplow-emr-etl-runner run -c /home/ec2-user/config/config.yml -r /home/ec2-user/config/iglu_resolver.json -t /home/ec2-user/config/targets -n /home/ec2-user/config/enrichments/ > /tmp/cron.log

Hi @clin407,

I have a sneaking suspicion it has to do with the bad enrichment data files

So just to clarify, and to make it easier to find help with this kind of thing: what you’re looking at here is bad rows. The Enrich component of Snowplow does two things: it validates data against its event definition schema, and it enriches the data.

If data doesn’t validate (i.e. doesn’t appear as expected), it goes to bad rows for debugging. The idea here is that you get data collection right - in other words, you ensure high-quality data by making sure it’s collected correctly at source. That eliminates the need to spend a long time working on data quality and investigating problems later on - the data is usable as soon as it lands in the database.

  1. For this data, is it possible to change my sku so it accepts integers as opposed to strings?

Using the standard add-to-cart tracking, no, you can’t change it. You could set it up as a custom event, but I think we’d be missing the actual issue if we went down that road. SKUs aren’t integers; they’re essentially names of items, which are strings. They can contain letters, and it doesn’t make sense to add, subtract, or multiply SKUs. So the simple solution is to just send the data as strings.
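
If your product IDs happen to be numeric, the cheapest fix is to cast them to a string at the point of tracking. A minimal sketch, assuming the standard trackAddToCart call and a hypothetical product object:

    // Cast the numeric ID to a string before handing it to the tracker,
    // so the sku field validates against the add_to_cart schema.
    snowplow('trackAddToCart',
      String(product.id),  // sku as a string, e.g. "12345"
      product.name,
      product.category,
      product.price,
      1,
      'USD'
    );

The same applies to trackRemoveFromCart.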

I recommend testing a tracking setup thoroughly before taking it live. Bad rows are generally for debugging issues with live tracking, but if you spin up a Snowplow Mini instance and test your tracking setup before implementing it, you’ll avoid this kind of thing by and large.

In terms of handling current tracking issues in production: here’s a guide to the process of debugging bad rows in Athena for batch (older versions), and here’s one for real-time (newer versions).

It’s definitely worth following that guide to uncover any other validation issues you might have.

What’s the best way to rerun all my bad enrichments? I’m assuming it will reprocess the bad data with the new configurations and therefore populate the add to cart and remove from cart tables? For my EmrEtlRunner, I have something like this running:

If you want to reprocess the data, you need to convert the type of these values to strings.

It might be a bit of a time investment, but there’s a guide here to handling it.

Snowplow event recovery could be used to handle this.

Or can I change it so it can accept both integers and strings?

It’s not possible for one field to have two types. Think of a column in a database - you can’t have strings in an integer column or vice versa.

I hope this is helpful.

Best,

This is helpful. So it looks like I have been sending in my SKU as an integer and not a string, which is my fault. I guess my only recourse now to recover that data would be to go through the Hadoop Event Recovery link you sent?

There is basically no way for me to rerun all the raw data and “update” all the records? I believe all the data should still be there, since I’ve never deleted anything.

You should have a record of the raw data if you’ve set up the pipeline the standard way, but iterating through raw data is generally a difficult task, especially for this type of thing: raw data is in a difficult-to-use format, and it includes everything that has hit your collector.

The recovery process outlined in that guide is essentially to transform the corrupt value, create a new raw row for it, then reprocess that.
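
To give a rough idea of the shape of that transformation: if I remember the guide correctly, the job has you supply a JavaScript process function that receives each bad raw line plus its error messages and returns either a corrected line or null to skip it. A sketch of what that could look like for the sku case, assuming the add_to_cart payload arrives base64-encoded in the ue_px parameter; the decode/encode helpers are placeholders for the utilities the guide describes, so treat this as a starting point rather than a drop-in script:

    // Rough sketch only. decodeBase64Url / encodeBase64Url are placeholders
    // for the utilities described in the Hadoop Event Recovery guide.
    function process(event, errors) {
      // Only attempt recovery on rows that failed the sku type check;
      // leave everything else in bad rows for separate investigation.
      var isSkuError = errors.some(function (e) {
        return e.indexOf('/properties/sku') !== -1;
      });
      if (!isSkuError) {
        return null;
      }
      // Pull the encoded self-describing event out of the querystring,
      // wrap the integer sku in quotes, and write it back.
      return event.replace(/ue_px=([^&\s]+)/, function (match, encoded) {
        var fixed = decodeBase64Url(encoded).replace(/"sku":\s*(\d+)/, '"sku":"$1"');
        return 'ue_px=' + encodeBase64Url(fixed);
      });
    }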

Perfect, that’s basically what I want to do. Let me read the tutorial and see if it helps. Thanks!

Hi @clin407,

The info I gave you is actually out of date: we recently released Snowplow Event Recovery, and your example is one of the standard use cases it covers. I don’t know why I didn’t think of this last week.

Apologies if I led you down a less productive path, but the good news is that this means of recovery should be more straightforward.

Best,

One caveat to this: use Snowplow Event Recovery if you’re using the real-time pipeline (Scala Stream Collector); otherwise, if you’re on the batch pipeline, you’ll still need to use the existing Hadoop Event Recovery job.

Oh, I use the Clojure collector. :frowning:

One thing I noticed when reading about Hadoop Event Recovery is that there doesn’t seem to be an easy way to test my script, to see whether the code is valid or is correcting the data accurately. Is there a way to do that?

It’s not particularly easy to test, but if you want to test the JavaScript recovery function in isolation, you can try one of these methods.
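
For example, one low-tech option is to drop the function into a small Node script, stub out any helpers, feed it a real bad line and its error messages copied from your bad rows bucket, and eyeball the output before running the actual job. A sketch along those lines (the sample payload and the helper stubs here are made up for illustration):

    // Quick local harness -- run with: node test_recovery.js
    // Stub the placeholder helpers so the function runs outside the Hadoop job.
    function decodeBase64Url(s) { return Buffer.from(s, 'base64').toString('utf8'); }
    function encodeBase64Url(s) { return Buffer.from(s, 'utf8').toString('base64'); }

    // Trimmed-down version of the recovery sketch from earlier in the thread;
    // paste your real process(event, errors) function here instead.
    function process(event, errors) {
      return event.replace(/ue_px=([^&\s]+)/, function (match, encoded) {
        var fixed = decodeBase64Url(encoded).replace(/"sku":\s*(\d+)/, '"sku":"$1"');
        return 'ue_px=' + encodeBase64Url(fixed);
      });
    }

    // Substitute a real bad raw line and its error messages from your bucket.
    var sampleLine = 'e=ue&ue_px=' + encodeBase64Url('{"data":{"data":{"sku":12345}}}');
    var sampleErrors = ['instance type (integer) does not match ... /properties/sku'];

    console.log(process(sampleLine, sampleErrors));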
