Event Recovery Reprocesses Good Events

irufus · February 20, 2019, 10:47pm

Hi,
I’ve just started testing out the event recovery. Got it up and running, and it appears to be recovering the bad rows, which is great.
But what’s strange is that I’m also seeing it pick up good rows that were previously processed - any idea what could be causing this?

It seems like the part-id.txt files in our ‘bad’ rows contain lines that include good rows as well as bad ones, with the errors encountered listed at the end of the file. We don’t store each event as a separate item in S3.
Does this seem like an error in the recovery process not being able to filter out the errors, or is the way we’re storing events the issue?

ihor · February 20, 2019, 11:47pm

@irufus, if the events are sent with POST, the payload contains several events (the number depends on the tracker configuration). If any event in the payload fails validation, the result is a bad event that contains all the events in the payload, including those that pass validation.
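
To make the behaviour described above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not Snowplow's actual code: `process_payload`, `is_valid`, and the `ok` field are hypothetical names made up for illustration.

```python
from typing import Callable


def process_payload(
    payload: list[dict], is_valid: Callable[[dict], bool]
) -> tuple[list[dict], list[list[dict]]]:
    """Toy model of a POST payload passing through validation.

    Valid events go to the good output. If at least one event fails,
    the WHOLE payload (valid events included) is also kept as one bad row.
    """
    good = [event for event in payload if is_valid(event)]
    bad_rows = [list(payload)] if len(good) < len(payload) else []
    return good, bad_rows


# Hypothetical payload: one valid event, one invalid event.
payload = [
    {"event_id": "a", "ok": True},
    {"event_id": "b", "ok": False},
]
good, bad_rows = process_payload(payload, lambda e: e["ok"])

# Recovery replays everything in the bad row, so event "a" shows up again.
replayed = [event for row in bad_rows for event in row]
```

In this model, event `a` is already in `good` and is also in `replayed`, which matches the good rows being picked up again during recovery.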

The side effect of this is indeed good events in the recovered payload. This means that when you reprocess the recovered data you might end up with duplicated events. Unfortunately, there’s no workaround for this. If duplicates are an issue, you may also need to run some sort of deduplication on your data after recovery.
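
As a rough idea of what such a deduplication step could look like, here is a minimal Python sketch. It assumes the events are available as dicts and that `event_id` uniquely identifies an event; the function name `deduplicate` is hypothetical, and in practice this would more likely be done in the warehouse or in a batch job.

```python
from typing import Iterable, Iterator


def deduplicate(events: Iterable[dict], key: str = "event_id") -> Iterator[dict]:
    """Yield each event once, keeping the first occurrence of every key value.

    Assumption: the value under `key` is a unique identifier per event.
    Events that lack the key are passed through unchanged.
    """
    seen: set = set()
    for event in events:
        value = event.get(key)
        if value is None:
            yield event
            continue
        if value in seen:
            continue  # already emitted, e.g. a good event replayed by recovery
        seen.add(value)
        yield event


# Hypothetical data: "a" was processed originally and replayed by recovery.
events = [
    {"event_id": "a", "source": "original"},
    {"event_id": "a", "source": "recovered"},
    {"event_id": "b", "source": "recovered"},
]
unique = list(deduplicate(events))
```

Keeping the first occurrence means the originally processed copy wins over the recovered one, provided the original data is read first.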
