Hi,
I’ve just started testing out the event recovery. Got it up and running, and it appears to be recovering the bad rows, which is great.
But what’s strange is that I’m also seeing it pick up good rows that were previously processed - any idea what could be causing this?
It seems we have a part-id.txt file in our ‘bad’ rows where a single line can include good rows as well as bad ones, with the errors it encountered at the end of the file. We don’t store each event as a separate item in S3.
Does this look like a bug in the recovery process, in that it can’t filter out the rows that already passed validation, or is the way we’re storing events the issue?
@irufus, if the events are sent with POST, the payload contains several events (how many depends on the tracker configuration). If any event in the payload fails validation, the whole payload becomes a bad row containing all the events, including those that do pass validation.
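As a rough illustration of that behaviour (a minimal sketch, not Snowplow’s actual code; the `validate` check and the event fields are made up for the example):

```python
# Toy sketch: a POST payload carries several events, and one invalid
# event is enough to turn the whole payload into a single bad row.

def validate(event):
    # Hypothetical check: every event must carry an event_id.
    return [] if "event_id" in event else ["missing event_id"]

def process_payload(payload):
    errors = [err for event in payload for err in validate(event)]
    if errors:
        # The entire payload becomes one bad row: its "line" holds all
        # the events, valid ones included, with the errors alongside.
        return [], {"line": payload, "errors": errors}
    return payload, None

good, bad = process_payload([
    {"event_id": "a1", "name": "page_view"},  # passes validation
    {"name": "page_ping"},                    # fails validation
])
print(bad)  # both events land in the bad row, next to the errors
```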
The side effect of this is indeed good events in the recovered payload, which means that when you reprocess the recovered data you may end up with duplicate events. Unfortunately there is no way to work around this. If duplicates are an issue, you may need to run some sort of deduplication on your data post-recovery.
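If it helps, here’s a minimal post-recovery deduplication sketch, assuming every event carries a stable unique identifier (the `event_id` field name here is an assumption; adjust it to your schema):

```python
def deduplicate(events):
    # Keep only the first occurrence of each event_id.
    seen = set()
    for event in events:
        key = event["event_id"]
        if key not in seen:
            seen.add(key)
            yield event

recovered = [
    {"event_id": "a1", "name": "page_view"},
    {"event_id": "a1", "name": "page_view"},  # reprocessed duplicate
    {"event_id": "b2", "name": "page_ping"},
]
print(list(deduplicate(recovered)))  # the duplicate "a1" is dropped
```

In practice you’d more likely run this kind of deduplication in whatever processes your data downstream (e.g. a SQL `GROUP BY` on the event id) rather than in a one-off script.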