Kibana and Redshift mismatch

Hi @Hasan_Shaukat, a missing event_fingerprint can indeed interfere with deduplication.

You can read up on why and how duplicates are created in detail here. In short, though: when talking about duplicates, it is important to distinguish between what we call natural and synthetic duplicates, because each kind is dealt with in a different way.

Natural duplicates are most frequently a byproduct of the tracker re-sending events when it has failed to receive confirmation that they have reached the collector. This is done to minimise the risk of data loss. The result can be events with the same event_id and the same payload but different collector_tstamp values. The solution is to identify these duplicates and keep only one of them.

Synthetic duplicates are events that have the same event_id but different payloads. In other words, these are not duplicate events (the payload is different), but rather collisions in the UUID for the event_id field. The solution for these duplicates is usually to assign them a new, unique event_id.
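To make the two cases concrete, here is a minimal Python sketch of the idea. It is not the actual pipeline code, just an illustration under some assumptions: events are plain dicts, every event has an event_fingerprint, collector_tstamp values sort correctly as strings, and the original_event_id field is a hypothetical name I am using to remember the old ID.

```python
import uuid
from collections import defaultdict


def deduplicate(events):
    """Illustrative only: events are dicts with event_id,
    event_fingerprint and collector_tstamp keys."""
    # Natural duplicates: same event_id AND same fingerprint.
    # Keep only the copy with the earliest collector_tstamp.
    earliest = {}
    for event in events:
        key = (event["event_id"], event["event_fingerprint"])
        kept = earliest.get(key)
        if kept is None or event["collector_tstamp"] < kept["collector_tstamp"]:
            earliest[key] = event

    # Synthetic duplicates: survivors that still share an event_id
    # must have different fingerprints, i.e. different payloads.
    by_id = defaultdict(list)
    for event in earliest.values():
        by_id[event["event_id"]].append(event)

    result = []
    for event_id, group in by_id.items():
        for event in group:
            event = dict(event)  # do not mutate the caller's data
            if len(group) > 1:
                # Hypothetical field name, used here to keep the old ID.
                event["original_event_id"] = event_id
                event["event_id"] = str(uuid.uuid4())
            result.append(event)
    return result
```

Note that both steps key on the fingerprint. If it is missing, the first step cannot tell a re-sent event from a genuinely different one.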

You can have both types of duplicates in the same batch of events (i.e. the same EMR run) or across batches. Also, the mechanisms that create duplicates can work in tandem, so that a user can end up with both kinds of duplicate for the same event. For example, if a bot sends 50,000 events with the same event_id (synthetic duplicates), a portion of them might get sent more than once, creating natural duplicates.

You’ll notice I’ve been talking about ‘the same payload’ or ‘different payloads’. The way to compare payloads is via the event_fingerprint, which is why it matters so much. Without it, deduplication is not reliable, and can result in what you are describing.
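If you want to check which situation your own data is in, something along these lines could help. Again, this is only a sketch: the function name and the labels are made up for illustration, and it assumes you have exported the relevant events as dicts with event_id and (possibly missing) event_fingerprint.

```python
from collections import Counter, defaultdict


def classify_event_ids(events):
    """Label each event_id as unique, natural, synthetic, both,
    or unknown (when a fingerprint is missing)."""
    fingerprints_by_id = defaultdict(list)
    for event in events:
        fingerprints_by_id[event["event_id"]].append(event.get("event_fingerprint"))

    labels = {}
    for event_id, fingerprints in fingerprints_by_id.items():
        if len(fingerprints) == 1:
            labels[event_id] = "unique"
        elif any(fp is None for fp in fingerprints):
            # Payloads cannot be compared without a fingerprint.
            labels[event_id] = "unknown"
        else:
            counts = Counter(fingerprints)
            has_natural = any(n > 1 for n in counts.values())
            has_synthetic = len(counts) > 1
            if has_natural and has_synthetic:
                labels[event_id] = "both"
            elif has_synthetic:
                labels[event_id] = "synthetic"
            else:
                labels[event_id] = "natural"
    return labels
```

Any event_id that comes back as "unknown" is one where the deduplication could not have made a reliable decision.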

The events that seem to be duplicates but have different event_id values are likely synthetic duplicates that were assigned new event IDs. The ones that seem to have different timestamps are likely natural duplicates where only one copy was kept. And some might have been both synthetic and natural.
