I am reading the store data and sending the events with different event IDs and different event fingerprints, but the ETL timestamp could be the same. Would these be treated as duplicate events?
@ScalaEnthu, there is nothing wrong with having the same ETL timestamp - it simply indicates the events were processed in the pipeline at the same time (same batch). Different event IDs and payloads (event fingerprints) mean they are different events.
When talking about duplicates, it is important to distinguish between what we call natural and synthetic duplicates. They have different causes.
Natural duplicates are most frequently a byproduct of the tracker re-sending events when it has failed to receive confirmation that they have reached the collector. This is done to minimise the risk of data loss. The result could be events with the same event_id and the same payload (event_fingerprint) but with a different collector_tstamp.
A similar result could take place in the real-time pipeline itself due to its at-least-once processing semantics. Again, the "at-least-once" processing is deployed to eliminate data loss.
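To make that concrete, here is a minimal Scala sketch of removing natural duplicates. The `Event` case class is hypothetical, though its field names mirror the atomic `event_id`, `event_fingerprint` and `collector_tstamp` fields: events sharing both ID and fingerprint are collapsed down to their earliest occurrence.

```scala
import java.time.Instant

// Hypothetical, simplified view of an enriched event; the field names
// mirror Snowplow's atomic fields, but this class is purely illustrative.
final case class Event(
  eventId: String,          // event_id
  eventFingerprint: String, // event_fingerprint (hash of the payload)
  collectorTstamp: Instant  // collector_tstamp
)

// Natural duplicates share both event_id and event_fingerprint; only the
// collector timestamp differs. Grouping on that pair and keeping the
// earliest occurrence removes them.
def dedupeNatural(events: Seq[Event]): Seq[Event] =
  events
    .groupBy(e => (e.eventId, e.eventFingerprint))
    .values
    .map(_.minBy(_.collectorTstamp.toEpochMilli))
    .toSeq
```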
Synthetic duplicates are events that have the same event_id but different payloads. In other words, these are not duplicate events (the payload - the event_fingerprint - differs), but rather collisions in the UUID used for the event_id field.
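Synthetic duplicates cannot be collapsed the same way, since the payloads genuinely differ. One common remedy, similar in spirit to Snowplow's SQL deduplication approach, is to keep every distinct payload but mint a fresh UUID for all but one of the colliding rows, so that event_id becomes unique again. A sketch, reusing the hypothetical `Event` class from above:

```scala
import java.util.UUID

// Within each event_id group, distinctBy first collapses any exact
// (natural) duplicates; the remaining rows are genuinely different events
// colliding on one UUID, so all but the first get a new event_id.
def resolveSynthetic(events: Seq[Event]): Seq[Event] =
  events
    .groupBy(_.eventId)
    .values
    .flatMap { group =>
      val distinct = group.distinctBy(_.eventFingerprint)
      distinct.head +: distinct.tail.map(e =>
        e.copy(eventId = UUID.randomUUID().toString)
      )
    }
    .toSeq
```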
Thus, in summary, there are two reasons for duplicated events:

- The client-side environment causes events to be sent with the same ID
- Events are sometimes duplicated within the Snowplow pipeline itself
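Putting this together, and to answer the original question directly: whether two events are duplicates depends only on event_id and event_fingerprint; etl_tstamp plays no part. A small classification sketch (again reusing the hypothetical `Event` class):

```scala
// Possible relations between a pair of events, per the definitions above.
sealed trait Relation
case object NaturalDuplicate   extends Relation // same id, same payload
case object SyntheticDuplicate extends Relation // same id, different payload
case object Distinct           extends Relation // different ids

def classify(a: Event, b: Event): Relation =
  if (a.eventId != b.eventId) Distinct
  else if (a.eventFingerprint == b.eventFingerprint) NaturalDuplicate
  else SyntheticDuplicate
```

With different event IDs and different fingerprints, as in your case, `classify` would return `Distinct` regardless of the ETL timestamp, so nothing would be deduplicated.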