S3 Loader cannot update checkpoint error

Hi,

We are receiving this error on the S3 loader module 2-3 times per day:

Is this something to worry about? Will the data still be processed and moved to S3, or could we have data loss there? How can we prevent this from happening, in our Kinesis stream or S3 loader configuration?

Hey @mgloel,

Is this something to worry about? Will the data still be processed and moved to S3, or could we have data loss there? How can we prevent this from happening, in our Kinesis stream or S3 loader configuration?

This is generally due to scaling actions in your consumer group, with shards being rebalanced across the available consumers. Is any scaling activity happening close to when this error occurs?

In terms of data loss, no, there should not be any: we only progress on a shard after a successful checkpoint. As such, in this case you might see duplicates entering your bucket, but there should not be any data loss.
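To make that ordering concrete, here is a minimal sketch of the "write first, checkpoint second" flow. This is not the actual S3 loader source; `flushToS3`, `checkpoint`, and the `leaseLost` flag are hypothetical stand-ins for the real KCL calls:

```scala
object CheckpointOrdering {

  final case class Record(sequenceNumber: Long, payload: String)

  /** Pretend S3 write: always succeeds in this sketch. */
  def flushToS3(batch: Seq[Record]): Unit =
    println(s"wrote ${batch.size} records up to seq ${batch.last.sequenceNumber}")

  /** Pretend checkpoint: fails when the shard lease was lost (e.g. during
    * a rebalance), mirroring the "cannot update checkpoint" error. */
  def checkpoint(seq: Long, leaseLost: Boolean): Either[String, Long] =
    if (leaseLost) Left(s"cannot update checkpoint at seq $seq: lease lost")
    else Right(seq)

  def main(args: Array[String]): Unit = {
    val batch = (1L to 5L).map(n => Record(n, s"event-$n"))

    // 1. Flush the batch to S3 first...
    flushToS3(batch)

    // 2. ...then checkpoint. If this fails, the data is ALREADY in S3; the
    //    next lease owner restarts from the previous checkpoint and re-writes
    //    the same records: duplicates, but no data loss.
    checkpoint(batch.last.sequenceNumber, leaseLost = true) match {
      case Right(seq) => println(s"checkpointed at $seq")
      case Left(err)  => println(s"$err -> batch re-processed (duplicates, not loss)")
    }
  }
}
```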


Great, thanks for the info.
Just one question regarding the potential duplicates: they will be removed by the shredder deduplication, right?

They should be, yes, but I'll see if @anton can add any extra details on deduplication.

Hi @mgloel,

They will be removed by the shredder deduplication, right?

Very likely, yes, but there is a small chance they won't be. In short: if the duplicates end up in the same batch, they are certainly removed. If not, e.g. if S3DistCp starts between the flushes of the two files containing the duplicates, they will only be deduplicated if you have cross-batch deduplication enabled.
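As a rough illustration of the in-batch case, here is a small sketch assuming natural duplicates are events sharing both `event_id` and `event_fingerprint`. The field names are illustrative, not the shredder's actual code:

```scala
object InBatchDedup {

  final case class Event(eventId: String, fingerprint: String, payload: String)

  /** Keep only one event per (event_id, event_fingerprint) pair. */
  def dedupe(batch: Seq[Event]): Seq[Event] =
    batch
      .groupBy(e => (e.eventId, e.fingerprint))
      .values
      .map(_.head) // first occurrence within each group survives
      .toSeq

  def main(args: Array[String]): Unit = {
    val batch = Seq(
      Event("id-1", "fp-a", "original"),
      Event("id-1", "fp-a", "duplicate from a re-processed shard"), // natural dup
      Event("id-2", "fp-b", "unrelated event")
    )
    dedupe(batch).foreach(println) // the second id-1/fp-a event is dropped
  }
}
```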

You can find more details about deduplication here:

https://docs.snowplowanalytics.com/docs/pipeline-components-and-applications/loaders-storage-targets/snowplow-rdb-loader/event-deduplication/

(You're interested in natural in-batch and cross-batch deduplication, not synthetic.)
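And a similarly hedged sketch of the cross-batch case: a manifest remembers the `(event_id, event_fingerprint)` pairs seen in earlier batches, so a duplicate landing in a later batch can still be caught. The real pipeline keeps this manifest in DynamoDB; the in-memory set below is just a stand-in. Without such a manifest, only in-batch duplicates are removed:

```scala
import scala.collection.mutable

object CrossBatchDedup {

  final case class Event(eventId: String, fingerprint: String)

  // Stand-in for the DynamoDB manifest table.
  private val manifest = mutable.Set.empty[(String, String)]

  /** Returns events not seen in any previous batch, recording the new ones. */
  def dedupeAcrossBatches(batch: Seq[Event]): Seq[Event] =
    batch.filter { e =>
      manifest.add((e.eventId, e.fingerprint)) // false if already present
    }

  def main(args: Array[String]): Unit = {
    val batch1 = Seq(Event("id-1", "fp-a"))
    val batch2 = Seq(Event("id-1", "fp-a"), Event("id-2", "fp-b")) // cross-batch dup

    println(dedupeAcrossBatches(batch1)) // List(Event(id-1,fp-a))
    println(dedupeAcrossBatches(batch2)) // List(Event(id-2,fp-b)): dup filtered
  }
}
```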
