Updating Enrich Kinesis and s3-loader in high load

Dmitry_Medkov · September 20, 2022, 10:07am

Hello

How can I stop a container or service without losing the events that are in it at the time of the stop?
I am using Enrich Kinesis version 3.3.1 and s3-loader version 2.2.2.
I would like to know how to roll out new versions of these services or containers.
If I stop the container or process with kill -15, will the service finish handling what it holds in memory before exiting, or will whatever was in memory be lost?
Is there a way to stop containers under high, continuous load (~15000 RPS) without worrying that the data inside the service at the time of the stop will be lost?

BenB · September 20, 2022, 11:18am

Hi @Dmitry_Medkov ,

Welcome to the Snowplow community!

enrich-kinesis and s3-loader use at-least-once delivery semantics, so they don’t lose events if they crash or are stopped. This works because they checkpoint only after events have been processed.

When enrich-kinesis is stopped, it finishes processing the events that are in memory and checkpoints before exiting, so there won’t be duplicates after the app restarts.
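
To illustrate the "finish the in-memory events, then checkpoint, then exit" behaviour described above, here is a minimal Python sketch. It is not the enrich-kinesis implementation: `run`, `handle_sigterm`, `install_handler` and the list-based `records`/`sink` are hypothetical stand-ins for a Kinesis shard and the output stream.

```python
import signal
import threading

# Set by the signal handler; read by the processing loop.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Only raise a flag here. The loop decides when it is safe to exit.
    stop_requested.set()


def install_handler():
    # Call once from the main thread of the service (assumption: SIGTERM is
    # what `kill -15` or a container stop delivers to the process).
    signal.signal(signal.SIGTERM, handle_sigterm)


def run(records, start, sink, process, batch_size=2):
    """Process records from position `start`; return the last checkpoint."""
    checkpoint = start
    while checkpoint < len(records) and not stop_requested.is_set():
        batch = records[checkpoint:checkpoint + batch_size]
        # The stop flag is deliberately not checked inside a batch, so a
        # batch that has been pulled into memory is always finished.
        sink.extend(process(r) for r in batch)
        # Advance the checkpoint only after the batch has been emitted.
        checkpoint += len(batch)
    return checkpoint
```

If SIGTERM arrives in the middle of a batch, the loop still emits and checkpoints that batch and only then exits. A later call to `run` with the returned checkpoint as `start` continues from the same position without re-emitting anything.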

s3-loader is based on the Amazon Kinesis Connector Library. Looking at its source code, I’m not sure that such logic exists, so there might be some duplicates if s3-loader is stopped in the middle of processing. But no data will be lost.
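
As a rough illustration of why a stop in the middle of processing can produce duplicates but not data loss, here is a small Python sketch. It does not reproduce the Kinesis Connector Library logic: `load_batch` and `deduplicate` are hypothetical helpers, and the only Snowplow-specific assumption is that each event carries an `event_id` that downstream consumers can deduplicate on.

```python
def load_batch(records, checkpoint, sink, batch_size, stop_before_checkpoint=False):
    """Write one batch to the sink, then advance the checkpoint.

    Returns the new checkpoint, or the old one if the process was stopped
    after writing but before the checkpoint was saved.
    """
    batch = records[checkpoint:checkpoint + batch_size]
    sink.extend(batch)  # the data is already written at this point
    if stop_before_checkpoint:
        return checkpoint  # position never saved, so the batch is replayed
    return checkpoint + len(batch)


def deduplicate(events):
    """Keep the first occurrence of each event_id (in-memory sketch only)."""
    seen = set()
    unique = []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            unique.append(event)
    return unique


records = [{"event_id": f"e{i}"} for i in range(4)]
sink = []
cp = load_batch(records, 0, sink, 2)   # e0, e1 written and checkpointed
cp = load_batch(records, cp, sink, 2, stop_before_checkpoint=True)  # e2, e3 written, stop hits
cp = load_batch(records, cp, sink, 2)  # after restart: e2, e3 written again
```

After the restart the sink holds six rows for four events: nothing is missing, but `e2` and `e3` appear twice. Running `deduplicate(sink)` brings it back to four. In a real pipeline the deduplication would more likely happen in the warehouse or a batch job than in an in-memory set.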

Dmitry_Medkov · September 20, 2022, 2:19pm

@BenB
Thanks for the answer.

enrich-kinesis is safe to stop: if you stop the Docker container or service via SIGTERM, it will finish enriching the events it has already received, pass them on to the stream, and, once it is started again, continue from the same place where it stopped.
s3-loader will definitely not lose data when the Docker container is stopped or the service receives a SIGTERM, but it may reprocess some data and create duplicates.
Did I understand you correctly?

BenB · September 20, 2022, 2:41pm

Yes, that’s correct @Dmitry_Medkov!
