I wanted to ask if anyone has experience in implementing auto-scaling solutions for Kinesis shards using AWS real-time architecture. The implementation of scaling the shards and consumers is easy. The problem which I wanted to discuss is managing intermediary shards created after scaling the Kinesis stream.
Let’s look into an example:
A user has load balancer facing stream collectors that write records to a Kinesis data stream (8 shards) with data retention period of two days. Data is consumed from this stream by scala stream enrich application (2 nodes, 2 consumers in each node. Each consumer gets 2 shards).
Now, suppose the traffic goes up and we scale Kinesis stream to have 12 shards. Now the stream has 28 shards in total (8+8 closed and 12 active) and DynamoDB holds leases for these 28 shards until the data expires for 16 closed shards, after which, the intended 12 leases remain. Has anyone dealt with scala stream enrich consumer rebalancing between the given 28 shards (ideal situation would be that the 12 shards that are active would be spread between running consumers as evenly as possible (of course you can’t do that with 8 consumers in our example)).
The answer, that this is impossible to do would also help
Hi @Aurimas_Griciunas - so the good news is that under the hood we leverage the KCL which handles spreading the load across the nodes evenly (or should do so).
As you scale up you will create intermediary shards which are then processed by the consumers and then noted as ended (SHARD_END state) in the DynamoDB state table - the remaining 12 shards will be automatically balanced between the stream enrich nodes. If for whatever reason the workers do not share evenly an effective way to resolve this is to simply reboot the stream enrich process which forces them to re-allocate the shards between themselves.
This balancing of shards is also a good reason to maintain evenly dividable numbers of shards so as to ensure that they can be balanced evenly amongst your consumer group.
All the KCL applications we have published (Stream Enrich, S3 Loader and ES Loader) deal with resharding operations automatically!
The problem I faced with was that this automatic rebalance did not happen (several times), since (given the above hypothetic situation) only 6 distinct lease owner IDs (we have 8 consumers) were present in DynamoDB under active shards. This in itself is not a huge problem, but it would become one if autoscaling would be introduced where there might be numerous rescaling throughout a short period of time. I will try restarting the consumers and come back with results in a few days.