Unable to scale stream shredder horizontally

Hi there! I have an issue scaling the stream shredder horizontally. Currently, we are running the Snowplow pipeline on AWS using Kinesis and stream shredder 2.2.0, consuming from a Kinesis data stream with 6 shards. The pipeline is working as expected, but lately we have noticed that the RecordProcessor.processRecords.Time CloudWatch metric has started increasing, indicating slow processing and delays in delivering events to Redshift in real time. I’m running the shredder on an EC2 instance as a Docker container, and it is set up as an enhanced fan-out consumer.

I decided to scale it horizontally. My first attempt was to run a second container with the same appName but a different S3 output path. For the first two minutes everything was fine: both shredders started consuming 3 shards each, which was what I expected. But then the Kinesis SDK started closing the connection to one of the consumers with the following error:

WARN software.amazon.awssdk.http.nio.netty.internal.http2.Http2GoAwayEventListener - GOAWAY received on a connection ([id: 0x882e9690, L:/XX.XX.X.XX:XXXX - R:kinesis.us-east-1.amazonaws.com/XX.XX.XX.XXX:XXX]) not associated with any multiplexed channel pool.

I assumed this was because I was running 2 shredders with the same appName.

My second attempt was to run a second shredder with a different appName. This resulted in a new enhanced fan-out consumer being registered on the Kinesis stream, a second DynamoDB table being created, and the new shredder being assigned all 6 shards. Consequently, we started getting duplicate events and the same error again:

WARN software.amazon.awssdk.http.nio.netty.internal.http2.Http2GoAwayEventListener - GOAWAY received on a connection ([id: 0x882e9690, L:/XX.XX.X.XX:XXXX - R:kinesis.us-east-1.amazonaws.com/XX.XX.XX.XXX:XXX]) not associated with any multiplexed channel pool.

Am I missing something? I would greatly appreciate your help.
Thanks in advance!

Hi @peter.petrov ,

We generally recommend scaling the streaming shredder vertically, because auto-scaling horizontally can introduce a lot of duplicates due to how KCL (the library used to consume from Kinesis) works. But in your case, with a fixed number of shards and a fixed number of instances, you could be fine.

All the instances must share the same appName. This is how KCL is able to share the consumption of the shards between instances.
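To make that concrete, here is a minimal, hedged KCL 2.x sketch of the mechanism the shredder relies on. This is not the shredder's own code; the application name, stream name, and the NoOpProcessor are placeholders. The application name becomes the name of the DynamoDB lease table (and, by default, of the registered enhanced fan-out consumer), so workers that share it take leases from the same table and split the shards, while workers with different names each read every shard.

```java
// Hedged sketch only: requires the amazon-kinesis-client 2.x dependency.
import java.util.UUID;

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.kinesis.common.ConfigsBuilder;
import software.amazon.kinesis.coordinator.Scheduler;
import software.amazon.kinesis.lifecycle.events.*;
import software.amazon.kinesis.processor.ShardRecordProcessor;

public class SameAppNameWorker {

  public static void main(String[] args) {
    Region region = Region.US_EAST_1;
    KinesisAsyncClient kinesis = KinesisAsyncClient.builder().region(region).build();
    DynamoDbAsyncClient dynamo = DynamoDbAsyncClient.builder().region(region).build();
    CloudWatchAsyncClient cloudWatch = CloudWatchAsyncClient.builder().region(region).build();

    // The application name below plays the role of the shredder's appName:
    // KCL creates (or reuses) a DynamoDB lease table with this exact name,
    // and every worker that shares it coordinates through that table.
    // A different name means a separate lease table, a separate enhanced
    // fan-out consumer, and every shard being read a second time (duplicates).
    String applicationName = "snowplow-shredder";  // placeholder
    String streamName = "enriched-good";            // placeholder
    String workerId = UUID.randomUUID().toString(); // must be unique per instance

    ConfigsBuilder configs = new ConfigsBuilder(
        streamName, applicationName, kinesis, dynamo, cloudWatch, workerId,
        NoOpProcessor::new);

    Scheduler scheduler = new Scheduler(
        configs.checkpointConfig(),
        configs.coordinatorConfig(),
        configs.leaseManagementConfig(),
        configs.lifecycleConfig(),
        configs.metricsConfig(),
        configs.processorConfig(),
        configs.retrievalConfig()); // defaults to enhanced fan-out in KCL 2.x

    new Thread(scheduler).start();
  }

  // Minimal record processor; the real shredder does its own processing.
  static class NoOpProcessor implements ShardRecordProcessor {
    @Override public void initialize(InitializationInput input) {}
    @Override public void processRecords(ProcessRecordsInput input) {
      input.records().forEach(r -> System.out.println("got a record"));
    }
    @Override public void leaseLost(LeaseLostInput input) {}
    @Override public void shardEnded(ShardEndedInput input) {
      try { input.checkpointer().checkpoint(); } catch (Exception e) { /* ignore in sketch */ }
    }
    @Override public void shutdownRequested(ShutdownRequestedInput input) {
      try { input.checkpointer().checkpoint(); } catch (Exception e) { /* ignore in sketch */ }
    }
  }
}
```

Running two copies of this with the same applicationName should end up with 3 shards each on a 6-shard stream; changing the name on one of them reproduces the duplicate consumption you saw.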

but suddenly the Kinesis SDK started closing the connection to one of the consumers with the following error

I’m not familiar with this error message, but it seems to be just a warning. Did you observe a problem with the consumption after this log?

Hi @BenB,
I managed to fix the issue by decreasing the number of shards from 6 to 4.
It turns out that with 6 shards and 2 shredders, KCL was not able to distribute the consumption equally between the 2 shredders, resulting in sporadic SubscribeToShard.RateExceeded errors and the warning I mentioned in the original post. Decreasing the number of shards solved my problem and allowed me to scale the shredder horizontally.
And yes - we did observe data loss after this warning in the log.
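For anyone landing here later, a minimal sketch of that resharding step with the AWS SDK for Java v2 (the stream name here is a placeholder, and UNIFORM_SCALING assumes you are happy with Kinesis merging the existing shards down to the target count for you):

```java
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.ScalingType;
import software.amazon.awssdk.services.kinesis.model.UpdateShardCountRequest;

public class ReduceShardCount {
  public static void main(String[] args) {
    try (KinesisClient kinesis = KinesisClient.builder().region(Region.US_EAST_1).build()) {
      // Reduce the stream from 6 to 4 shards; Kinesis picks which shards to merge.
      kinesis.updateShardCount(UpdateShardCountRequest.builder()
          .streamName("enriched-good") // placeholder stream name
          .targetShardCount(4)
          .scalingType(ScalingType.UNIFORM_SCALING)
          .build());
    }
  }
}
```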