Unable to scale stream shredder horizontally

Hi there! I have an issue scaling the stream shredder horizontally. Currently, we are running the Snowplow pipeline on AWS using Kinesis and stream shredder 2.2.0, consuming from a Kinesis data stream with 6 shards. The pipeline is working as expected, but lately we have noticed that the RecordProcessor.processRecords.Time CloudWatch metric has started increasing, indicating slow processing and delays in delivering events to Redshift in real time. I’m running the shredder on an EC2 instance as a Docker container, and it is set up as an enhanced fan-out consumer.

I decided to scale it horizontally. My first attempt was to run a second container with the same appName but a different S3 output path. For the first two minutes everything was fine: both shredders started consuming 3 shards each, which was what I expected. But then the Kinesis SDK started closing the connection to one of the consumers with the following error:

WARN software.amazon.awssdk.http.nio.netty.internal.http2.Http2GoAwayEventListener - GOAWAY received on a connection ([id: 0x882e9690, L:/XX.XX.X.XX:XXXX - R:kinesis.us-east-1.amazonaws.com/XX.XX.XX.XXX:XXX]) not associated with any multiplexed channel pool.

I assumed this was because I was running 2 shredders with the same appName.

My second attempt was to run a second shredder with a different appName. This resulted in a new enhanced fan-out consumer being registered on the Kinesis stream, a second DynamoDB table being created, and the new shredder being assigned all 6 shards. Consequently, we started getting duplicate events and the same error again:

WARN software.amazon.awssdk.http.nio.netty.internal.http2.Http2GoAwayEventListener - GOAWAY received on a connection ([id: 0x882e9690, L:/XX.XX.X.XX:XXXX - R:kinesis.us-east-1.amazonaws.com/XX.XX.XX.XXX:XXX]) not associated with any multiplexed channel pool.

Am I missing something? I would greatly appreciate your help.
Thanks in advance!

Hi @peter.petrov ,

We generally recommend scaling the streaming shredder vertically, because auto-scaling horizontally can introduce a lot of duplicates due to how KCL (the library used to consume from Kinesis) works. But in your case, with a fixed number of shards and a fixed number of instances, you could be fine.

All the instances must share the same appName. This is how KCL is able to share the consumption of the shards between instances.
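To make that concrete, here is a minimal, hedged KCL 2.x sketch of the mechanism the shredder relies on. This is not the shredder's own code; the application name, stream name, and the NoOpProcessor are placeholders. The application name becomes the name of the DynamoDB lease table (and, by default, of the registered enhanced fan-out consumer), so workers that share it take leases from the same table and split the shards, while workers with different names each read every shard.

```java
// Hedged sketch only: requires the amazon-kinesis-client 2.x dependency.
import java.util.UUID;

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.kinesis.common.ConfigsBuilder;
import software.amazon.kinesis.coordinator.Scheduler;
import software.amazon.kinesis.lifecycle.events.*;
import software.amazon.kinesis.processor.ShardRecordProcessor;

public class SameAppNameWorker {

  public static void main(String[] args) {
    Region region = Region.US_EAST_1;
    KinesisAsyncClient kinesis = KinesisAsyncClient.builder().region(region).build();
    DynamoDbAsyncClient dynamo = DynamoDbAsyncClient.builder().region(region).build();
    CloudWatchAsyncClient cloudWatch = CloudWatchAsyncClient.builder().region(region).build();

    // The application name below plays the role of the shredder's appName:
    // KCL creates (or reuses) a DynamoDB lease table with this exact name,
    // and every worker that shares it coordinates through that table.
    // A different name means a separate lease table, a separate enhanced
    // fan-out consumer, and every shard being read a second time (duplicates).
    String applicationName = "snowplow-shredder";  // placeholder
    String streamName = "enriched-good";            // placeholder
    String workerId = UUID.randomUUID().toString(); // must be unique per instance

    ConfigsBuilder configs = new ConfigsBuilder(
        streamName, applicationName, kinesis, dynamo, cloudWatch, workerId,
        NoOpProcessor::new);

    Scheduler scheduler = new Scheduler(
        configs.checkpointConfig(),
        configs.coordinatorConfig(),
        configs.leaseManagementConfig(),
        configs.lifecycleConfig(),
        configs.metricsConfig(),
        configs.processorConfig(),
        configs.retrievalConfig()); // defaults to enhanced fan-out in KCL 2.x

    new Thread(scheduler).start();
  }

  // Minimal record processor; the real shredder does its own processing.
  static class NoOpProcessor implements ShardRecordProcessor {
    @Override public void initialize(InitializationInput input) {}
    @Override public void processRecords(ProcessRecordsInput input) {
      input.records().forEach(r -> System.out.println("got a record"));
    }
    @Override public void leaseLost(LeaseLostInput input) {}
    @Override public void shardEnded(ShardEndedInput input) {
      try { input.checkpointer().checkpoint(); } catch (Exception e) { /* ignore in sketch */ }
    }
    @Override public void shutdownRequested(ShutdownRequestedInput input) {
      try { input.checkpointer().checkpoint(); } catch (Exception e) { /* ignore in sketch */ }
    }
  }
}
```

Running two copies of this with the same applicationName should end up with 3 shards each on a 6-shard stream; changing the name on one of them reproduces the duplicate consumption you saw.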

but suddenly the Kinesis SDK started closing the connection to one of the consumers with the following error

I’m not familiar with this error message, but it seems to be just a warning. Did you observe a problem with the consumption after this log?

Hi @BenB,
I managed to fix the issue by decreasing the number of shards from 6 to 4.
It turns out that with 6 shards and 2 shredders, KCL was not able to distribute the consumption equally between the 2 shredders, resulting in sporadic SubscribeToShard.RateExceeded errors and the warning I mentioned in the original post. Decreasing the number of shards solved my problem and allowed me to scale the shredder horizontally.
And yes - we did observe data loss after this warning in the log.
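For anyone landing here later, a minimal sketch of that resharding step with the AWS SDK for Java v2 (the stream name here is a placeholder, and UNIFORM_SCALING assumes you are happy with Kinesis merging the existing shards down to the target count for you):

```java
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.ScalingType;
import software.amazon.awssdk.services.kinesis.model.UpdateShardCountRequest;

public class ReduceShardCount {
  public static void main(String[] args) {
    try (KinesisClient kinesis = KinesisClient.builder().region(Region.US_EAST_1).build()) {
      // Reduce the stream from 6 to 4 shards; Kinesis picks which shards to merge.
      kinesis.updateShardCount(UpdateShardCountRequest.builder()
          .streamName("enriched-good") // placeholder stream name
          .targetShardCount(4)
          .scalingType(ScalingType.UNIFORM_SCALING)
          .build());
    }
  }
}
```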