RT pipeline - Kinesis stream read throughput exceeded

Hi,

I’ve set up an events pipeline, similar to the lambda architecture described here: Is my version of snowplow lambda architecture correct

I have a Scala Stream Collector writing to a Kinesis stream with 20 shards.
This stream has two consumers: Stream Enrich and Kinesis Firehose.

When I run a load test at about 700 requests per second, I get a provisioned read throughput exceeded alert from AWS, which I feel shouldn’t happen with 20 shards.
20 shards means that the consumers can read up to 40 MiB per second in total.
I really don’t think I reach this, and I don’t get a write throughput exceeded alert, even though the collector is only allowed to write at 1 MiB per second per shard (half of the allowed read rate).

Another read limit is 5 requests per second per shard, so I suspect that the consumers are trying to read at a higher rate than that.
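
For reference, the per-shard limits mentioned above can be multiplied out as in this rough sketch. The constants are the published Kinesis Data Streams quotas for shared-throughput consumers (no enhanced fan-out); the function name `stream_limits` is just an illustrative helper, not part of any Snowplow or AWS library.

```python
# Per-shard quotas for shared-throughput (non enhanced fan-out) consumers.
READ_MIB_PER_SEC_PER_SHARD = 2
WRITE_MIB_PER_SEC_PER_SHARD = 1
GET_RECORDS_CALLS_PER_SEC_PER_SHARD = 5


def stream_limits(shards: int) -> dict:
    """Aggregate limits for a stream with the given number of shards."""
    return {
        "read_mib_per_sec": shards * READ_MIB_PER_SEC_PER_SHARD,
        "write_mib_per_sec": shards * WRITE_MIB_PER_SEC_PER_SHARD,
        "get_records_calls_per_sec": shards * GET_RECORDS_CALLS_PER_SEC_PER_SHARD,
    }


limits = stream_limits(20)
print(limits)
```

Note that the read limits are shared by all consumers of a shard, so the request-count limit can be hit even when the byte volume is far below 40 MiB per second.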

Another interesting thing is that Stream Enrich writes to a Kinesis stream that also has two consumers: the Elasticsearch loader and the S3 loader. This stream has only 10 shards, but there I don’t get any alerts on read operations.

Has anyone run into this issue, or does anyone have an idea what could be the cause?

BTW, I checked with AWS: Kinesis Firehose doesn’t support enhanced fan-out at the moment.

Thanks.

What do your GetRecords and ReadProvisionedThroughputExceeded CloudWatch metrics look like?
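
If it helps, here is a minimal sketch of pulling those metrics with boto3 instead of the console. It assumes boto3 is installed and AWS credentials are configured; the stream name `my-collector-stream` and the helper names `metric_query` and `fetch_metric` are placeholders, not anything from your setup.

```python
from datetime import datetime, timedelta, timezone


def metric_query(stream_name: str, metric_name: str, minutes: int = 60) -> dict:
    """Build get_metric_statistics arguments for a stream-level Kinesis metric."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Kinesis",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Sum"],
    }


def fetch_metric(stream_name: str, metric_name: str) -> list:
    """Return the datapoints for one metric, oldest first."""
    import boto3  # imported here so the query builder works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**metric_query(stream_name, metric_name))
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


# Example (requires AWS credentials; the stream name is a placeholder):
# for name in ("GetRecords.Records", "ReadProvisionedThroughputExceeded"):
#     for point in fetch_metric("my-collector-stream", name):
#         print(name, point["Timestamp"], point["Sum"])
```

Comparing the two series over the load test window should show whether the throttling lines up with the read calls rather than with data volume.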

20 shards is quite high for 700 requests/second on the write side, so it sounds like you’re probably just hitting read limits from Firehose.

Thanks @mike,

This is probably the case. I talked to AWS support, and it seems that I reached 6 GetRecords requests per second with those two consumers.
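
To put numbers on that, here is a small sketch of the arithmetic. It assumes the 6 requests per second are per shard and split evenly between the two consumers (3 each); that split is my assumption, since AWS support only gave the combined figure. The helper names are made up for illustration.

```python
GET_RECORDS_CALLS_PER_SEC_PER_SHARD = 5  # shared by all consumers of a shard


def is_throttled(polls_per_sec_by_consumer: list) -> bool:
    """True when the combined poll rate on one shard exceeds the limit."""
    return sum(polls_per_sec_by_consumer) > GET_RECORDS_CALLS_PER_SEC_PER_SHARD


def min_poll_interval_ms(consumers: int) -> float:
    """Shortest per-consumer poll interval that keeps one shard within the limit,
    assuming every consumer polls at the same rate."""
    return 1000 * consumers / GET_RECORDS_CALLS_PER_SEC_PER_SHARD


print(is_throttled([3, 3]))      # assumed even split of the reported 6 calls/s
print(min_poll_interval_ms(2))   # interval each of two consumers would need
```

With two consumers, each one would have to poll a shard no more often than once every 400 ms to stay within the limit.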

I think that adding more shards is the only solution at the moment, until Kinesis Firehose and/or Stream Enrich supports the enhanced fan-out feature.