Hey guys - it could be something silly I did on my end, but we have been having a lot of “hot sharding” problems lately.
I have had several people look at the stack (our autoscaling group, thread pool size, config options, EC2 compute/network/RAM usage, etc.) and we are still getting hot shards, which leads them and me to believe that maybe the random UUID partition key isn't distributing records evenly across the shards?
Unfortunately, it's only really visible at scale, due to our current provisioning, but when we are under heavy load, we notice that about 10% of our shards start falling further and further behind.
Has anyone else seen this or have any insight? I am going to set up a Lambda that logs partition keys to DynamoDB so I can do some basic analysis and look for patterns in the output.
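For reference, here is a rough sketch of the kind of analysis I have in mind once the partition keys are logged. Kinesis maps each partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. The sketch assumes the shards split the hash space evenly, which may not hold after resharding (the real ranges come from each shard's HashKeyRange), and the function names are my own placeholders, not part of any library.

```python
import hashlib
import uuid
from collections import Counter

# Kinesis hashes partition keys with MD5 into a 128-bit integer space.
HASH_SPACE = 2 ** 128


def shard_index(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index.

    ASSUMPTION: the shards divide the hash space into equal ranges.
    """
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hashed * shard_count // HASH_SPACE


def shard_histogram(partition_keys, shard_count: int) -> Counter:
    """Count how many of the logged keys land on each shard."""
    counts = Counter({i: 0 for i in range(shard_count)})
    counts.update(shard_index(key, shard_count) for key in partition_keys)
    return counts


def skew(counts: Counter) -> float:
    """Ratio of the busiest shard's load to the mean load (1.0 is perfectly even)."""
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean if mean else 0.0


if __name__ == "__main__":
    # Stand-in for the logged keys: freshly generated random UUIDs.
    keys = [str(uuid.uuid4()) for _ in range(20000)]
    histogram = shard_histogram(keys, shard_count=8)
    for shard, count in sorted(histogram.items()):
        print(shard, count)
    print("skew:", round(skew(histogram), 3))
```

If the logged keys give a skew close to 1.0 here but the real stream still has hot shards, that would point away from the partition keys and toward something else, such as uneven shard ranges or a slow consumer on specific shards.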
Our collector-to-raw stream is running perfectly with no visible hot sharding, including under the loads that cause hot shards on our good/bad streams.
Look forward to hearing back from you guys with suggestions of where to look. Thanks!