Possible bug in random UUID for scala stream enrich?

13scoobie · April 5, 2017, 9:26pm

Hey guys - it could be something silly I did on my end, but we have been having a lot of “hot sharding” problems lately.

I have had several people look at the stack (our autoscaling group, thread pool size, config options, EC2 compute/network/RAM usage, etc.) and we still seem to be getting hot shards, which leads them and me to believe that maybe the random UUID isn't distributing records evenly across the shards?

Unfortunately, it's only really visible at scale, due to our current provisioning, but when we are under heavy load, we notice that about 10% of our shards start falling further and further behind.

Has anyone else seen this, or does anyone have any insight? I am going to set up Lambda logging of partition keys to DynamoDB to do some basic analysis and look for patterns in the output.
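A minimal sketch of that kind of partition key analysis, in Python. It assumes the keys have already been exported as plain strings, that Kinesis maps a partition key to a shard by reading its MD5 digest as a 128-bit unsigned integer, and that the shards split the hash key space evenly (which stops being true once individual shards have been split by hand). The names `shard_index`, `shard_histogram` and `chi_square` are made up for illustration.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # hash keys run from 0 to 2**128 - 1


def shard_index(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def shard_histogram(keys, shard_count: int):
    """Count how many keys land on each shard index."""
    counts = Counter(shard_index(k, shard_count) for k in keys)
    return [counts.get(i, 0) for i in range(shard_count)]


def chi_square(histogram) -> float:
    """Chi-square statistic of the histogram against a uniform spread."""
    expected = sum(histogram) / len(histogram)
    return sum((c - expected) ** 2 / expected for c in histogram)


if __name__ == "__main__":
    # Stand-in for the logged keys: freshly generated random UUIDs.
    sample_keys = [str(uuid.uuid4()) for _ in range(100_000)]
    hist = shard_histogram(sample_keys, shard_count=20)
    print(hist)
    print("chi-square:", round(chi_square(hist), 2))
```

With evenly spread keys the statistic should sit somewhere near the shard count minus one; a value far above that on the real logged keys would point at the keys rather than at the consumers.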

Our collector-to-raw stage is running perfectly with no visible hot sharding, even under the loads that cause hot shards on our good/bad streams.

Look forward to hearing back from you guys with suggestions of where to look. Thanks!

13scoobie · April 5, 2017, 9:44pm

Also, if it helps/is related:
we run ip_lookups (using MaxMind), event_fingerprint and user-agent-utils (3 enrichments total).

The bad stream had content that was failing schema validation because a null was being passed to a required field.

It is typically our bad bucket that starts getting throttled on provisioned write throughput, which is what drags our enriched stream behind (since the stream enrich app processes both good and bad, when it's flooded with bad, good also pays the price).

I tried scaling the number of shards, to no avail.
I added more stream-enrich boxes through our ASG, which did not seem to help (still falling behind).

I tried splitting hot shards, which helped somewhat but did not fix the root problem.
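
For reference, an even split of a hot shard comes down to picking the midpoint of its hash key range. The sketch below assumes the shard's starting and ending hash keys are available as decimal strings (the form Kinesis reports them in) and that the result would be supplied as `NewStartingHashKey` to a `SplitShard` call. `split_point` is a hypothetical helper name, not something from the pipeline.

```python
def split_point(starting_hash_key: str, ending_hash_key: str) -> str:
    """Return the midpoint of a shard's hash key range as a decimal string."""
    start = int(starting_hash_key)
    end = int(ending_hash_key)
    if end <= start:
        raise ValueError("ending hash key must be greater than starting hash key")
    # Python integers are arbitrary precision, so 128-bit values are safe here.
    return str((start + end) // 2)


if __name__ == "__main__":
    # Example: a single shard covering the whole 128-bit hash key space.
    print(split_point("0", str(2 ** 128 - 1)))
```

An even split only helps if the load inside the hot shard is itself spread across its range; if a handful of keys carry most of the traffic, both halves cannot be relieved by the split.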
