We’re looking to optimize our costs and to re-shard the data-sink Kinesis stream (post-enricher) based on real-time load. I know the enricher stores high-watermarks in DynamoDB, which complicates zero-downtime (or short-downtime) resharding. Is this even possible?
We’re thinking of doing a variant of blue/green deploys: spin up a parallel enricher + Kinesis stream + consumer Lambda with the revised number of shards, switch over via Elastic IP or ELB, let the old pipeline drain out, and then take it out of commission. I was just wondering whether there’s a less heavy-handed approach. Let’s investigate this together!
Hi @vivricanopy - it’s totally possible to do dynamic re-sharding (splits and merges) of each stream in situ.
There’s no need to spin up a parallel pipeline unless you need to do a breaking change of the actual contents of a stream.
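To make the in-situ option concrete, here is a minimal sketch of what a reshard call looks like with boto3. The sizing helper, the headroom factor, and the stream name are all hypothetical illustrations, not part of the Snowplow stack; the real work is done by Kinesis's `UpdateShardCount` API with uniform scaling, which performs the splits/merges for you:

```python
# Sketch: in-situ resharding of a Kinesis stream via UpdateShardCount.
# The sizing helper and headroom factor are illustrative assumptions.
import math

SHARD_WRITE_LIMIT_BYTES = 1_000_000  # 1 MB/s write capacity per shard
SHARD_WRITE_LIMIT_RECORDS = 1_000    # 1,000 records/s per shard


def target_shard_count(bytes_per_sec: float, records_per_sec: float,
                       headroom: float = 1.25) -> int:
    """Pick a shard count covering observed write load plus headroom."""
    by_bytes = bytes_per_sec * headroom / SHARD_WRITE_LIMIT_BYTES
    by_records = records_per_sec * headroom / SHARD_WRITE_LIMIT_RECORDS
    return max(1, math.ceil(max(by_bytes, by_records)))


def reshard(stream_name: str, bytes_per_sec: float,
            records_per_sec: float) -> None:
    import boto3  # imported here so the sizing helper stays dependency-free
    kinesis = boto3.client("kinesis")
    kinesis.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target_shard_count(bytes_per_sec, records_per_sec),
        ScalingType="UNIFORM_SCALING",  # Kinesis performs the splits/merges
    )


# e.g. 3.2 MB/s and 2,500 records/s observed post-enricher:
print(target_shard_count(3_200_000, 2_500))  # -> 4
```

The KCL-based consumers then discover the new shard map on their own; parent shards are drained to their end before children are processed, so ordering per partition key is preserved.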
I presented on the autoscaling/monitoring tech we use to achieve this for our Managed Service Real-Time customers, called Tupilak, at Snowplow London Meetup #3 last night. Here are the slides; I hope they’re helpful:
I have been playing for a while with the Amazon KCL library (which the real-time stack is based on), the Snowplow Kinesis stack, and Kinesis stream resharding, and I have not seen any issues. The library workers were able to pick up the new shards, finish reading from the old ones, and so on. Everything works automagically - no need to do anything :-)
Hint: in DynamoDB you will find the shard ID and checkpoint for each lease, which is how the workers pick up whatever shards are available.
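If you want to see those leases for yourself, a quick scan of the KCL lease table shows the shard ID (`leaseKey`) and the checkpoint each worker holds. A small sketch, assuming boto3 and a hypothetical table name (the KCL names the table after your application name):

```python
# Sketch: peek at the KCL lease table in DynamoDB. The table name is your
# KCL application name; pass in whatever yours is.
def dump_leases(table_name: str) -> list:
    import boto3  # imported lazily; requires AWS credentials to actually run
    table = boto3.resource("dynamodb").Table(table_name)
    leases = table.scan()["Items"]
    for lease in leases:
        # leaseKey is the shard ID; checkpoint is the last sequence number
        # checkpointed, or the SHARD_END sentinel once a parent shard has
        # been fully drained after a split/merge.
        print(lease["leaseKey"], lease["checkpoint"], lease.get("leaseOwner"))
    return leases
```

After a reshard you should see new `leaseKey` entries appear for the child shards, and the parent shards move to `SHARD_END` as they drain.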