I was wondering if anybody has had a chance to set up multiple Stream Enrich containers that read from the same Kinesis stream?
I want to see if I can improve the performance of the pipeline.
I just want to understand how shard assignment is managed (if at all), or whether I need to handle it myself (and if so, how).
I couldn’t find anything in the documentation about that.
If you run Stream Enrich as a separate pod, just use the scale command to increase the pod count. From what I understand, it uses a Kafka consumer group to coordinate consumption between the containers (each enrich instance is a Kafka consumer), which avoids duplicate processing.
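To illustrate why a consumer group avoids duplication, here is a toy model of range-style partition assignment, similar in spirit to Kafka's RangeAssignor. It is a simplified sketch, not Kafka's actual code: the function `range_assign` and the consumer names are hypothetical. Each partition goes to exactly one consumer in the group, so no message is processed twice.

```python
from typing import Dict, List


def range_assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Toy range-style assignment: split partitions into contiguous chunks, one per consumer.

    Hypothetical helper for illustration only; real Kafka assignment happens inside the
    group coordinator protocol, not in application code.
    """
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    # Scaling from 2 to 3 enrich pods redistributes 5 partitions.
    print(range_assign(list(range(5)), ["enrich-a", "enrich-b"]))
    print(range_assign(list(range(5)), ["enrich-a", "enrich-b", "enrich-c"]))
```

One consequence worth noting: if you scale to more consumers than there are partitions, the extra consumers get no partitions and sit idle, so the partition count caps the useful pod count.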
I’m running the Snowplow pipeline on AWS ECS, and Stream Enrich runs on multiple containers.
As far as I know, Stream Enrich uses the KCL (Kinesis Client Library). This library handles shard assignment (and reassignment when scaling) for you. Here are some references:
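The KCL coordinates workers through a lease table (stored in DynamoDB and named after the KCL application name), where each shard's lease is held by at most one worker. The sketch below is a heavily simplified, single-process simulation of one balancing round, assuming each worker aims for roughly shards/workers leases, first picks up unowned or expired leases, and then takes leases from the most loaded worker. It is not the KCL's actual algorithm or API; `rebalance` and the shard/worker ids are hypothetical.

```python
import math
from typing import Dict, List, Optional


def rebalance(leases: Dict[str, Optional[str]], workers: List[str]) -> Dict[str, Optional[str]]:
    """One simulated balancing round. leases maps shard id -> owning worker (None = unowned)."""
    leases = dict(leases)
    # Leases held by workers that are gone are treated as expired (unowned).
    for shard, owner in leases.items():
        if owner not in workers:
            leases[shard] = None

    target = math.ceil(len(leases) / len(workers))

    def counts() -> Dict[str, int]:
        return {w: sum(1 for o in leases.values() if o == w) for w in workers}

    for worker in sorted(workers):
        # Step 1: pick up unowned leases until reaching the target.
        for shard in sorted(leases):
            if counts()[worker] >= target:
                break
            if leases[shard] is None:
                leases[shard] = worker
        # Step 2: take leases from the most loaded worker while clearly imbalanced.
        while True:
            c = counts()
            victim = max(workers, key=lambda w: (c[w], w))
            if c[worker] >= target or c[victim] - c[worker] < 2:
                break
            shard = next(s for s in sorted(leases) if leases[s] == victim)
            leases[shard] = worker
    return leases


if __name__ == "__main__":
    # Worker A holds all 4 shards; a second container (B) starts.
    after_scale_up = rebalance({"s0": "A", "s1": "A", "s2": "A", "s3": "A"}, ["A", "B"])
    print(after_scale_up)
    # Container A stops; B picks up A's expired leases.
    print(rebalance(after_scale_up, ["B"]))
```

The same property as with Kafka partitions applies: since a shard lease has only one owner, running more Stream Enrich containers than there are shards leaves the extra workers idle, and all containers must share the same KCL application name to coordinate through the same lease table.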