Scaling Kafka enricher

bambachas79 · March 11, 2020, 9:16am

Hello everyone,

I am running a Snowplow pipeline with a Kafka enricher on a local instance, performing one API enrichment and one SQL enrichment. Is there a way to scale up a single enricher instead of running the process multiple times? I am trying to figure out how the enricher uses threads and whether this is configurable.

Thank you!

BenB · March 11, 2020, 9:44am

Hi @bambachas79,

You can allocate more CPUs (and therefore more threads) and more memory to your enricher, but bear in mind that with Kafka only one consumer thread within a consumer group can read a given partition of a topic. So even if you allocate 10 CPUs to your enricher, if you have only 2 partitions, only 2 threads will be consuming from Kafka. Scaling therefore depends not only on the resources allocated to the enricher but also on the number of partitions in your topic.
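To make the partition limit concrete, here is a minimal Python sketch, assuming a simplified round-robin assignment in which each partition gets exactly one owner in the consumer group. Kafka's real assignors (range, round-robin, sticky) differ in detail but follow the same one-owner-per-partition rule; the function and thread names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Simplified round-robin model: every partition gets exactly one owner in the group."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def busy_consumers(num_partitions: int, consumers: List[str]) -> int:
    """Count consumers that own at least one partition; the rest have nothing to read."""
    return sum(1 for parts in assign_partitions(num_partitions, consumers).values() if parts)


if __name__ == "__main__":
    # Hypothetical: 10 consumer threads (one per CPU) reading a 2-partition topic.
    threads = [f"thread-{i}" for i in range(10)]
    print(busy_consumers(2, threads))  # 2: the other 8 threads sit idle
```

Adding threads beyond the partition count only adds idle consumers under this model, which is why the partition count caps parallelism.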

Please do not hesitate to ask if you need more help.

bambachas79 · March 11, 2020, 9:48am

Thanks for your quick response @BenB!

I am running 100 partitions in Kafka. How can I allocate more CPUs to my enricher?

BenB · March 11, 2020, 10:07am

Having more than 100 CPUs on one machine seems like a lot, and it would lose the fault tolerance that we gain from Kafka with several instances consuming. Is there any particular reason why you want to run the enricher on only one machine instead of several?

How do you start your enricher?

bambachas79 · March 11, 2020, 10:21am

I am running the command:

`java -jar snowplow-stream-enrich-kafka-1.0.0.jar --config kafka_enrich.conf --resolver file:resolver.json --enrichments file:custom_enrichments`

To run the enricher with 10 cores, do I have to run the command 10 times? I want to run 10 enrichers on 10 machines.

BenB · March 11, 2020, 10:59am

When you use the java command, your Java application (stream-enrich) will be able to use all the CPU cores available on the machine to run its threads. So if your machine has 10 cores and does little besides running your app, all 10 cores will be used.

So you just need to run java once on a machine that has 10 cores.

To run 10 enrichers on 10 machines, run the java command once per machine.
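For the 10-machine setup, here is a minimal Python sketch of how 100 partitions would spread over enricher instances that share one consumer group, assuming an even split (the helper name is hypothetical, and Kafka's actual assignor may order partitions differently). It also illustrates the fault-tolerance point: if one machine is lost, a rebalance hands its partitions to the survivors.

```python
from typing import List


def partitions_per_instance(num_partitions: int, num_instances: int) -> List[int]:
    """Even split of partitions across instances; the first `extra` instances take one more."""
    if num_instances <= 0:
        raise ValueError("need at least one instance")
    base, extra = divmod(num_partitions, num_instances)
    return [base + 1 if i < extra else base for i in range(num_instances)]


if __name__ == "__main__":
    # 10 enrichers on 10 machines, 100 partitions: 10 partitions per machine,
    # matching 10 cores per machine under this simplified model.
    print(partitions_per_instance(100, 10))
    # One machine lost: its partitions are rebalanced onto the 9 remaining instances.
    print(partitions_per_instance(100, 9))
```

Under this model each instance's share stays close to the number of cores on its machine, and losing one instance only raises the load on the others slightly instead of stopping consumption.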

bambachas79 · March 11, 2020, 12:07pm

And in the case of 1 core? When I ran the java command twice on a single-core instance, I got twice as many messages as with a single java command. Why is that? I also gave the maximum heap size to the JVM. Neither CPU nor RAM usage was high.
