I am running a Snowplow pipeline with a Kafka enricher on a local instance, performing one API enrichment and one SQL enrichment. Is there a way to scale up a single enricher instead of running the process multiple times? I am trying to figure out how the enricher uses threads and whether this is configurable.
You can allocate more CPUs (= more threads) and more memory to your enricher, but bear in mind that with Kafka, within a consumer group, only one thread can consume a given partition of a topic. So even if you allocate 10 CPUs to your enricher but have only 2 partitions, only 2 threads will be consuming from Kafka. Scaling therefore depends not only on the resources allocated to the enricher but also on the number of partitions in your topic.
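To make the arithmetic in this reply concrete, here is a minimal sketch of the "threads vs. partitions" limit. It is a simplified model, not Kafka's actual partition assignor or anything from stream-enrich: the function names `active_consumer_threads` and `assign_partitions` are hypothetical, and it assumes one consumer thread per CPU, as in the example above.

```python
def active_consumer_threads(cpu_count: int, partition_count: int) -> int:
    """Upper bound on threads that can consume at the same time.

    Assumption: one consumer thread per CPU, and each partition is read
    by at most one thread of the consumer group.
    """
    if cpu_count < 0 or partition_count < 0:
        raise ValueError("counts must be non-negative")
    return min(cpu_count, partition_count)


def assign_partitions(partition_count: int, thread_count: int) -> dict:
    """Spread partitions over threads round-robin (illustrative only).

    Threads that end up with an empty list have nothing to consume.
    """
    assignment = {thread: [] for thread in range(thread_count)}
    if thread_count == 0:
        return assignment
    for partition in range(partition_count):
        assignment[partition % thread_count].append(partition)
    return assignment


if __name__ == "__main__":
    # The case from the reply: 10 CPUs but only 2 partitions.
    print(active_consumer_threads(10, 2))
    idle = [t for t, parts in assign_partitions(2, 10).items() if not parts]
    print(len(idle), "threads have no partition to read")
```

Under this model, raising the partition count of the input topic is what lets additional threads (or additional enricher instances) do useful work.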
Having more than 100 CPUs on one machine seems like a lot, and it makes us lose the fault tolerance that we gain with Kafka and several instances consuming. Is there any particular reason why you want to run the enricher on only one machine instead of several?
When you use the java command, your Java application (stream-enrich) will automatically use all the CPU cores that are available on the machine to run its threads. So if your machine has 10 cores and doesn't do much else than run your app, the 10 cores will automatically be used.
So you just need to run java once on a machine that has 10 cores.
You need to run the java command once per machine.
And what about the case of 1 core? When I ran the java command twice on a single-core instance, I got double the messages compared with a single java command. Why is that? I also gave the JVM the maximum heap size. Neither CPU nor RAM usage was high.