Compute profiles of Scala Collector & Enricher

vivricanopy · November 22, 2016, 9:57pm

Hello fellow Snowplowers,

I’m setting up an ECS cluster of Scala Collectors and Enrichers, and trying to optimize my resources. What have you found to be the optimal “generous” configuration for both the collector and the enricher? Currently I give each of them 2 CPU cores and 512 MB of memory, but I wonder whether they benefit at all from the two cores. Are there any official recommendations regarding machine size?

Thank you,
Victor.

vivricanopy · November 28, 2016, 8:50pm

Is this a poor question or a trade secret? Has anyone done the work to optimize the cost/benefit from the underlying iron on the streaming components?

josh · November 29, 2016, 8:45am

Hey @vivricanopy, on the Stream Enrich side (or any stream consumer) we tend to think in terms of 1 shard needing 1 vCPU. As you scale your Kinesis stream up or down, you will then need to add or remove vCPUs to remain optimal.
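
As a rough illustration of that rule of thumb, here is a minimal sketch that turns a shard count into a vCPU count and a number of instances. The function names and the `vcpus_per_shard` parameter are hypothetical, made up for this example, and the 1:1 ratio is only the starting point described above, not a guarantee; heavy enrichment setups may need more.

```python
import math


def vcpus_for_stream(shard_count: int, vcpus_per_shard: float = 1.0) -> int:
    """Estimate the vCPUs a stream consumer needs for a Kinesis stream.

    Assumes the "1 shard needs 1 vCPU" rule of thumb; raise
    vcpus_per_shard if your enrichments are compute heavy.
    """
    if shard_count < 0:
        raise ValueError("shard_count must be non-negative")
    return math.ceil(shard_count * vcpus_per_shard)


def instances_needed(shard_count: int, vcpus_per_instance: int,
                     vcpus_per_shard: float = 1.0) -> int:
    """Round the vCPU estimate up to whole instances of a given size."""
    if vcpus_per_instance <= 0:
        raise ValueError("vcpus_per_instance must be positive")
    total = vcpus_for_stream(shard_count, vcpus_per_shard)
    return math.ceil(total / vcpus_per_instance)


if __name__ == "__main__":
    # Example: a 5-shard stream consumed by 2-vCPU instances.
    print(instances_needed(shard_count=5, vcpus_per_instance=2))
```

Re-running the estimate whenever the stream is resharded gives a target to scale the consumer fleet towards; actual CPU utilisation and latency should still decide the final size.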

As for unofficial recommendations for instance types:

For the collector, anything with moderate to high network performance will help keep your latency low. We find that with any decent load on the lower-end instances, latency quickly climbs to the ~40-50 ms mark. In general we tend to use m3.xlarge for high load and m3.large/medium for lower loads.
Most of the same rules apply for enrichment: you are going to be reading and writing a lot of data to Kinesis. The thing to note here is that, depending on how many enrichments you use, your compute demands can increase drastically.

In short, there is no generic rule for which instances to use. Every pipeline has different requirements, so a lot of it will be figuring out what suits your load best and what latency you are happy to deal with in the pipeline!

Hope this helps,
Josh

vivricanopy · November 29, 2016, 4:48pm

Thank you @josh! This was immensely helpful. I’m trying to profile the ECS cluster for optimal usage. Currently throwing a c4.large at each instance - I think it’ll do, then.
