How you configure the collector depends heavily on the amount of traffic you expect, so there is no single defined ruleset: the right configuration is unique to what you expect to collect.
However, for scaling the collector we generally recommend the following:
Use an Auto Scaling Group with m3.* instances. Avoid t2.* instances: their CPU can be throttled, which removes the ability to scale effectively on CPU metrics.
Place this ASG behind a Load Balancer.
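As a rough sketch of that architecture using the AWS CLI (all names, the AMI ID, subnets, and the target group ARN are placeholders you would substitute for your own):

```shell
# Launch configuration for the collector nodes (AMI ID is a placeholder).
aws autoscaling create-launch-configuration \
  --launch-configuration-name collector-lc \
  --image-id ami-xxxxxxxx \
  --instance-type m3.large

# Auto Scaling Group attached to an existing load balancer target group.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name collector-asg \
  --launch-configuration-name collector-lc \
  --min-size 2 --max-size 10 --desired-capacity 2 \
  --vpc-zone-identifier "subnet-aaaaaaaa,subnet-bbbbbbbb" \
  --target-group-arns "arn:aws:elasticloadbalancing:region:account:targetgroup/collector-tg/id"
```

With a classic ELB you would pass `--load-balancer-names` instead of `--target-group-arns`; the rest is the same.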
Scale the collector ASG based on two metrics:
CPU usage: we tend to scale up at 60% utilisation, with step scaling above 85%. That is, if utilisation exceeds 85% we provision two extra nodes rather than one.
Latency scaling: the Load Balancer publishes latency metrics for the collectors. If latency is very high, chances are your collectors are overloaded and you need to add nodes to maintain quality of service. The right threshold is highly dependent on the node type you pick: m3.medium instances tend to have quite high average latency, whereas an m3.xlarge will happily stay below the 5-10 ms mark under load.
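The two rules above could be expressed like this with the AWS CLI (group names, alarm names, and ARNs are placeholders). Note that step-adjustment bounds are offsets from the alarm threshold, so with a 60% alarm the 0-25 interval covers 60-85% and the open interval above 25 covers >85%:

```shell
# Step scaling on CPU: +1 node at 60-85% utilisation, +2 nodes above 85%.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name collector-asg \
  --policy-name collector-cpu-step \
  --policy-type StepScaling \
  --adjustment-type ChangeInCapacity \
  --step-adjustments \
      MetricIntervalLowerBound=0,MetricIntervalUpperBound=25,ScalingAdjustment=1 \
      MetricIntervalLowerBound=25,ScalingAdjustment=2

# CloudWatch alarm at 60% average CPU that triggers the step policy.
aws cloudwatch put-metric-alarm \
  --alarm-name collector-cpu-high \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value=collector-asg \
  --statistic Average --period 300 --evaluation-periods 2 \
  --threshold 60 --comparison-operator GreaterThanThreshold \
  --alarm-actions "arn:aws:autoscaling:region:account:scalingPolicy:...:policyName/collector-cpu-step"

# Latency alarm on a classic ELB (AWS/ELB Latency is reported in seconds,
# so 0.01 = 10 ms); point its alarm action at a second scaling policy.
aws cloudwatch put-metric-alarm \
  --alarm-name collector-latency-high \
  --namespace AWS/ELB --metric-name Latency \
  --dimensions Name=LoadBalancerName,Value=collector-elb \
  --statistic Average --period 60 --evaluation-periods 3 \
  --threshold 0.01 --comparison-operator GreaterThanThreshold \
  --alarm-actions "arn:aws:autoscaling:region:account:scalingPolicy:...:policyName/collector-latency-policy"
```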
I would recommend that you set up the architecture as described here, with an ASG and Load Balancer, and add very basic CPU-based scaling rules. You can then start tuning the rules so you end up with a stable cluster that scales appropriately.
If you can share some usage scenarios we can better help you define rules for scaling!
I would like to get advice on the Scala Stream Collector's AWS instance too. I will also be using the Docker version of this. All I know is to set up an EC2 instance and deploy Docker inside it, but I would like to know how to do it with auto-scaling and load-balancing enabled. And what is the minimum required instance type?
In the instructions above you mention a latency-based scaling policy, but it seems I cannot have more than one policy with AWS. I can only use CPU utilisation as the scaling metric. Is there any way to add more scaling metrics?