Architecture question

johnschmidt · July 10, 2019, 11:07am

Hi,

I have a few questions concerning the AWS architecture:

Is it a good idea to run the collector and the enricher on the same EC2 instance, or should I rather configure two separate ones?
Which Linux distribution is best suited for running the collector and enricher, or does it not really matter?
Which resources should I give an EC2 instance for the collector and the enricher?
I am currently running them on a t2.large instance. I know it ultimately depends on the traffic volume, but I am interested in which process needs more CPU, RAM or network performance.

Thanks a lot,
John

CoconutTherapy · July 10, 2019, 2:51pm

Hi John,

We’ve been running Snowplow for 6 months in production and we process about 50M events a month, mostly as click-stream events.

From our experience:

It’s best to separate the two. First, the enrichment process caches schemas, so you might need to restart the server when schemas change, and you wouldn’t want to restart the collector at the same time. It’s also best to put the collector behind an ELB with at least two EC2 instances for safety. Once your events are in the collector stream they can easily be replayed for enrichment, but if they never reach that first stream they are likely lost forever.
We have not tried anything other than Ubuntu, but the setup is very easy and we have not had any issues with it.
We found that the collector is very lightweight and can run on a t2.micro at our scale. The enrichment process, however, is quite memory-hungry, and we had to scale up from a t2.small to a t2.medium. CPU usage will likely depend on whether you run CPU-intensive JS functions to enrich your events.

Our scale is quite humble so I hope our experience can still shed some light on your questions.
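
For a sense of what 50M events a month means as a sustained load, here is a rough back-of-the-envelope sketch. It assumes a 30-day month and perfectly even traffic, which real click-stream traffic is not, so peaks will be noticeably higher. The function name is just illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_month: int, days_in_month: int = 30) -> float:
    """Average event rate, assuming traffic is spread evenly over the month."""
    return events_per_month / (days_in_month * SECONDS_PER_DAY)


# 50M events/month is the figure quoted above; the 30-day month is an assumption.
rate = average_events_per_second(50_000_000)
print(f"{rate:.1f} events/s on average")  # prints "19.3 events/s on average"
```

So the sizing above applies to an average of roughly 20 events per second. At a much higher volume the numbers would need to be re-measured rather than extrapolated.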

Best,
Arthur

johnschmidt · July 11, 2019, 7:39am

Hi Arthur,

Thanks a lot for the insight, that was very helpful!

Cheers,
John
