I’ll jump in here, as I think I can help you understand what you’re dealing with.
And about the events that couldn’t be sent, in my example: if I sent 200 events in a second, 100 will be collected, and the other 100 are lost? Or kept in some wait list?
100 events per second isn’t much, so I doubt it’d break anything. I realise that’s not what you mean - you just picked a number to illustrate the question - I’m only clarifying in case someone reading this in the future gets the wrong impression. I’ll describe what happens when you do send enough volume to break a collector (one that doesn’t scale up).
If the collector cannot accept an event, it will either return a non-2XX response or fail to respond within the request timeout. In that situation, all of the trackers that we maintain will store the event in a queue and retry it later (unless configured not to).
This means that as long as the user returns to the website or app, the data is not lost. If the user never returns, or clears their cache before returning, then the data is lost.
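To make that concrete, here’s a minimal TypeScript sketch of the queue-and-retry pattern described above. It isn’t the real tracker source - the names (`sendWithRetry`, `pendingEvents`) and the 5-second timeout are made up - and a real tracker persists the queue (e.g. in localStorage), which is why clearing the cache loses the queued events.

```typescript
// Illustrative only: a minimal sketch of the queue-and-retry pattern,
// not the actual tracker code. Names and the timeout value are made up.
type TrackedEvent = Record<string, unknown>;

// In a real tracker this queue is persisted (e.g. localStorage),
// so queued events survive page reloads until the cache is cleared.
const pendingEvents: TrackedEvent[] = [];

async function sendWithRetry(collectorUrl: string, event: TrackedEvent): Promise<void> {
  try {
    const res = await fetch(collectorUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(event),
      signal: AbortSignal.timeout(5000), // no response in time counts as a failure
    });
    if (!res.ok) {
      pendingEvents.push(event); // non-2XX: keep the event for a later retry
    }
  } catch {
    pendingEvents.push(event); // network error or timeout: same treatment
  }
}

// Called later (e.g. on the next page view) to retry queued events.
async function flushPending(collectorUrl: string): Promise<void> {
  for (const event of pendingEvents.splice(0)) {
    await sendWithRetry(collectorUrl, event);
  }
}
```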
For things like webhooks or server-side tracking, it depends on the behaviour that you’ve set up.
So, if the collector goes down, there is a risk of some data loss, but that risk is reduced by the trackers’ retry behaviour.
Once the data reaches the collector successfully, the risk of data loss is next to nil.
For this reason, we always recommend running the collector with multiple instances, and plenty of memory & CPU headroom, to deal with spikes in traffic.
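If it helps to see the shape of that, here’s a hedged AWS CDK (TypeScript) sketch of one way to run several collector instances behind a load balancer. Everything specific here - the instance type, instance counts, ports, and AMI - is a placeholder assumption, not a sizing recommendation.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import { Construct } from 'constructs';

export class CollectorStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'CollectorVpc', { maxAzs: 2 });

    // Multiple instances, sized with headroom to absorb traffic spikes.
    // Instance type and capacities are placeholders - tune for your volume.
    const asg = new autoscaling.AutoScalingGroup(this, 'CollectorAsg', {
      vpc,
      instanceType: ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.LARGE),
      machineImage: ec2.MachineImage.latestAmazonLinux2(),
      minCapacity: 2,  // never run a single point of failure
      maxCapacity: 10, // room to scale into a spike
    });

    // A load balancer spreads incoming events across the instances.
    const lb = new elbv2.ApplicationLoadBalancer(this, 'CollectorLb', {
      vpc,
      internetFacing: true,
    });
    const listener = lb.addListener('Http', { port: 80 });
    listener.addTargets('Collectors', { port: 8080, targets: [asg] });
  }
}
```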
And if we just scale the server horizontally, will we automatically receive more events, or does some configuration need to be done for this new server to be considered?
Most of our apps are compatible with - and built for - horizontal scaling, including the collector. (I think the Databricks loader is the only one that isn’t - but it’s early in its life and we’re scoping it.)
So you won’t need to change anything about the configuration; you just need to worry about configuring the autoscaling correctly.
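Continuing the hypothetical CDK sketch from above, that autoscaling configuration would just be a scaling policy on the group - the 50% CPU target is an arbitrary placeholder, and CPU is only one possible metric.

```typescript
// Continuing the sketch above: "configuring the autoscaling" is a
// target-tracking policy on the group. 50% is an arbitrary placeholder.
asg.scaleOnCpuUtilization('CpuScaling', {
  targetUtilizationPercent: 50,
});
```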
I hope that’s helpful!