Collector configurations

kaustubh.umalkar · January 10, 2019, 5:17am

Hi Folks,

We are planning to put a load balancer in front of two Scala collectors.
Our question is how to configure the load balancer.

If we configure a round-robin policy, requests from the same client can be distributed across both collectors. However, in that case both collectors will be writing to the same Kinesis stream. Won't the events go out of sequence? How can the order of events be maintained, or does it not matter?
Or do we have to ensure that requests from the same client always go to the same collector, so that they arrive in the order they were sent,
while requests from another client can be routed to the second collector?
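A minimal sketch of the two routing policies being compared, assuming each request carries a client ID. The collector addresses, the client IDs and the functions `round_robin` and `sticky` are all hypothetical; real load balancers usually implement the second policy as a "sticky sessions" or session-affinity setting keyed on a cookie or source IP.

```python
import hashlib

# Hypothetical collector endpoints behind the load balancer.
COLLECTORS = ["collector-a:8080", "collector-b:8080"]


def round_robin(requests, collectors=COLLECTORS):
    """Assign each (client_id, payload) request to the next collector in turn,
    regardless of which client sent it."""
    return [(req, collectors[i % len(collectors)]) for i, req in enumerate(requests)]


def sticky(requests, collectors=COLLECTORS):
    """Route every request from the same client to the same collector.

    Uses a stable MD5 hash of the client ID; Python's built-in hash() is
    salted per process, so it would not give a stable mapping across restarts.
    """
    routed = []
    for client_id, payload in requests:
        digest = hashlib.md5(client_id.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % len(collectors)
        routed.append(((client_id, payload), collectors[index]))
    return routed
```

With round robin, two consecutive requests from `user-1` land on different collectors; with `sticky` they always land on the same one. Note that stickiness only fixes which collector sees a client's events. As the replies below explain, it does not by itself give ordering guarantees once the events are written to Kinesis.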

mike · January 10, 2019, 9:38am

There are currently no ordering guarantees in Snowplow. Kinesis itself supports ordering within a shard of a Kinesis stream, but not across shards within the same stream.
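To illustrate why ordering holds per shard but not per stream: Kinesis maps each record's partition key through MD5 to a 128-bit integer and places the record in the shard whose hash key range contains that value. The sketch below models this, assuming the hash key space is split into equal contiguous ranges; `shard_for` and `write` are hypothetical helpers, not part of any AWS SDK.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash key range contains the key.

    Assumes the 128-bit hash key space is split into equal, contiguous
    ranges, with shard 0 owning the lowest range.
    """
    hash_key = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    range_size = (MAX_HASH_KEY + 1) // shard_count
    # Integer division can leave a few top hash keys past the last full
    # range; fold them into the final shard.
    return min(hash_key // range_size, shard_count - 1)


def write(records, shard_count):
    """Append (partition_key, data) records to per-shard lists in write order."""
    shards = [[] for _ in range(shard_count)]
    for key, data in records:
        shards[shard_for(key, shard_count)].append((key, data))
    return shards
```

All records with the same partition key end up in one shard in the order they were written, but a consumer reading several shards can interleave records from different keys in any order, so there is no stream-wide ordering.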

Ordering guarantees don't matter much if you're loading into a target like Redshift (where you can sort the data) or BigQuery (where you can partition the data by date/time), but they may affect you if you're planning some kind of stream processing that requires events to be in order, e.g. real-time sessionisation.
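For stream processing that needs order, one common generic technique (not something Snowplow provides) is a reorder buffer with a watermark: hold events briefly and release them in timestamp order once no earlier event is expected. A minimal sketch, assuming events are `(timestamp_ms, event_id)` tuples and a hypothetical `allowed_lateness_ms` parameter:

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Event = Tuple[int, str]  # (timestamp in ms, event id): a hypothetical shape


def reorder(events: Iterable[Event], allowed_lateness_ms: int) -> Iterator[Event]:
    """Emit events in timestamp order, tolerating bounded lateness.

    Events are buffered in a min-heap and released once the watermark
    (highest timestamp seen so far minus the allowed lateness) has reached
    them. An event arriving behind the watermark is still emitted, but may
    come out after later events; a real processor might route it elsewhere.
    """
    heap: List[Event] = []
    max_seen: Optional[int] = None
    for event in events:
        heapq.heappush(heap, event)
        max_seen = event[0] if max_seen is None else max(max_seen, event[0])
        watermark = max_seen - allowed_lateness_ms
        while heap and heap[0][0] <= watermark:
            yield heapq.heappop(heap)
    # End of input: flush whatever is still buffered, in order.
    while heap:
        yield heapq.heappop(heap)
```

The lateness bound is a trade-off: a larger value tolerates more disorder but delays every event by up to that amount, while a value of 0 simply passes events through in arrival order.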

Colm · January 10, 2019, 11:18am

To add to mike's answer: you'll have dvce_created_tstamp and derived_tstamp in the data, which are generated at the tracker level (sessionisation is generally done at the tracker level too, although you may have a use case for doing it manually).

These two timestamps preserve the order in which the events were created, so most use cases that depend on event order are covered without the events needing to be processed in order. The tracker itself doesn't care about the order in which it sends events, so even if you've instrumented the pipeline to preserve order, connectivity issues can cause events to be sent late or out of order. (Events are cached if the tracker can't contact the collector.)
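A minimal sketch of recovering creation order after the fact, assuming each event is a dict keyed by Snowplow column names (`event_id`, `domain_userid`, `derived_tstamp`, `dvce_created_tstamp`) with timestamps already parsed into `datetime` objects. Falling back to dvce_created_tstamp when derived_tstamp is missing is an assumption made for illustration, and `event_time` and `order_by_creation` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def event_time(event: dict) -> datetime:
    """Prefer derived_tstamp; fall back to dvce_created_tstamp if it is missing."""
    ts = event.get("derived_tstamp")
    if ts is None:
        ts = event.get("dvce_created_tstamp")
    if ts is None:
        raise ValueError(f"event {event.get('event_id')} has no usable timestamp")
    return ts


def order_by_creation(events: List[dict]) -> Dict[str, List[dict]]:
    """Group events per user and sort each group by creation time,
    using event_id as a tie-breaker so the result is deterministic."""
    per_user: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        per_user[event["domain_userid"]].append(event)
    return {
        user: sorted(user_events, key=lambda e: (event_time(e), e["event_id"]))
        for user, user_events in per_user.items()
    }
```

Because the sort key comes from timestamps carried in the events themselves, the result is the same whichever collector, shard or arrival order the events went through.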

Best,

kaustubh.umalkar · January 10, 2019, 12:12pm

Thank you guys for the quick clarification.
