In an effort to make the Snowplow pipeline unbreakable, we are very excited to release Stream Collector 2.0.0, which adds surge protection for AWS. It is available on Docker Hub.
While PubSub is auto-scalable, Kinesis is not. When Kinesis can’t keep up with the throughput (during extreme traffic spikes, for instance), events accumulate in the collector's memory to be retried indefinitely, which can lead to OutOfMemory errors.
To address this potential issue, it is now possible to specify an SQS buffer to which events are written when a write to Kinesis fails (e.g. when the maximum throughput is exceeded).
To activate this feature, create the SQS queues (one for enriched events and one for bad rows) and populate these lines in the configuration of the collector. Please note that writes to SQS are retried at most 10 times and that, with the buffer enabled, the size limit of an event drops to 192 kB.
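The fallback behaviour described above can be sketched roughly as follows. This is only an illustration, not the collector's actual code: `write_with_fallback`, `write_kinesis` and `write_sqs` are hypothetical names, the reading of "retried at most 10 times" as one attempt plus 10 retries is an assumption, and so are the interpretation of 192 kB as 192,000 bytes and the handling of oversized or undeliverable events.

```python
from typing import Callable

MAX_SQS_RETRIES = 10          # retry cap mentioned in the release notes
MAX_EVENT_BYTES = 192 * 1000  # assumption: 192 kB read as 192,000 bytes


def write_with_fallback(
    event: bytes,
    write_kinesis: Callable[[bytes], None],  # hypothetical sink, raises on failure
    write_sqs: Callable[[bytes], None],      # hypothetical sink, raises on failure
) -> str:
    """Return where the event ended up: 'kinesis', 'sqs', 'too_large' or 'failed'."""
    # With the SQS buffer enabled, oversized events are rejected up front.
    if len(event) > MAX_EVENT_BYTES:
        return "too_large"

    # Normal path: write straight to Kinesis.
    try:
        write_kinesis(event)
        return "kinesis"
    except Exception:
        pass  # e.g. throughput exceeded: fall through to the SQS buffer

    # Fallback path: one initial attempt plus a bounded number of retries.
    for _ in range(1 + MAX_SQS_RETRIES):
        try:
            write_sqs(event)
            return "sqs"
        except Exception:
            continue

    # What happens after the retries are exhausted is not stated in the
    # release notes; this sketch simply reports the failure.
    return "failed"
```

The point of the bounded retry loop is that an event never stays in memory indefinitely, which is what removes the OutOfMemory risk.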
To send events back from SQS to Kinesis, this app can be used. It is available on Docker Hub. To run it, specify the SQS_QUEUE, KINESIS_STREAM_NAME and SENTRY_DSN environment variables. Writes to Kinesis are performed with back-pressure so as not to overwhelm it.
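To illustrate what back-pressure means here, the sketch below uses a bounded in-memory queue between a reader and a writer, so the reader is paused whenever the writer falls behind. It is a generic illustration only: the release notes do not describe how the app implements back-pressure, and `forward`, `read_batches`, `write_batch` and `max_in_flight` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Iterable, List


def forward(
    read_batches: Iterable[List[bytes]],        # hypothetical source, e.g. batches read from SQS
    write_batch: Callable[[List[bytes]], None],  # hypothetical sink, e.g. a write to Kinesis
    max_in_flight: int = 2,
) -> int:
    """Forward batches from source to sink; return the number of events written."""
    buffer: queue.Queue = queue.Queue(maxsize=max_in_flight)
    done = object()  # sentinel marking the end of the source

    def producer() -> None:
        for batch in read_batches:
            # put() blocks while the buffer is full, so reading slows down
            # to the pace of the writer: this is the back-pressure.
            buffer.put(batch)
        buffer.put(done)

    reader = threading.Thread(target=producer, daemon=True)
    reader.start()

    written = 0
    while True:
        batch = buffer.get()
        if batch is done:
            break
        write_batch(batch)
        written += len(batch)

    reader.join()
    return written
```

With this shape, at most `max_in_flight` batches wait in memory at any time, regardless of how many messages are queued upstream.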
This is the first release from the collector’s new home. Should you have any issues, please report them on this repo.