How does the Scala Stream Collector compare to the Clojure Collector?

Is one of these collectors slated for deprecation, or is one going to receive more development and better support in the future? Is one of them known to scale better than the other? What about ease of administration and reliability? In terms of features, are they equal?


  • David

Hey @vantage,

Great question:

  • Both collectors have feature parity at the moment
  • The Clojure Collector is easier to set up but harder to reason about (more Beanstalk “magic”)
  • The Clojure Collector only rotates event logs to S3 every hour
  • The Scala Stream Collector is used in the Snowplow real-time pipeline, in Snowplow Mini, and can be used with the Snowplow batch pipeline by connecting it to a kinesis-s3 instance
  • The Clojure Collector is only used with the Snowplow batch pipeline
  • Longer term, we plan to add more back-ends to the Scala Stream Collector (e.g. Kafka, NSQ), so expect this collector to become primus inter pares