Is it possible to set up a pipeline as suggested in the title?
If so, what parts do I need to make this work?
1. Is this assumption correct?
Scala Stream Collector: installed on two CentOS instances with a load balancer in front of them, collecting the events from the trackers.
Set up the Kafka sink
As found on: Configure the Scala Stream Collector · snowplow/snowplow Wiki · GitHub:
The `collector.streams.sink.enabled` setting determines which of the supported sinks to write raw events to:
- `"kafka"` for writing Thrift-serialized records and error rows to a Kafka topic
You should fill the rest of the `collector.streams.sink` section according to your selection as a sink.
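For illustration, a minimal sketch of what that section might look like with Kafka selected. The topic names and broker addresses are assumptions, and the exact keys vary between collector versions, so check the sample `application.conf` shipped with your collector release for the authoritative layout:

```hocon
collector {
  streams {
    # Topics the collector writes raw Thrift-serialized events and error rows to
    # (topic names are assumptions for this sketch)
    good = "snowplow-raw-good"
    bad  = "snowplow-raw-bad"

    sink {
      enabled = "kafka"
      # Assumed broker addresses; point these at your Kafka cluster
      brokers = "kafka1:9092,kafka2:9092"
      retries = 0
    }
  }
}
```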
I would then read the Kafka topic using the Thrift extension.
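To make "Thrift-serialized records" concrete, here is a toy decoder for a flat struct encoded with Thrift's binary protocol, handling only `i32` and `string` fields. This is purely illustrative of the wire shape: in practice you would consume the topic with a Kafka client and decode each message with the generated `CollectorPayload` Thrift classes, not by hand.

```python
import struct

# Thrift binary-protocol type codes (subset)
T_STOP, T_I32, T_STRING = 0, 8, 11

def decode_flat_struct(buf: bytes) -> dict:
    """Decode a flat Thrift struct: {field id: value}. Toy sketch only."""
    fields, pos = {}, 0
    while True:
        ftype = buf[pos]; pos += 1
        if ftype == T_STOP:          # 0x00 terminates the struct
            return fields
        (fid,) = struct.unpack_from(">h", buf, pos); pos += 2
        if ftype == T_I32:           # big-endian 4-byte integer
            (fields[fid],) = struct.unpack_from(">i", buf, pos); pos += 4
        elif ftype == T_STRING:      # 4-byte length prefix, then UTF-8 bytes
            (n,) = struct.unpack_from(">i", buf, pos); pos += 4
            fields[fid] = buf[pos:pos + n].decode("utf-8"); pos += n
        else:
            raise ValueError(f"unsupported thrift type {ftype}")

# Hand-encoded struct: field 1 = i32 42, field 2 = string "ipAddress"
record = (bytes([T_I32]) + struct.pack(">h", 1) + struct.pack(">i", 42)
          + bytes([T_STRING]) + struct.pack(">h", 2)
          + struct.pack(">i", 9) + b"ipAddress"
          + bytes([T_STOP]))
print(decode_flat_struct(record))  # → {1: 42, 2: 'ipAddress'}
```

The field ids and values above are invented for the sketch; the real `CollectorPayload` schema defines its own field ids.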
2. Where would I find the settings for the Kafka sink?
3. Can I rename Snowplow functions so ad blockers don't pick up sp.js or fired events?
Is this about right to get things going?