Is it possible to set up a pipeline as suggested in the title?
If so, what parts do I need to make this work?
- Is this assumption correct?
Scala Stream Collector
The Scala Stream Collector installed on two CentOS instances with a load balancer in front of them, collecting the events from the trackers.
Set up the Kafka sink
As found on: Configure the Scala Stream Collector · snowplow/snowplow Wiki · GitHub
The collector.streams.sink.enabled setting determines which of the supported sinks to write raw events to: "kafka" for writing Thrift-serialized records and error rows to a Kafka topic. You should fill the rest of the collector.streams.sink section according to your selection as a sink.
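To check my understanding, this is roughly the collector.conf fragment I would expect to run on both CentOS instances behind the load balancer. It is only a sketch based on the wiki page above: the topic names, broker hosts and port are my own placeholders, and the exact keys may differ between collector versions.

```hocon
collector {
  # Both instances behind the load balancer run the same config
  interface = "0.0.0.0"
  port = 8080

  streams {
    # Placeholder topic names for raw good/bad events
    good = "snowplow-raw-good"
    bad  = "snowplow-raw-bad"

    sink {
      # Select the Kafka sink, as described in the wiki excerpt above
      enabled = kafka

      # Placeholder broker list -- swap in your own Kafka brokers
      brokers = "kafka1.example.com:9092,kafka2.example.com:9092"
      retries = 0
    }
  }
}
```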
I would then read the Kafka topic in Druid using the Thrift extension:
https://druid.apache.org/docs/latest/development/extensions-contrib/thrift.html
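If I read the Druid docs correctly, the parser section of the ingestion spec would then point at the Thrift class of the raw collector payloads, something like the sketch below. The jar path, class name, timestamp column and empty dimension list are all guesses on my part, so please correct me if the Thrift extension expects something different:

```json
{
  "type": "thrift",
  "thriftJar": "/opt/druid/thrift/collector-payload.jar",
  "thriftClass": "com.snowplowanalytics.snowplow.CollectorPayload",
  "parseSpec": {
    "format": "json",
    "timestampSpec": { "column": "timestamp", "format": "auto" },
    "dimensionsSpec": { "dimensions": [] }
  }
}
```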
- Where would I find the full set of settings for the Kafka sink?
Finally, I add the JavaScript tracker to my website and that gets events flowing?
3. Can I rename the Snowplow functions so adblockers don't pick up sp.js or the fired events?
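To make question 3 concrete, something like this is what I'd hope to end up with: a self-hosted, renamed copy of sp.js and a custom global function name. The file name, global name and hostnames below are just my own placeholders, and I'm assuming the last argument of the standard v2 loader snippet is what sets the global name:

```html
<script type="text/javascript" async>
// Standard Snowplow v2 loader, but loading a self-hosted, renamed copy of
// sp.js ("tag.js" is my own name) and registering it under a custom global
// function name ("myTracker") via the loader's last argument.
;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];
p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)};
p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,"script","//static.example.com/js/tag.js","myTracker"));

// Point the tracker at the load balancer in front of the two collectors
window.myTracker('newTracker', 'sp1', 'collector.example.com', {
  appId: 'my-site',
  platform: 'web'
});
window.myTracker('trackPageView');
</script>
```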
Is this about right to get things going?