Is it possible to set up a pipeline as suggested in the title?
If so, what parts do I need to make this work?
- Is this assumption correct?
Scala Stream Collector
The Scala Stream Collector installed on two CentOS instances with a load balancer in front of them, collecting the events from the trackers.
Set up the Kafka sink
As found on: Configure the Scala Stream Collector · snowplow/snowplow Wiki · GitHub
The collector.streams.sink.enabled setting determines which of the supported sinks to write raw events to: "kafka" for writing Thrift-serialized records and error rows to a Kafka topic. You should fill the rest of the collector.streams.sink section according to your selection as a sink.
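To check my understanding, this is roughly the collector.conf fragment I would expect to run on both CentOS instances behind the load balancer. It is only a sketch based on the wiki page above: the topic names, broker hosts and port are my own placeholders, and the exact keys may differ between collector versions.

```hocon
collector {
  # Both instances behind the load balancer run the same config
  interface = "0.0.0.0"
  port = 8080

  streams {
    # Placeholder topic names for raw good/bad events
    good = "snowplow-raw-good"
    bad  = "snowplow-raw-bad"

    sink {
      # Select the Kafka sink, as described in the wiki excerpt above
      enabled = kafka

      # Placeholder broker list -- swap in your own Kafka brokers
      brokers = "kafka1.example.com:9092,kafka2.example.com:9092"
      retries = 0
    }
  }
}
```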
I would then read the Kafka topic in Druid using the Thrift extension:
https://druid.apache.org/docs/latest/development/extensions-contrib/thrift.html
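If I read the Druid docs correctly, the parser section of the ingestion spec would then point at the Thrift class of the raw collector payloads, something like the sketch below. The jar path, class name, timestamp column and empty dimension list are all guesses on my part, so please correct me if the Thrift extension expects something different:

```json
{
  "type": "thrift",
  "thriftJar": "/opt/druid/thrift/collector-payload.jar",
  "thriftClass": "com.snowplowanalytics.snowplow.CollectorPayload",
  "parseSpec": {
    "format": "json",
    "timestampSpec": { "column": "timestamp", "format": "auto" },
    "dimensionsSpec": { "dimensions": [] }
  }
}
```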
- Where would I find the full set of settings for the Kafka sink?
Finally, I add the JavaScript tracker to my website and that gets events flowing?
3. Can I rename the Snowplow functions so adblockers don't pick up sp.js or the fired events?
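To make question 3 concrete, something like this is what I'd hope to end up with: a self-hosted, renamed copy of sp.js and a custom global function name. The file name, global name and hostnames below are just my own placeholders, and I'm assuming the last argument of the standard v2 loader snippet is what sets the global name:

```html
<script type="text/javascript" async>
// Standard Snowplow v2 loader, but loading a self-hosted, renamed copy of
// sp.js ("tag.js" is my own name) and registering it under a custom global
// function name ("myTracker") via the loader's last argument.
;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];
p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)};
p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,"script","//static.example.com/js/tag.js","myTracker"));

// Point the tracker at the load balancer in front of the two collectors
window.myTracker('newTracker', 'sp1', 'collector.example.com', {
  appId: 'my-site',
  platform: 'web'
});
window.myTracker('trackPageView');
</script>
```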
Is this about right to get things going?