Snowplow | Setting up architecture for different event-types

johnschmidt · July 12, 2019, 8:51am

Hi,

I have set up the Snowplow pipeline for error events from our application.
They are now stored successfully in Elasticsearch.
Now I am wondering how to set up different kinds of events.
I don't want to store only error events but also routing events etc., and I don't want to save them
all in the same Elasticsearch index/type (or is this best practice?).
I would like to store error events and routing data in different types, maybe also in different indices.

This is how Snowplow is currently set up in my case:
The tracker sends to a collector, which writes data into a Kinesis stream ‘goodStream’.
The enricher enriches the events and writes them to ‘enrichedStream’.
The es-loader reads from ‘enrichedStream’ and writes to Elasticsearch with a fixed index and type set in the config.

Do I have to create separate streams like ‘goodErrorStream’ etc. and start separate collector/enrich/es processes to be able to separate them, or is there another way to do this?

Thanks a lot,
John

josh · July 15, 2019, 8:27am

Hi @johnschmidt - we tend to just put all events in a single index. However, if you do want to split things out you can - albeit with a more complicated architecture!

You would essentially need to create new child streams from your “enriched” event stream by running an application / Lambda on this stream which filters specific events and pushes them into specific “enriched” child streams. You can then run one Elasticsearch Sink per child stream, which lets you configure each index individually.

Enriched Stream → Lambda Filter 1 → Enriched Stream 1 → Elasticsearch Sink 1
Enriched Stream → Lambda Filter 2 → Enriched Stream 2 → Elasticsearch Sink 2
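
As a rough illustration of the filter step, here is a minimal Python sketch of a single Lambda that reads the enriched stream and fans events out to child streams. Everything named here is an assumption for illustration: the child stream names, the event names in `ROUTES`, and especially `EVENT_NAME_INDEX`, which you would need to verify against the enriched TSV format of your Snowplow version. It uses one Lambda for all routes rather than one per child stream as in the diagram, and retry handling for partially failed `put_records` calls is left out.

```python
import base64
from collections import defaultdict

# Hypothetical values: adjust to your own streams and enriched event layout.
EVENT_NAME_INDEX = 127  # ASSUMED position of the event name in the TSV; verify it
ROUTES = {
    "application_error": "enrichedErrorStream",   # hypothetical event/stream names
    "route_change": "enrichedRoutingStream",
}
DEFAULT_STREAM = None  # None means: drop events that match no route


def pick_stream(enriched_tsv, field_index=EVENT_NAME_INDEX,
                routes=ROUTES, default=DEFAULT_STREAM):
    """Return the child stream for one enriched TSV line, or the default."""
    fields = enriched_tsv.split("\t")
    if field_index >= len(fields):
        return default
    return routes.get(fields[field_index], default)


def group_by_stream(kinesis_event, **kwargs):
    """Decode the records of a Kinesis-triggered Lambda event and group them."""
    grouped = defaultdict(list)
    for record in kinesis_event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        stream = pick_stream(payload.decode("utf-8"), **kwargs)
        if stream is not None:
            grouped[stream].append({
                "Data": payload,  # forward the enriched event unchanged
                "PartitionKey": record["kinesis"]["partitionKey"],
            })
    return dict(grouped)


def handler(event, context):
    # Imported lazily so the routing logic can be tested without AWS access.
    import boto3
    kinesis = boto3.client("kinesis")
    for stream, records in group_by_stream(event).items():
        # PutRecords accepts at most 500 records per call.
        for i in range(0, len(records), 500):
            kinesis.put_records(StreamName=stream, Records=records[i:i + 500])
```

Because the events are forwarded unchanged, each child stream still contains regular enriched events, so an Elasticsearch Sink can read from it exactly as it would from the original enriched stream.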

Another option, which to be clear we have not done internally before, would be to set up Lambda + Kinesis Firehose to do this in a more serverless fashion. The Lambda function would be used for filtering & transformation and you would use Firehose to push the data into Elasticsearch. The transformation is particularly important here, as you will need to ensure the data you send to Elasticsearch is correctly formatted JSON (which the Elasticsearch Sink does for you).
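
To give an idea of what such a Firehose transformation Lambda might look like, here is an untested minimal sketch in Python. It follows the Firehose data-transformation contract (each input record has a `recordId` and base64 `data`, and each output record reports `Ok`, `Dropped` or `ProcessingFailed`). The field list is deliberately truncated to the first few enriched fields, and the `keep` rule is a hypothetical filter; a real transformation would have to cover the full enriched format, including typed values and the JSON contexts, which is what the Elasticsearch Sink handles for you.

```python
import base64
import json

# Only the first few enriched fields, for illustration; the real format has
# many more columns and this list is NOT a complete mapping.
FIELD_NAMES = ["app_id", "platform", "etl_tstamp", "collector_tstamp",
               "dvce_created_tstamp", "event", "event_id"]


def tsv_to_doc(enriched_tsv, names=FIELD_NAMES):
    """Turn one enriched TSV line into a dict, skipping empty columns."""
    values = enriched_tsv.split("\t")
    return {name: value for name, value in zip(names, values) if value != ""}


def keep(doc):
    """Hypothetical filter: only keep self-describing ('unstruct') events."""
    return doc.get("event") == "unstruct"


def handler(event, context):
    output = []
    for record in event["records"]:
        try:
            line = base64.b64decode(record["data"]).decode("utf-8")
            doc = tsv_to_doc(line)
        except ValueError:  # covers invalid base64 and invalid UTF-8
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        if not keep(doc):
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue
        encoded = base64.b64encode(json.dumps(doc).encode("utf-8"))
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": encoded.decode("ascii")})
    return {"records": output}
```

You would then need one Firehose delivery stream (each with its own filter rule and target index) per kind of event you want to separate.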

johnschmidt · July 21, 2019, 4:47pm

Ok, thanks a lot for the advice!

Cheers,
John
