Snowplow | Setting up architecture for different event-types

Hi,

I set up the Snowplow pipeline for error events from our application.
They are now successfully stored in the Elasticsearch DB.
Now I am wondering how to set up different kinds of events.
I don't only want to store error events but also routing events etc., and I don't want to save them
all in the same Elasticsearch index/type (or is that the best practice?).
I would like to store error events and routing data in different types, maybe also in different indices.

The way Snowplow is set up right now in my case:
The tracker sends to a collector, which writes data into a Kinesis stream ‘goodStream’.
The enricher enriches and writes to ‘enrichedStream’.
The ES loader reads from ‘enrichedStream’ and writes to the ES DB with a fixed index and type set in the config.

Do I have to create separate streams like ‘goodErrorStream’ etc. and start separate collector/enrich/ES processes to be able to separate them, or is there another way to do this?

Thanks a lot,
John

Hi @johnschmidt - we tend to just put all events in a single index. However if you did want to split things out you can - albeit with a more complicated architecture!

You would essentially need to create new child streams from your “enriched” event stream by running an application / Lambda on this stream that filters events and pushes specific events into specific “enriched” child streams. You can then run one Elasticsearch Sink per child stream, which lets you configure the index and type for each one individually.

Enriched Stream → Lambda Filter 1 → Enriched Stream 1 → Elasticsearch Sink 1
Enriched Stream → Lambda Filter 2 → Enriched Stream 2 → Elasticsearch Sink 2
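
As a rough illustration of the filter step, here is a minimal Python sketch of such a Lambda, not something we run ourselves. It assumes the Lambda is triggered by the enriched Kinesis stream, that enriched events are tab-separated with the `event` field at position 5, and that the child stream names and the event-to-stream mapping (both made up here) match whatever you actually create. For brevity it uses a single Lambda with a routing table rather than one Lambda per filter.

```python
import base64
from collections import defaultdict

# Assumption: zero-based position of the `event` field in the
# tab-separated enriched event. Verify against your enriched format.
EVENT_FIELD_INDEX = 5

# Hypothetical routing table: event type -> child stream name.
# Both the event types and the stream names are placeholders.
STREAM_BY_EVENT = {
    "struct": "enrichedErrorStream",
    "page_view": "enrichedRoutingStream",
}


def route_records(kinesis_records):
    """Group enriched events by the child stream they should go to."""
    routed = defaultdict(list)
    for record in kinesis_records:
        # Kinesis delivers the payload base64-encoded to Lambda.
        line = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        fields = line.split("\t")
        if len(fields) <= EVENT_FIELD_INDEX:
            continue  # too short to be an enriched event, skip it
        stream = STREAM_BY_EVENT.get(fields[EVENT_FIELD_INDEX])
        if stream is not None:
            routed[stream].append({
                "Data": line.encode("utf-8"),
                "PartitionKey": record["kinesis"]["partitionKey"],
            })
    return dict(routed)


def handler(event, context):
    """Lambda entry point: forward matching events to the child streams."""
    import boto3  # imported here so route_records can be tested offline

    client = boto3.client("kinesis")
    routed = route_records(event["Records"])
    for stream, entries in routed.items():
        # put_records accepts at most 500 records per call.
        for start in range(0, len(entries), 500):
            # Note: a real deployment should also inspect the response's
            # FailedRecordCount and retry failed entries.
            client.put_records(StreamName=stream,
                               Records=entries[start:start + 500])
    return {stream: len(entries) for stream, entries in routed.items()}
```

Events that match no entry in the routing table are simply dropped from the child streams; they still remain on the original enriched stream.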

Another option, which to be clear we have not done internally before, would be to set up Lambda + Kinesis Firehose to do this in a more serverless fashion. The Lambda function would be used for filtering and transformation, and you would use Firehose to push the data into Elasticsearch. The transformation is particularly important here, as you will need to ensure the data you send to Elasticsearch is correctly formatted JSON (which the Elasticsearch Sink does for you).
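
To give an idea of what the transformation part could look like, below is a minimal, untested-in-production Python sketch of a Firehose data-transformation Lambda. It assumes Firehose reads the tab-separated enriched events, that the first seven columns are the ones named in `LEADING_FIELDS`, and that `WANTED_EVENTS` (a made-up filter) lists the event types this delivery stream should keep. It only converts those leading columns to JSON; a real transformation would have to cover every field you need in Elasticsearch.

```python
import base64
import json

# Assumption: names of the first columns of a tab-separated enriched event.
LEADING_FIELDS = [
    "app_id", "platform", "etl_tstamp", "collector_tstamp",
    "dvce_created_tstamp", "event", "event_id",
]

# Hypothetical filter: event types this delivery stream should keep.
WANTED_EVENTS = {"struct"}


def transform_record(record):
    """Turn one Firehose record (enriched TSV) into a JSON document."""
    line = base64.b64decode(record["data"]).decode("utf-8")
    values = line.rstrip("\n").split("\t")

    if len(values) < len(LEADING_FIELDS):
        # Not a well-formed enriched event: let Firehose handle the failure.
        return {"recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"]}

    # Keep only non-empty leading columns in the output document.
    doc = {name: value
           for name, value in zip(LEADING_FIELDS, values)
           if value != ""}

    if doc.get("event") not in WANTED_EVENTS:
        # Filtered out on purpose, so Firehose does not deliver it.
        return {"recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"]}

    payload = json.dumps(doc).encode("utf-8")
    return {"recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload).decode("utf-8")}


def handler(event, context):
    """Lambda entry point in the shape Firehose expects for transformations."""
    return {"records": [transform_record(r) for r in event["records"]]}
```

You would then attach one such Lambda (with its own `WANTED_EVENTS`) to each Firehose delivery stream and point each delivery stream at its own Elasticsearch index.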

Ok, thanks a lot for the advice!

Cheers,
John