Hi Snowplowers,
I presume that once we have the batch pipeline set up successfully, we move on to real-time (RT). I came across this great post from @ihor, and I also came across other posts mentioning that drip-feeding into Redshift wasn’t yet possible.
Am I correct in saying that if I have the batch pipeline set up, I can add the RT stream by setting up
Kinesis Enrich --> Kinesis Good/Bad Streams --> Kinesis ES Sink --> Kibana? The only difference would be running two parallel streams, and in future we might not require batch at all once the Redshift drip feed is ready?
If you set up the real-time pipeline you’ll get both the Elasticsearch sink (real-time) and the Redshift sink (batch) without duplicating architecture.
I’m not sure about the feasibility of drip-feeding Redshift; from the implementations I’ve seen, it never quite works that well. Redshift is an excellent data warehouse but a poor real-time analytics database.
If you already have the batch pipeline set up, there is no equivalent lambda architecture you can bolt on to bring the real-time load into Elasticsearch. This is because the Clojure Collector cannot feed our real-time pipeline - it’s a fundamentally batch-oriented collector (it rotates its logs to S3 hourly).
Your options are:
1. Set up a complete end-to-end real-time pipeline in parallel (i.e. starting from the Scala Stream Collector onwards)
2. Rebuild your setup to use the standard Snowplow lambda architecture that you reference
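With option 1, each real-time component is a separate process driven by its own config file, and the Elasticsearch sink is what ties the enriched Kinesis stream to Kibana. As a rough sketch of the shape that sink’s configuration takes (the key names and stream names below are placeholders, not the sink’s actual reference config — check the kinesis-elasticsearch-sink repository for the real one):

```
# Illustrative sketch only: key names and stream names are hypothetical.
sink {
  # Read enriched events from the "good" Kinesis stream produced by enrich
  source = kinesis
  kinesis {
    in  { stream-name = "enriched-good" }  # hypothetical stream name
    out { stream-name = "es-sink-bad" }    # failed inserts go to a bad stream
    region = "us-east-1"
  }
  # Write into the Elasticsearch cluster that Kibana queries
  elasticsearch {
    endpoint = "localhost"
    port     = 9200
    index    = "snowplow"
  }
}
```

The point of the sketch is just that the sink sits between the enriched good stream and Elasticsearch, with its own bad stream for failed inserts - the real parameter names are in each component’s example config.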