Questions about setting up the real-time pipeline

stephan · May 4, 2016, 9:14pm

Hi,

Does anyone have a better step-by-step guide to setting up Snowplow (I want to use the Scala Stream Collector) on AWS? I am struggling a bit to follow the guides on GitHub.

Thanks,

ihor · May 4, 2016, 9:29pm

Hi @stephan,

Sorry to hear you are having difficulties setting up the real-time pipeline.

Could you be more specific about the kind of problem you have encountered, please?

Regards,
Ihor

stephan · May 4, 2016, 9:31pm

Well, I am basically new to AWS and Snowplow. I am struggling to understand exactly what I need to do to get it up and running. I am pretty good with PHP, MySQL and JS but have never done anything like this.

stephan · May 4, 2016, 9:41pm

I see you have changed the title of my question. I am not only struggling with the collector but with the entire setup. But I guess getting a collector installed would be a good step in the right direction.

ihor · May 5, 2016, 12:57am

@stephan,

Taking your comments into account, I would suggest familiarizing yourself with the general concept of the Snowplow pipeline first. Each component of the pipeline can be built and set up independently. Set them up one by one, making sure each works before moving on to the next.

Regardless of whether it is a batch pipeline or a real-time pipeline, a Snowplow pipeline consists of 6 main components. The ones you are most interested in at the moment are:

1.Tracker -> 2.Collector -> 3.Enrichment -> 4.Storage.

You can start with either the JavaScript Tracker or the Stream Collector.

The easiest way to install the Stream Collector is to download the compiled and zipped application. Just follow the instructions here. In fact, the archive file comes with 2 more components, namely Stream Enrich and Kinesis Elasticsearch Sink. The former is the 3.Enrichment component in my diagram above. The latter is required if you want to store your data in Elasticsearch and corresponds to the 4.Storage component.

Before you can launch the collector, you need to configure it. Once the HOCON configuration file is amended to reflect your pipeline setup, you can try to run it.
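
Once the collector is running, one way to confirm it accepts events before moving on to enrichment is to send it a single hand-built page view. The sketch below is only an illustration, not part of the official setup guide: it assumes the collector is reachable over plain HTTP at a placeholder address such as `collector.example.com:8080` and that it serves the tracker protocol's GET pixel path `/i`. The function names and the `tv` and `aid` values are made up for this test.

```python
import time
import uuid
from urllib.parse import urlencode
from urllib.request import urlopen


def build_pageview_url(collector, page_url, app_id="test-app"):
    """Build a GET pixel URL carrying one page-view event.

    `collector` is "host:port" of your own collector (placeholder here).
    """
    params = {
        "e": "pv",                            # event type: page view
        "p": "web",                           # platform
        "tv": "manual-0.1.0",                 # tracker version label (made up)
        "aid": app_id,                        # application ID (made up)
        "url": page_url,                      # URL of the page being "viewed"
        "eid": str(uuid.uuid4()),             # unique event ID
        "dtm": str(int(time.time() * 1000)),  # device timestamp in ms
    }
    return f"http://{collector}/i?{urlencode(params)}"


def send_test_event(collector, page_url):
    """Send one test page view and return the HTTP status code."""
    with urlopen(build_pageview_url(collector, page_url), timeout=5) as resp:
        return resp.status
```

Calling `send_test_event("collector.example.com:8080", "http://example.com/")` against a working collector should return a 200 status, and the event should then appear in whatever stream the collector is configured to write to.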

Mind you, you could simply install the Snowplow Mini app, which is the whole real-time pipeline in miniature with a “one-click” (kind of) setup. The whole app runs on a single EC2 instance! See if this suits your needs.

Regards,
Ihor

sachinsingh10 · August 11, 2016, 2:01am

@ihor
What are the recommendations for an AWS node for the Scala collector? Also, I am unable to find information on the scaling properties of this collector.
Secondly, the Kinesis setup steps seem to be missing from the setup guide.

Best
SS
