Deploying Snowplow across a combination of AWS and Azure

ahid_002 · January 9, 2023, 12:29pm

Hi everyone, and Happy New Year first of all.
I have 2+ years of experience setting up and managing the Snowplow open source data pipeline, where we used AWS EKS to run our collector, enricher, stream transformer and RDB Loader with Redshift.
The design looked like this:

                              ---->  Elastic Search Loader
                             | 
Collector ----> Enricher ----> 
                             |
                              ---->  Stream-transformer (before real time it was EMR) ----> RDB Loader Redshift

I have joined a new company that already has a data lakehouse in Azure, which they can't switch to AWS. So they decided to implement Snowplow on a Snowplow-managed AWS account and then move the data to Azure Data/Delta Lake.

The following is the design they have proposed, but I'm confused about two things:

1. I think a step is missing here: the transformation step, where the raw event is shredded into event + contexts that are then loaded by the RDB Loader.
2. What would be the best practice for moving data from AWS to Azure? And regarding Databricks: should it be in the Azure or the AWS environment?

stanch · January 9, 2023, 1:03pm

Hi @ahid_002! Always great to hear from repeat users of Snowplow

To your questions:

1. I suppose the “RDB Loader” block on the diagram implies both components: the RDB Transformer and the RDB Loader itself (more specifically, the Databricks Loader). Note that for your transformer you need to select the “wide row” Parquet format.
2. RDB Loader will work fine with Databricks hosted on Azure.
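
To make the “wide row” Parquet point a bit more concrete, here is a minimal sketch that prints the output-format fragment of a transformer config as JSON (the transformer config is HOCON, and plain JSON is valid HOCON). The helper name `widerow_formats` is hypothetical, and the key names `formats`, `transformationType` and `fileFormat` are written from memory of the RDB Loader 4.x configuration, so please treat them as assumptions and check them against the reference for the version you deploy.

```python
import json

# Hypothetical helper, not part of any Snowplow library.
# Key names below are assumptions based on the RDB Loader 4.x
# transformer configuration; verify them for your version.
ALLOWED_FILE_FORMATS = {"json", "parquet"}


def widerow_formats(file_format="parquet"):
    """Return the output-format fragment for a wide row transformer."""
    if file_format not in ALLOWED_FILE_FORMATS:
        raise ValueError(f"unsupported wide row file format: {file_format!r}")
    return {
        "formats": {
            "transformationType": "widerow",
            "fileFormat": file_format,
        }
    }


# Parquet is the format mentioned above for the Databricks Loader.
config_fragment = widerow_formats("parquet")
print(json.dumps(config_fragment, indent=2))
```

The printed fragment would be merged into the rest of the transformer config (input stream, output bucket, queue and so on), which is not shown here.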
