Normalize Atomic Events with dbt on Redshift (Open Source Version)

Macari_Saura_Lopez · March 15, 2024, 10:14am

Hi all!
We are working on a POC using the Snowplow open source version with Kafka, S3, Redshift, and ClickHouse.

Currently we collect events from the web with the Snowplow collector, produce enriched events with Snowplow Enrich over Kafka, and then send each event to S3.
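For context, the enriched events Snowplow Enrich writes (and that end up in S3) are typically tab-separated lines whose columns follow the atomic event field order. The sketch below parses one such line, assuming only the first seven columns of that order (`app_id` through `event_id`); the real format has many more columns, listed in the Snowplow enriched event documentation. The helper name `parse_enriched_line` is hypothetical.

```python
# Sketch: parse one Snowplow enriched event (TSV line) into a dict.
# ASSUMPTION: only the first seven columns of the enriched-event order are
# named here; the real format has many more (see the Snowplow docs).
ENRICHED_PREFIX_FIELDS = [
    "app_id",
    "platform",
    "etl_tstamp",
    "collector_tstamp",
    "dvce_created_tstamp",
    "event",
    "event_id",
]


def parse_enriched_line(line: str) -> dict:
    """Map the leading TSV columns of an enriched event to field names.

    Empty columns become None, mirroring NULLs in the atomic table.
    Extra columns beyond the named prefix are ignored.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) < len(ENRICHED_PREFIX_FIELDS):
        raise ValueError("line has fewer columns than expected")
    return {
        name: (value if value != "" else None)
        for name, value in zip(ENRICHED_PREFIX_FIELDS, values)
    }


if __name__ == "__main__":
    sample = (
        "web\tweb\t2024-03-15 10:00:01.000\t2024-03-15 10:00:00.000\t"
        "\tpage_view\tc6ef3124-b53a-4b13-a233-0088f79dcbcb\textra"
    )
    print(parse_enriched_line(sample)["event"])  # page_view
```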

We found in the documentation that Snowplow has a dbt package for normalizing atomic events, but it’s not compatible with Redshift.

I found that to load the data and convert atomic events into normalized tables in Redshift we have to use RDB Loader, but my understanding is that this part only works with the BDP version. Is that right?

The Snowplow documentation is quite confusing to me and I have a few questions.

Does RDB Loader on AWS only work if the input is Kinesis?

Do we have to use all the components (Kinesis, etc.) to normalize the data? Or, since we already have the raw atomic events in S3, can we copy the events into a table and then run only the dbt normalize models to get a star schema in Redshift?
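The flow asked about here is "load the atomic events into a table, then run SQL models over it." A minimal sketch of that pattern, using Python's built-in `sqlite3` as a stand-in warehouse (not Redshift): step 1 loads rows into one wide `events` table, step 2 materializes SELECTs as tables, which is the pattern dbt models follow. The table names, the column subset, and the `load_and_model` helper are all hypothetical; this is not the snowplow-normalize package's SQL.

```python
import sqlite3

# Hypothetical event types to split out. This is a fixed allow-list, so it
# is safe to interpolate these values into SQL below.
EVENT_TYPES = ("page_view", "link_click")


def load_and_model(rows):
    """Load atomic-style rows into a table, then materialize 'models'."""
    # In-memory SQLite stands in for the warehouse (NOT Redshift).
    conn = sqlite3.connect(":memory:")
    # Step 1: one wide events table (hypothetical subset of atomic columns).
    conn.execute(
        "CREATE TABLE events (event_id TEXT, collector_tstamp TEXT, "
        "event_name TEXT, domain_userid TEXT, page_url TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    # Step 2: each "model" is a SELECT materialized as a table:
    # one table per event type ...
    for name in EVENT_TYPES:
        conn.execute(
            f"CREATE TABLE {name}_events AS "
            "SELECT event_id, collector_tstamp, domain_userid, page_url "
            f"FROM events WHERE event_name = '{name}'"
        )
    # ... plus a simple users table aggregated from all events.
    conn.execute(
        "CREATE TABLE users AS "
        "SELECT domain_userid, MIN(collector_tstamp) AS first_seen, "
        "COUNT(*) AS event_count FROM events GROUP BY domain_userid"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    sample_rows = [
        ("e1", "2024-03-15 10:00:00", "page_view", "u1", "https://example.com/"),
        ("e2", "2024-03-15 10:00:05", "link_click", "u1", "https://example.com/"),
        ("e3", "2024-03-15 10:01:00", "page_view", "u2", "https://example.com/pricing"),
    ]
    db = load_and_model(sample_rows)
    print(db.execute("SELECT COUNT(*) FROM page_view_events").fetchone()[0])  # 2
```

In the setup discussed in this thread, step 1 is the part RDB Loader handles and step 2 is the part dbt handles.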

Can anyone give some tips on this?

Thanks a lot.

Ryan · March 15, 2024, 2:14pm

You’re correct that the dbt-normalize models don’t currently support Redshift. Unfortunately, it’s not a high priority on our roadmap, given that Snowplow tables in Redshift are already shredded. I’ll ask someone from the loaders team to comment further on loading the S3 data into a table, because I think there are some complexities around shredding the data for Redshift.
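"Shredded" here refers to how Snowplow data in Redshift is laid out: each self-describing JSON entity (contexts and self-describing events) lands in its own table, linked back to `atomic.events` by event ID and collector timestamp. The sketch below shows a simplified version of that mapping. It assumes the usual naming convention (vendor and name snake-cased, plus the schema's major version, e.g. `com_acme_button_click_1`) and the `schema_*`, `root_id`, and `root_tstamp` columns. It is not RDB Loader's code and omits columns such as `ref_root`, `ref_tree`, and `ref_parent`. The helper names are hypothetical.

```python
import json
import re

# ASSUMPTION: simplified version of the Redshift "shredded" table naming,
# e.g. iglu:com.acme/buttonClick/jsonschema/1-0-0 -> com_acme_button_click_1
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[^/]+)/(?P<name>[^/]+)/(?P<format>[^/]+)/"
    r"(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


def to_snake(text: str) -> str:
    """Lower-case, turning camelCase boundaries, dots and dashes into '_'."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", text)
    return re.sub(r"[.\-]", "_", text).lower()


def shred_entity(event_id: str, collector_tstamp: str, entity: dict):
    """Return (table_name, row) for one self-describing JSON entity."""
    match = IGLU_URI.match(entity["schema"])
    if match is None:
        raise ValueError(f"not an Iglu URI: {entity['schema']}")
    p = match.groupdict()
    table = f"{to_snake(p['vendor'])}_{to_snake(p['name'])}_{p['model']}"
    row = {
        "schema_vendor": p["vendor"],
        "schema_name": p["name"],
        "schema_format": p["format"],
        "schema_version": f"{p['model']}-{p['revision']}-{p['addition']}",
        "root_id": event_id,              # joins to atomic.events.event_id
        "root_tstamp": collector_tstamp,  # joins to atomic.events.collector_tstamp
        **entity["data"],                 # one column per schema property
    }
    return table, row


if __name__ == "__main__":
    contexts = json.loads(
        '{"schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",'
        ' "data": [{"schema": "iglu:com.acme/buttonClick/jsonschema/1-0-0",'
        ' "data": {"label": "buy"}}]}'
    )
    for ent in contexts["data"]:
        print(shred_entity("e1", "2024-03-15 10:00:00", ent)[0])
        # com_acme_button_click_1
```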

Macari_Saura_Lopez · March 18, 2024, 7:35am

But what if we want to use a different dbt package, for example:

Then we don’t need RDB Loader, do we? My understanding is that to split the atomic events using these packages, we only need dbt and the files containing the atomic events. Is that right? Thanks.

Ryan · March 18, 2024, 9:08am

Unfortunately, all our dbt packages require a warehouse: they are built to run on a database, not a data lake, so you would need to load the files with RDB Loader for the data to be in the correct format.

It is also worth noting that the Unified package is under the SPAL license, which means you must purchase a license to use it unless you are a BDP customer or are using it only for personal or academic purposes.

Macari_Saura_Lopez · March 18, 2024, 11:29am

I can see that on AWS, RDB Loader only works with Kinesis. In the BDP version, is it possible to use RDB Loader with a Kafka source, such as AWS MSK or a Confluent cluster, instead of Kinesis?
Thanks
