Normalizing atomic events with dbt on Redshift (open source version)

Hi all!
We are working on a POC using the open source version of Snowplow with Kafka, S3, Redshift, and ClickHouse.

Right now we collect events from the web with the Snowplow collector, enrich them with Snowplow Enrich over Kafka, and then send each enriched event to S3.

We found in the documentation that Snowplow has a dbt package to normalize atomic events, but it’s not compatible with Redshift.

I also found that to load the data and turn atomic events into normalized tables in Redshift we have to use RDB Loader, but as I understand it this only works with the BDP version. Is that right?

The Snowplow documentation is very confusing and I have some questions.

RDB Loader on AWS only works if the input is Kinesis, right?

Do we have to use all the components (Kinesis, etc.) to normalize the data? Or, since we already have the raw atomic events in S3, can we copy the events into a table and then just run the dbt-normalize models to get the star schema in Redshift?

Can someone give us some tips on this?

Thanks a lot.

You’re correct that the dbt-normalize models don’t currently support Redshift. Unfortunately it’s not a high priority on our roadmap, given the shredded nature of the Redshift Snowplow tables anyway. I’ll ask someone from the loaders space to comment more on getting the S3 data into a table, because I think there are some complexities around the data shredding for Redshift.
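
For readers unfamiliar with the term, here is a rough sketch of what "shredding" means conceptually: each self-describing entity attached to an event is split out into its own table, named after the schema and linked back to the parent event. This is only an illustration in Python, not how RDB Loader is implemented. The event layout (`event_id`, `collector_tstamp`, `contexts`), the function names, and the simplified table-naming rule are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Iglu schema URIs have the shape iglu:vendor/name/format/MODEL-REVISION-ADDITION.
IGLU_URI = re.compile(r"^iglu:([^/]+)/([^/]+)/([^/]+)/(\d+)-\d+-\d+$")


def table_name(schema_uri):
    """Derive a per-schema table name (simplified rule, assumed for illustration)."""
    match = IGLU_URI.match(schema_uri)
    if match is None:
        raise ValueError(f"not an Iglu schema URI: {schema_uri}")
    vendor, name, _fmt, model = match.groups()
    # Lowercase and replace every run of non-alphanumeric characters with "_".
    to_snake = lambda s: re.sub(r"[^a-z0-9]+", "_", s.lower())
    return f"{to_snake(vendor)}_{to_snake(name)}_{model}"


def shred(event):
    """Split an event's entities into rows grouped by target table.

    Each row carries root_id / root_tstamp so it can be joined back
    to the parent event.
    """
    tables = defaultdict(list)
    for entity in event.get("contexts", []):
        row = {
            "root_id": event["event_id"],
            "root_tstamp": event["collector_tstamp"],
        }
        row.update(entity["data"])
        tables[table_name(entity["schema"])].append(row)
    return dict(tables)


# Hypothetical event, already parsed into a dict (real enriched events are TSV).
example_event = {
    "event_id": "e1",
    "collector_tstamp": "2024-01-01 00:00:00",
    "contexts": [
        {
            "schema": "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0",
            "data": {"id": "page-view-1"},
        },
    ],
}
shredded = shred(example_event)
```

This is why a plain copy of the enriched files into a single table does not give the layout the Redshift models expect: the entities have to end up in their own tables with the join keys populated.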

But what if we want to use another dbt package, for example

Then we don’t need RDB Loader, right? As I understand it, to split the atomic events using these packages we only need dbt and the files with the atomic events. Is that right? Thanks.

Unfortunately, all our dbt packages require a warehouse and are built to run on a database, not a lake, so you would need to load the files with RDB Loader for the data to be in the correct format.

It is also worth noting that the Unified package is under the SPAL license, which means you must purchase a license to use it unless you are a BDP customer or are using it only for personal or academic reasons.

I can see that on AWS, RDB Loader only works with Kinesis. But in the BDP version, is it possible to use RDB Loader with a Kafka source, such as AWS MSK or a Confluent cluster, instead of Kinesis?
Thanks