Hi there,
I’m nearly up to the storage step of the Snowplow setup, and am wondering what the benefits are of using the Snowflake DB loader instead of Snowflake’s Snowpipe feature (to pick up and ingest the good enriched files generated by the enrich process)?
The Snowflake DB loader keeps a manifest of what data (and what columns) have been loaded by Snowplow into Snowflake. This means that if you start sending a new event or context into your pipeline, the loader can alter your atomic.events table (adding any new columns) before loading the data, so you don’t end up with ‘column not found’ style errors.
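To illustrate, when the loader first sees a batch containing a new context, it widens the table with DDL along these lines (the com.acme/my_context name is made up, and the exact column type and naming follow the loader’s own conventions — this is just a sketch):

```sql
-- Hypothetical example: the loader detects a new context
-- (com.acme/my_context, schema version 1) in the incoming batch,
-- consults its manifest, and adds a matching column before the
-- COPY runs, so the load never hits a missing column.
ALTER TABLE atomic.events
  ADD COLUMN contexts_com_acme_my_context_1 VARIANT;
```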
As far as I know this pattern isn’t easy to establish with Snowpipe, which is instead designed for loads into tables that don’t change shape dynamically; when they do, the expectation is that you update things by hand, e.g. with ALTER TABLE plus recreating the pipe (ALTER PIPE can’t change a pipe’s COPY statement). A rough sketch of what that maintenance looks like is below.
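For comparison, a minimal Snowpipe setup might look like this (the stage and pipe names are made up, and the column list is abbreviated to two fields for brevity):

```sql
-- Hypothetical Snowpipe definition over the bucket where the
-- transformed enriched JSON lands.
CREATE PIPE atomic.snowplow_pipe AUTO_INGEST = TRUE AS
  COPY INTO atomic.events (app_id, collector_tstamp)
  FROM (
    SELECT $1:app_id::VARCHAR,
           $1:collector_tstamp::TIMESTAMP
    FROM @atomic.snowplow_stage
  )
  FILE_FORMAT = (TYPE = 'JSON');

-- A new context means widening the table by hand...
ALTER TABLE atomic.events
  ADD COLUMN contexts_com_acme_my_context_1 VARIANT;

-- ...and, because ALTER PIPE cannot change the COPY statement,
-- recreating the pipe with the new column included.
CREATE OR REPLACE PIPE atomic.snowplow_pipe AUTO_INGEST = TRUE AS
  COPY INTO atomic.events (app_id, collector_tstamp,
                           contexts_com_acme_my_context_1)
  FROM (
    SELECT $1:app_id::VARCHAR,
           $1:collector_tstamp::TIMESTAMP,
           $1:contexts_com_acme_my_context_1
    FROM @atomic.snowplow_stage
  )
  FILE_FORMAT = (TYPE = 'JSON');
```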
Wouldn’t using Snowpipe instead of the Snowflake Loader also mean missing out on event deduplication? Or does event deduplication already happen in Stream Enrich?
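If deduplication did have to happen downstream of Snowpipe, I imagine it would be something along these lines, keeping one row per (event_id, event_fingerprint) — the atomic.events_staging table name is hypothetical:

```sql
-- Hypothetical downstream dedup if Snowpipe loaded raw events into
-- a staging table first: keep the earliest row per event.
INSERT INTO atomic.events
SELECT *
FROM atomic.events_staging
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY event_id, event_fingerprint
  ORDER BY collector_tstamp
) = 1;
```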