Hi there,
I’m nearly up to the storage step of the Snowplow setup, and am wondering what the benefits are of using the Snowflake DB loader instead of Snowflake’s Snowpipe feature (to pick up and ingest the good enriched files generated by the enrich process)?
The Snowflake DB loader keeps a manifest of what data (and what columns) have been loaded by Snowplow into Snowflake. This means that if you start sending a new event or context into your pipeline, the loader can alter your atomic.events table (adding any new columns) before loading the data, so you don’t end up with ‘column not found’ style errors.
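To illustrate, when the loader first sees a batch containing a new context, it widens the table with DDL along these lines (the com.acme/my_context name is made up, and the exact column type and naming follow the loader’s own conventions — this is just a sketch):

```sql
-- Hypothetical example: the loader detects a new context
-- (com.acme/my_context, schema version 1) in the incoming batch,
-- consults its manifest, and adds a matching column before the
-- COPY runs, so the load never hits a missing column.
ALTER TABLE atomic.events
  ADD COLUMN contexts_com_acme_my_context_1 VARIANT;
```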
As far as I know this pattern isn’t easy to establish with Snowpipe, which is instead designed for loads into tables that don’t change shape dynamically; when they do, the expectation is that you update things by hand, e.g. with ALTER TABLE plus recreating the pipe (ALTER PIPE can’t change a pipe’s COPY statement). A rough sketch of what that maintenance looks like is below.
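For comparison, a minimal Snowpipe setup might look like this (the stage and pipe names are made up, and the column list is abbreviated to two fields for brevity):

```sql
-- Hypothetical Snowpipe definition over the bucket where the
-- transformed enriched JSON lands.
CREATE PIPE atomic.snowplow_pipe AUTO_INGEST = TRUE AS
  COPY INTO atomic.events (app_id, collector_tstamp)
  FROM (
    SELECT $1:app_id::VARCHAR,
           $1:collector_tstamp::TIMESTAMP
    FROM @atomic.snowplow_stage
  )
  FILE_FORMAT = (TYPE = 'JSON');

-- A new context means widening the table by hand...
ALTER TABLE atomic.events
  ADD COLUMN contexts_com_acme_my_context_1 VARIANT;

-- ...and, because ALTER PIPE cannot change the COPY statement,
-- recreating the pipe with the new column included.
CREATE OR REPLACE PIPE atomic.snowplow_pipe AUTO_INGEST = TRUE AS
  COPY INTO atomic.events (app_id, collector_tstamp,
                           contexts_com_acme_my_context_1)
  FROM (
    SELECT $1:app_id::VARCHAR,
           $1:collector_tstamp::TIMESTAMP,
           $1:contexts_com_acme_my_context_1
    FROM @atomic.snowplow_stage
  )
  FILE_FORMAT = (TYPE = 'JSON');
```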
Wouldn’t using Snowpipe instead of the Snowflake Loader also mean missing out on event deduplication? Or does event deduplication already happen in Stream Enrich?
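If deduplication did have to happen downstream of Snowpipe, I imagine it would be something along these lines, keeping one row per (event_id, event_fingerprint) — the atomic.events_staging table name is hypothetical:

```sql
-- Hypothetical downstream dedup if Snowpipe loaded raw events into
-- a staging table first: keep the earliest row per event.
INSERT INTO atomic.events
SELECT *
FROM atomic.events_staging
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY event_id, event_fingerprint
  ORDER BY collector_tstamp
) = 1;
```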