Hi there, welcome to the community!
This package works out of the box only if the data in Snowflake was loaded using the Snowflake loader.
This will be quite a big challenge. The loader does a lot of work in formatting the data to be more usable, and the models are built for that input.
- Does it make sense to take this approach of building these sources myself from these contexts columns and make the package work?
Honestly, it doesn’t really make sense to me.
It is possible to do, but it requires you to manually implement in SQL what the open source Snowflake loader already does in Scala.
It's definitely less efficient to do it in SQL, and since Snowflake charges for compute, you're paying extra to do it as well.
Seems like a lot of work to be honest.
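To give a flavour of what that work involves, here is a minimal, hypothetical sketch of pulling one context out of a raw self-describing `contexts` JSON column in Snowflake. The table name, column names and context schema are assumptions for illustration only; the loader does this (and much more) automatically, for every context and self-describing event:

```sql
-- Minimal sketch (table, column and schema names are assumptions):
-- extract the web_page context from a raw self-describing JSON column.
select
    event_id,
    collector_tstamp,
    ctx.value:data as web_page_context   -- the context payload itself
from my_schema.raw_enriched_events,
    lateral flatten(input => parse_json(contexts):data) ctx
where ctx.value:schema::string
    like 'iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/%';
```

And that is just one context on one table - you would have to repeat, maintain and pay for that kind of transformation for everything the package reads.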
Regarding your actual plan for working with the data - I think this ticket outlines the task and the issues you might encounter. We also have a blog post on the topic due to be released soon (in the next couple of days, I think, so good timing!).
- The only thing I don't understand is that all of these sources have the following columns, and I have no idea where they come from:
root_id
root_tstamp
ref_root
ref_tree
ref_parent
These are all fields that are set by the Redshift loader. If they're in your data then you've got yet more challenges - the formats of the data in Redshift and in Snowflake are very different, because those databases are very different.
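For reference, those are the Redshift loader's shredding keys: each context or self-describing event lands in its own table and joins back to atomic.events on root_id = event_id (and root_tstamp = collector_tstamp). So to mimic what the Snowflake loader would have given you, you would also need to re-nest each of those tables into a single column, something like this rough, hypothetical sketch (table and column names are illustrative):

```sql
-- Rough sketch (table and column names are assumptions): re-nest a
-- Redshift-shredded context table into a Snowflake-loader-style column.
with web_page as (
    select
        root_id,
        root_tstamp,
        array_agg(object_construct('id', id)) as contexts_com_snowplowanalytics_snowplow_web_page_1
    from atomic.com_snowplowanalytics_snowplow_web_page_1
    group by root_id, root_tstamp
)

select
    e.*,
    wp.contexts_com_snowplowanalytics_snowplow_web_page_1
from atomic.events e
left join web_page wp
    on wp.root_id = e.event_id
    and wp.root_tstamp = e.collector_tstamp
```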
This makes the request quite confusing to be honest… It seems like the proposed architecture is:
Enriched stream → Redshift Loader → Redshift → Snowpipe → Snowflake → Heavy SQL processing to transform the data → Data model
Whereas the architecture only needs to be:
Enriched stream → Snowflake Loader → Snowflake → Data model