Snowflake loader with realtime pipeline

@ian,

If we’re partitioning the folders by YYYY-MM-DD-HH, is there a risk of a partially processed folder being marked as completed if the Snowflake Loader run occurs whilst the hour’s folder is still being filled?

We typically archive the files produced by S3 Loader to a separate “archive” bucket and start transforming the run folders from there.
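To illustrate the idea, here is a rough Python sketch of how the archive step could pick only the hour folders that are no longer being written to. It assumes folder names follow the `YYYY-MM-DD-HH` pattern from the question and are in UTC; the function name `closed_hour_folders` and the 10-minute grace period are hypothetical choices for this example, not part of S3 Loader or the Snowflake Loader.

```python
from datetime import datetime, timedelta, timezone

# Assumed folder naming, taken from the question: YYYY-MM-DD-HH (UTC).
FOLDER_FORMAT = "%Y-%m-%d-%H"


def closed_hour_folders(folder_names, now, grace=timedelta(minutes=10)):
    """Return the folders whose hour has fully elapsed, plus a grace period.

    The grace period is a hypothetical safety margin for files that are
    flushed late; tune it to how your sink actually behaves.
    """
    closed = []
    for name in folder_names:
        try:
            start = datetime.strptime(name, FOLDER_FORMAT).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # ignore anything that is not an hour folder
        # A folder is safe to archive once its hour (and the grace period) has ended.
        if start + timedelta(hours=1) + grace <= now:
            closed.append(name)
    return sorted(closed)


if __name__ == "__main__":
    folders = ["2020-01-01-10", "2020-01-01-11", "2020-01-01-12"]
    now = datetime(2020, 1, 1, 12, 5, tzinfo=timezone.utc)
    print(closed_hour_folders(folders, now))
```

Only the folders returned here would be moved to the archive bucket, so the folder for the current hour is never picked up while it is still being filled.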

I’ve had a couple of instances where the transformer step has failed (due to an AWS problem), and then the job gets stuck due to new columns already existing. Manually dropping the columns fixes the problem. Is this expected behaviour?

I believe you are referring to the Loader rather than the Transformer. Yes, if that happens you have two options: delete the newly created column or amend the manifest table. See “Snowflake loader error - column already exists” for more details.
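For the first option, a small Python sketch that builds the `ALTER TABLE ... DROP COLUMN` statement to run in Snowflake might look like the following. The helper `drop_column_statement` is hypothetical, and the table and column names in the usage example are placeholders; substitute the events table and the column reported in your own error message.

```python
import re

# Unquoted identifiers only: letters, digits, underscore, dollar sign.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def drop_column_statement(table, column):
    """Build a statement that drops one column from a (schema-qualified) table."""
    # Validate every part so that a malformed name cannot alter the statement.
    for identifier in table.split(".") + [column]:
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unexpected identifier: {identifier!r}")
    return f"ALTER TABLE {table} DROP COLUMN {column}"


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    print(drop_column_statement("atomic.events", "contexts_com_acme_example_1"))
```

The helper only produces the SQL text; you would still run it yourself (for example in a Snowflake worksheet) and then re-run the load.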
