Hi,
Can anyone please clarify the use of the S3 buckets in the Snowplow pipeline?
In our setup we have data arriving in Snowflake.
The data also remains in the S3 bucket under a few prefixes, like:
/bad/2023-05-19-091257-49640651319786124478019421013524134068410955795710083074-49640651319786124478019421013524134068410955795710083074.gz
/enriched/2023-05-19-141453-49640610199352849926609418563779391020857037214386749442-49640610199352849926609418563779391020857037214386749442.gz
/enriched/run=2023-05-31-00-00-00/output=good/sink-cb6a5bed-1fd8-46a2-9a07-07e52a175fac-0001.txt.gz
raw/2023-05-19-122351-49640651313363509860842605548777018968499093315390537730-49640651313363509860842605548777018968499093315390537730.gz
transformed/good/run=2023-05-22-14-25-00/output=good/sink-1998d5cd-606c-4071-a9dd-f5a0e1719d13-0001.txt.gz
Does the pipeline delete any of these automatically?
Are any of these required once the data is in Snowflake?
What is a typical policy for managing this leftover data, to make sure storage doesn't keep escalating? For example, something along the lines of the sketch below?
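For context, here is roughly the kind of thing I had in mind: a minimal boto3 sketch that sets lifecycle expiration rules per prefix. The bucket name, prefixes, and retention periods are just placeholders based on our setup, and I don't yet know which of these prefixes (if any) are actually safe to expire once the data has been loaded into Snowflake.

```python
import boto3

# Placeholder bucket name and retention periods (in days); which prefixes
# are safe to expire is exactly what I'm trying to find out.
BUCKET = "our-snowplow-archive-bucket"
RETENTION = {
    "raw/": 14,
    "enriched/": 30,
    "transformed/": 30,
    "bad/": 90,  # keep bad rows longer in case we need to debug/recover them
}

s3 = boto3.client("s3")

# One expiration rule per prefix; S3 then deletes matching objects
# automatically once they are older than the configured number of days.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": f"expire-{prefix.rstrip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
            for prefix, days in RETENTION.items()
        ]
    },
)
```

Is something like that what people normally do, or is there a more standard approach?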
Many thanks for any help!
Chris.