On a newly populated instance of R97, I have accidentally deleted all the rows from a custom “unstructured events” table. All the other Redshift tables are intact.
Is it possible to repopulate the empty table from the shredded archive bucket?
On a newly populated instance of R97, I have accidentally deleted all the rows from a custom “unstructured events” table. All the other Redshift tables are intact.
Is it possible to repopulate the empty table from the shredded archive bucket?
@wleftwich, it should be possible.
config.yml
(just in case) with shredded:good
pointed to the bucket in step 2.--resume-from rdb_load --skip archive_enriched,archive_shredded
NOTE: The options depend on the version of the EmrEtlRunner. Not sure if you actually can use both --resume-from
and --skip
options at the same time. The idea is to load data only and avoid archiving the files as they are already in archive bucket. If it doesn’t work then archive into a separate “bin” bucket to be deleted after completion.
Great, thanks for the quick advice.
I ended up doing this a different way.
Inspecting the shredded:archive paths specific to this unstruct type, it occurred to me that I could use the Redshift “COPY JSON” sql command and load each path directly into a staging table. That’s what I did, and it seemed to work fine.
Thanks for sharing @wleftwich!
@wleftwich, that’s yet another way to do it indeed. The rdb_load
step of the ETL job uses COPY FROM JSON
to load shredded types. It is fine if you have just a few runs in the archived folder. However, it would be very tedious if the data for a long period is to be recovered.
Glad it worked out for you.