Repopulate a single table?

wleftwich · January 15, 2018, 5:24pm

On a newly populated instance of R97, I have accidentally deleted all the rows from a custom “unstructured events” table. All the other Redshift tables are intact.

Is it possible to repopulate the empty table from the shredded archive bucket?

ihor · January 15, 2018, 6:11pm

@wleftwich, it should be possible.

1. Pause the pipeline, if it runs on a schedule, to avoid conflicts.
2. Create a separate bucket and copy into it all the shredded good files for this specific event from the shredded archive bucket (make sure to retain the same prefixes).
3. Create a separate config.yml (just in case) with shredded:good pointing to the bucket from step 2.
4. Run EmrEtlRunner with --resume-from rdb_load --skip archive_enriched,archive_shredded

NOTE: The available options depend on the version of EmrEtlRunner. I'm not sure whether you can actually use the --resume-from and --skip options at the same time. The idea is to run only the load and avoid archiving the files, as they are already in the archive bucket. If that doesn’t work, archive into a separate “bin” bucket and delete it after completion.
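The copy described in the second step (one event type only, same prefixes) could be sketched roughly as below. This is a minimal sketch, not the pipeline's own tooling: the function names and the `run=.../shredded-types/vendor=.../name=.../` key layout are assumptions, so check them against your own archive bucket before running anything. It assumes boto3 with working AWS credentials.

```python
from typing import Iterable, List


def select_type_keys(keys: Iterable[str], vendor: str, name: str) -> List[str]:
    """Keep only the keys that belong to one shredded type.

    Keys are returned unchanged, so copying them under the same key
    retains the original prefixes. The vendor=/name= layout is an assumption.
    """
    marker = f"/vendor={vendor}/name={name}/"
    return [key for key in keys if marker in key]


def copy_type_files(src_bucket: str, dst_bucket: str, prefix: str,
                    vendor: str, name: str) -> int:
    """Copy one type's shredded files to another bucket; returns the count."""
    import boto3  # imported here so the filter above works without AWS libraries

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    copied = 0
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        keys = [obj["Key"] for obj in page.get("Contents", [])]
        for key in select_type_keys(keys, vendor, name):
            # Same key in the destination bucket, so the prefixes are retained.
            s3.copy_object(
                Bucket=dst_bucket,
                Key=key,
                CopySource={"Bucket": src_bucket, "Key": key},
            )
            copied += 1
    return copied
```

A call might look like `copy_type_files("my-archive", "my-reload", "shredded/archive/", "com.acme", "my_event")`, where every argument is a placeholder.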

wleftwich · January 15, 2018, 6:17pm

Great, thanks for the quick advice.

wleftwich · January 16, 2018, 1:48pm

I ended up doing this a different way.

While inspecting the shredded:archive paths specific to this unstruct type, it occurred to me that I could use the Redshift “COPY JSON” SQL command to load each path directly into a staging table. That’s what I did, and it seemed to work fine.
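For anyone with more than a handful of run folders, generating the COPY statements may be easier than typing them. The sketch below is not the exact commands used in this thread: the key layout, table name, JSONPaths location, IAM role, and COPY options are all assumptions to adapt. It only builds SQL strings from trusted configuration values and leaves running them to you.

```python
from typing import Iterable, List


def type_prefixes(keys: Iterable[str], vendor: str, name: str) -> List[str]:
    """Return one sorted key prefix per run folder holding files of this type.

    Assumes keys look like .../run=<ts>/.../vendor=<v>/name=<n>/... .
    The prefix covers every version of the type found in that run.
    """
    marker = f"/vendor={vendor}/name={name}/"
    prefixes = set()
    for key in keys:
        idx = key.find(marker)
        if idx != -1:
            prefixes.add(key[: idx + len(marker)])
    return sorted(prefixes)


def build_copy_statements(table: str, bucket: str, prefixes: Iterable[str],
                          jsonpaths_uri: str, iam_role: str,
                          region: str) -> List[str]:
    """Build one Redshift COPY ... JSON statement per prefix."""
    template = (
        "COPY {table} FROM 's3://{bucket}/{prefix}' "
        "IAM_ROLE '{iam_role}' "
        "JSON '{jsonpaths}' "
        "REGION '{region}' "
        "TIMEFORMAT 'auto' TRUNCATECOLUMNS ACCEPTINVCHARS;"
    )
    return [
        template.format(table=table, bucket=bucket, prefix=prefix,
                        iam_role=iam_role, jsonpaths=jsonpaths_uri,
                        region=region)
        for prefix in prefixes
    ]
```

Loading into a staging table first, as described above, makes it possible to check row counts before inserting into the real table.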

alex · January 16, 2018, 2:05pm

Thanks for sharing @wleftwich!

ihor · January 16, 2018, 7:09pm

@wleftwich, that’s indeed yet another way to do it. The rdb_load step of the ETL job uses COPY FROM JSON to load shredded types. Loading by hand is fine if you have just a few runs in the archive folder. However, it would be very tedious if you needed to recover data for a long period.

Glad it worked out for you.
