Redshift Spectrum instead of loading via COPY

trung · June 24, 2019, 12:29pm

We are using a Redshift dc2.large instance and quickly reaching its storage limit. It makes sense for us to offload the atomic schema to Redshift Spectrum, as we don't often query this data and mainly use it for data modelling once a day.

Is there an existing ETL process that can transform the data into Spectrum-ready files?

Konstantinos_Servis · June 25, 2019, 9:45am

Hey @trung,

You can use your archive files in Spectrum directly, or after converting them to Parquet (using AWS Glue). Have a look at this article for some ideas: https://snowplowanalytics.com/blog/2019/04/04/use-glue-and-athena-with-snowplow-data/

rahulj51 · June 30, 2019, 8:29am

This is good for a one-time setup, but I guess one has to write a scheduled job to add more partitions as the data comes in.

Also, is it possible to not run the Redshift copy job at all and instead replace it with a job to create Spectrum partitions? Would we still get the benefits of deduplication that the storage process contains?

Konstantinos_Servis · July 1, 2019, 6:54am

Yes, and AWS Glue makes that rather easy.
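For anyone who would rather script the partition step than rely on Glue, here is a minimal Python sketch of what a scheduled job might generate. It is not the Glue approach itself, just plain Spectrum DDL. The table name `spectrum.events`, the partition column `run`, the bucket path, and the `run=YYYY-MM-DD-HH-MM-SS` folder layout are all assumptions for illustration, so adjust them to your own archive.

```python
from datetime import datetime

# Assumed folder naming for archived runs, e.g. run=2019-06-24-12-29-00
RUN_FORMAT = "%Y-%m-%d-%H-%M-%S"


def partition_ddl(table, bucket_prefix, run_ids):
    """Build one ALTER TABLE ... ADD PARTITION statement per run folder.

    `table`, `bucket_prefix` and the `run` partition column are
    hypothetical names; nothing here talks to AWS.
    """
    statements = []
    for run_id in sorted(run_ids):
        # Raises ValueError if the run id does not match the assumed format.
        datetime.strptime(run_id, RUN_FORMAT)
        location = f"{bucket_prefix.rstrip('/')}/run={run_id}/"
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (run='{run_id}') LOCATION '{location}';"
        )
    return statements


if __name__ == "__main__":
    for stmt in partition_ddl(
        "spectrum.events",
        "s3://my-bucket/archive/",
        ["2019-06-25-09-45-00", "2019-06-24-12-29-00"],
    ):
        print(stmt)
```

The job would then run the generated statements against Redshift with whatever SQL client you already use. `ADD IF NOT EXISTS` keeps the job safe to re-run over folders it has already registered.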

In theory you could, but I would expect Spectrum to be less performant than having the data stored locally in Redshift's native format. What I would do in your case is keep only the most recent data in Redshift, up to what your storage allows, and keep adding partitions for Spectrum.
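To make the "recent data local, everything else in Spectrum" idea concrete, here is a rough Python sketch that splits run folders by age. The retention window in days is an assumption standing in for "what your storage allows", and the `YYYY-MM-DD-HH-MM-SS` run id format is assumed as well.

```python
from datetime import datetime, timedelta


def split_runs(run_ids, now, keep_days):
    """Split run ids into (keep_in_redshift, spectrum_only).

    Runs newer than `now - keep_days` stay loaded in Redshift; older
    runs are served from S3 through Spectrum only. `keep_days` is a
    hypothetical knob, tune it to your cluster's storage.
    """
    cutoff = now - timedelta(days=keep_days)
    local, spectrum_only = [], []
    for run_id in sorted(run_ids):
        run_ts = datetime.strptime(run_id, "%Y-%m-%d-%H-%M-%S")
        if run_ts >= cutoff:
            local.append(run_id)
        else:
            spectrum_only.append(run_id)
    return local, spectrum_only
```

The `spectrum_only` list tells you which runs can be dropped from the local tables once their partitions are registered in Spectrum.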

I would really need to look closely into the loader deduplication to tell you if that would be included, but I would expect that it would be if you are reading from the shredded archive.
