EmrEtlRunner sink Shredded data into S3 bucket

kevvo83 · November 11, 2019, 11:00am

Hello,

Our Snowplow installation will soon run short of storage on the Redshift cluster. We could add another node, which would solve the problem for now.

I’ve done the following to test out Redshift Spectrum:

- UNLOAD 3 days of atomic.* tables from Redshift into S3 buckets (partitioned by date, ordered by event_id, etc.)
- Run SQL Runner batch jobs against this data
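
For reference, the UNLOAD step above could be scripted roughly as follows. This is only a sketch: the helper name `build_unload_sql`, the bucket, the IAM role, the `event_date` alias and the choice of `collector_tstamp` as the date column are assumptions for illustration, not the exact statements used in the test. It only builds the SQL text, relying on Redshift's `FORMAT AS PARQUET` and `PARTITION BY` options for UNLOAD.

```python
from datetime import date, timedelta


def build_unload_sql(table, s3_prefix, iam_role, start, days=3):
    """Build a Redshift UNLOAD statement for `days` days of data from `start`.

    Hypothetical helper: table, prefix, role and column names are supplied
    by the caller or assumed here, not taken from a real setup.
    """
    end = start + timedelta(days=days)
    # The query is passed to UNLOAD as a quoted string, so any single
    # quote inside it has to be doubled.
    query = (
        "SELECT *, TRUNC(collector_tstamp) AS event_date "
        f"FROM {table} "
        f"WHERE collector_tstamp >= ''{start.isoformat()}'' "
        f"AND collector_tstamp < ''{end.isoformat()}'' "
        "ORDER BY event_id"
    )
    return (
        f"UNLOAD ('{query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET\n"
        "PARTITION BY (event_date)"
    )


if __name__ == "__main__":
    print(
        build_unload_sql(
            "atomic.events",
            "s3://example-bucket/unload/events/",  # assumed bucket
            "arn:aws:iam::123456789012:role/ExampleRole",  # assumed role
            date(2019, 11, 8),
        )
    )
```

The printed statement would then be run against the cluster with whatever SQL client is already in use.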

Based on those test results, I have the following question:
Q. What would be a recommended approach to sink the shredded data from EmrEtlRunner directly into an S3 bucket?

(I realise that the data will currently be in web/archive/shredded/run=x/ etc. My question is more about a production batch run, i.e. how would I be able to customize the partitioning, etc.)
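
To make the partitioning question concrete, here is a minimal sketch of the kind of key mapping meant: turning a `run=...` key into a date-partitioned key that Spectrum could use. It assumes the run folder is named `run=YYYY-MM-DD-hh-mm-ss`; the function name `repartition_key` and the target prefix `spectrum/shredded` are hypothetical. It only computes the new key and does not copy any objects.

```python
import re

# Assumed run folder naming: run=YYYY-MM-DD-hh-mm-ss
RUN_SEGMENT = re.compile(r"run=(\d{4})-(\d{2})-(\d{2})-\d{2}-\d{2}-\d{2}")


def repartition_key(key, target_prefix="spectrum/shredded"):
    """Map a run-based S3 key to a date-partitioned key (hypothetical layout)."""
    match = RUN_SEGMENT.search(key)
    if match is None:
        raise ValueError(f"no run= segment in key: {key}")
    year, month, day = match.groups()
    # Everything after the run folder is carried over unchanged.
    remainder = key[match.end():].lstrip("/")
    # The run folder is kept under the date partition so that two runs
    # on the same day cannot overwrite each other's files.
    return f"{target_prefix}/date={year}-{month}-{day}/{match.group(0)}/{remainder}"


if __name__ == "__main__":
    print(repartition_key("web/archive/shredded/run=2019-11-11-10-00-00/part-00000"))
```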

Hope to hear back from you!

Regards,
Kevin
