Alright, so we now have the Quick Start setup running on AWS and are getting results in the RDS database. I have a couple of questions around setup for Redshift:
- My understanding is that this is typically accomplished through the RDB shredder/loader, which involves setting up the Dataflow Runner. I have read the documentation here: https://docs.snowplowanalytics.com/docs/pipeline-components-and-applications/dataflow-runner/ but I was wondering whether there are any more step-by-step guides for getting it up and running, as I'm not sure where the best place to install the runner is or how it actually runs (I'm not familiar with EMR clusters/playbooks).
- I wanted to gut-check the differences between the Postgres loader and the RDB loader. From what I've been able to tell, the RDB loader splits the data up and stores it differently than the Postgres loader does, which I imagine helps with storage size. Does that sound right?
- If my understanding above is correct, is there a simpler way to push event data to Redshift in the same way the Postgres loader does, i.e. without breaking it up? Are there issues with that approach?
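For reference, this is roughly what I expect the Dataflow Runner invocation to look like once the EMR cluster config and playbook are written (the file names `cluster.json` and `playbook.json` are just my placeholders, and I'm guessing at the exact subcommand/flags from the docs, so please correct me if this is off):

```
# Spin up a transient EMR cluster, run the playbook steps
# (shredder -> loader), then tear the cluster down
./dataflow-runner run-transient \
  --emr-config cluster.json \
  --emr-playbook playbook.json
```

Mainly I'm unsure where this should live: on a dedicated box/scheduler, or somewhere inside the Quick Start infrastructure?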
Thanks