Hi @John,
I like this question; thanks for asking!
I’ve got at least some experience with all three, so I can give you a brief answer here. What follows is all opinion, of course; I’d be interested to hear other people’s perspectives on the whole thing.
(Edit: I should add that I don’t consider myself an expert by any means, but I have a lot of experience with Redshift, a considerable amount with Snowflake, and some experience with BigQuery, though I’m still learning on all three fronts. Please do add a comment to correct me if you notice anything below that isn’t quite right, as Mike has done.)
How do the three compare?
They’re totally different types of Database.
Redshift
Redshift is a traditional relational database which, under the hood, chunks data into blocks, sorts it via a SORTKEY, and distributes it across nodes via a DISTKEY. The dialect is very close to PostgreSQL, and so is familiar to most analysts who have turned their hand to relational SQL. A major drawback is that it requires manual maintenance work at very high volumes, and it doesn’t support nested data. So you’re always balancing priorities across jobs to maintain order, writing SQL to produce efficient joins, and finding ways around the headaches that come with volume.
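To make that concrete, here’s a minimal sketch of what the sort/dist key setup looks like in Redshift DDL (the table and column names are hypothetical, and the keys are chosen purely for illustration):

```sql
-- Hypothetical clickstream table
CREATE TABLE events (
    event_id   BIGINT,
    user_id    BIGINT,
    event_time TIMESTAMP,
    payload    VARCHAR(MAX)      -- no nested types, so JSON lives in a string
)
DISTKEY (user_id)                -- co-locate rows by user for efficient joins
SORTKEY (event_time);            -- block-level pruning on time-range filters
```

Joins on `user_id` then avoid cross-node shuffling, and time-range filters only read the relevant blocks — but getting those choices right (and keeping tables vacuumed and analysed) is exactly the manual work mentioned above.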
If you’re a business with a lot of back-end systems, a lot of ETL/modeling tasks written against a relational structure, or data consumers who are used to relational structures, Redshift has advantages.
Snowflake
Snowflake is a different beast, but still ostensibly a type of relational database (as far as I understand it, though maybe someone knows better than me). It utilises cluster keys rather than sort/distribution keys, and has a very smart auto-clustering algorithm. It supports nested data much better than Redshift. However, several (quite technical) people I’ve spoken to have struggled to wrap their heads around the concept of cluster keys and finding fields of the ideal cardinality to use — although you don’t need to set a cluster key manually unless your volumes are very high, so that’s not too much of a problem.
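For illustration, explicitly setting a cluster key is a one-liner (table and column names hypothetical):

```sql
-- Usually unnecessary: Snowflake micro-partitions data automatically.
-- At very high volume you can steer clustering toward your common filters:
ALTER TABLE events CLUSTER BY (event_date, account_id);
```

The cardinality point above is the catch: you want columns selective enough to prune scans, but not so high-cardinality that re-clustering churns constantly.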
It can be tricky to find efficient ways to query data in Snowflake if you need to both join and unnest data in the same query. A final drawback is that I’ve come across the occasional bug — but there’s always been a workaround, and the Snowflake team are super responsive and release fixes on a decent turnaround, so it’s not a major issue.
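The join-plus-unnest pattern I mean looks roughly like this, using Snowflake’s LATERAL FLATTEN (all table, column, and field names here are hypothetical):

```sql
-- Unnest a JSON array stored in a VARIANT column, then join on the result
SELECT o.order_id,
       i.value:sku::STRING AS sku,
       p.product_name
FROM orders o,
     LATERAL FLATTEN(input => o.payload:items) i,
     products p
WHERE p.sku = i.value:sku::STRING;
```

It works fine, but once you stack a few of these in one query the planner can struggle, which is where the efficiency hunting comes in.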
The major advantage of Snowflake over Redshift is that you don’t have to trade compute off against storage: they’re decoupled and billed separately, and you can scale each according to your requirements. Choose a warehouse size that lines up well with your task and you’re on your way.
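As a sketch, spinning up a right-sized warehouse for a given workload looks like this (the name and size are just examples):

```sql
CREATE WAREHOUSE transform_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 60     -- suspend after 60s idle, so compute billing stops
  AUTO_RESUME    = TRUE;  -- wake automatically on the next query
```

You can run several warehouses of different sizes against the same storage, so BI queries and heavy transforms never compete for the same nodes.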
BigQuery
BigQuery isn’t really a relational database in the traditional sense. I think of it more as a data lake with a query syntax on top — a bit like AWS’s Athena, in that the data is stored under the hood in ColumnIO format, and partitions and sharding, rather than cluster/distribution/sort keys, do the work of data distribution, as far as I understand it. The pricing model is different too: you get charged for the amount of data scanned (with a 1 TB/month free allowance).
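A minimal sketch of a partitioned (and clustered) BQ table, with hypothetical names:

```sql
CREATE TABLE analytics.events (
  event_id   INT64,
  user_id    INT64,
  event_time TIMESTAMP,
  payload    STRUCT<items ARRAY<STRING>>   -- nested data is first-class here
)
PARTITION BY DATE(event_time)   -- one partition per day
CLUSTER BY user_id;             -- sorts data within each partition
```

The nested STRUCT/ARRAY columns are a big part of why BQ feels more like a data lake than a classic relational store.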
Querying BQ is super fast and it handles massive datasets really well. But it can’t do some things that other databases can: in my experience, SELECTing a lot of data and then ordering it can cause an out-of-memory error (or something similar; I can’t remember the specific message). If you normally need to run window functions over large datasets, you’re going to have to get pretty creative (a similar issue has cropped up with certain types of window function, at least). Plus, you’d better have done a good job of choosing your partition key and getting your partition/sharding strategy right, because the in-built pseudo-partitions don’t always limit scans (and therefore don’t limit your bill). Also, only JavaScript UDFs are supported at this point, which is annoying for some (I don’t really mind that, tbh).
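The billing point is worth spelling out: whether a query scans one partition or the whole table depends entirely on filtering by the partition column. A hypothetical pair of queries, assuming a table partitioned by day on event_time:

```sql
-- Filter on the partition column: prunes to one day's data, billed accordingly
SELECT COUNT(*)
FROM analytics.events
WHERE DATE(event_time) = '2019-06-01';

-- No filter on the partition column: full scan, full bill
SELECT COUNT(*)
FROM analytics.events
WHERE user_id = 42;
```

Both queries are valid; only the first one keeps the scanned bytes (and the invoice) small.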
But I can’t stress enough how unbelievably impressive BQ’s performance is — it’s a monster in a good way. And the BQ UI and the GCP ecosystem play very nicely together: AWS has a lot of products, Google has seamlessly integrated products. BQ supports querying external data sources (e.g. Cloud Storage) either as native tables (an implicit load into BQ, which is free if coming from Cloud Storage) or as external tables (which can update live). And Data Studio is an amazing visualisation tool, which has its obvious advantages.
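For the external-table route, the sketch looks like this (bucket, dataset, and file paths are all hypothetical):

```sql
-- Queries read the files in Cloud Storage live, without loading into BQ
CREATE EXTERNAL TABLE analytics.raw_events
OPTIONS (
  format = 'CSV',
  uris   = ['gs://my-bucket/events/*.csv']
);
```

External tables trade query speed for freshness: you always see the current files, but you lose the partition pruning and performance of native storage.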
This is turning into an essay so I’ll leave the above stream-of-consciousness synopsis where it is. I think the above addresses your second question, but to quickly summarise the answer directly:
are there use cases where Redshift or Snowflake is a better choice than BigQuery?
Yup — depending on your requirements, what data you’re putting in the database, and what trade-offs make sense for your business. At the end of the day, every piece of tech involves design decisions that make some trade-off, so you can only really compare them in the context of what you need to do and what your business looks like (in terms of architecture, skill sets, and requirements).
I’m very curious to hear whether people disagree with any of the above or if people have their own perspectives though!
Best,
(Edits to rephrase poorly worded sentences and make my brain dump coherent; further edits to correct some inaccuracies as pointed out by Mike below.)