Snowplow Pipeline with Presto and Minio


I am curious whether anyone runs a Snowplow pipeline with Presto and MinIO rather than the Athena/S3 or BigQuery combination. I’d love to hear about your experiences.

I don’t know of anyone doing this at the moment but it’s certainly possible.

You would need to modify some of the components that currently sink data to S3 to sink data to an alternate S3 compatible MinIO endpoint but that shouldn’t be particularly complex.
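To illustrate what "an alternate S3 compatible endpoint" means in practice, here is a minimal Python sketch using boto3 rather than Snowplow's own components (it is not Snowplow configuration). The endpoint URL, credentials, and helper names are hypothetical placeholders; the only real API relied on is `boto3.client("s3", endpoint_url=...)`.

```python
def minio_client_kwargs(endpoint_url, access_key, secret_key):
    """Build keyword arguments for boto3.client("s3", ...) aimed at an
    S3-compatible endpoint instead of AWS. All values are supplied by the
    caller; nothing here is specific to Snowplow."""
    return {
        "endpoint_url": endpoint_url,  # e.g. a MinIO server instead of AWS S3
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_minio_client(endpoint_url, access_key, secret_key):
    """Create an S3 client that talks to the given endpoint.
    boto3 is imported lazily so the helper above works without it."""
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        # Path-style addressing is commonly needed for self-hosted endpoints.
        config=Config(s3={"addressing_style": "path"}),
        **minio_client_kwargs(endpoint_url, access_key, secret_key),
    )


# Hypothetical local endpoint and credentials, for illustration only.
kwargs = minio_client_kwargs("http://localhost:9000", "example-key", "example-secret")
print(kwargs["endpoint_url"])
```

The point is only that the client code stays the same and the endpoint changes; the equivalent change in each Snowplow component would depend on what that component exposes.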

The question I’d be asking is why, though. Running MinIO + Presto to achieve comparable performance to S3/Athena will be orders of magnitude more expensive, and you’d still need to worry about redundancy, maintainability, upgrades etc.

Hello Mike,

I am just trying to hear about different technologies because I am in the middle of structuring a project. I thought that using open-source technologies might reduce cost, but after reading your comment that looks like nonsense…

Thanks for your input :slight_smile:

No worries - I think at a certain scale / volume it’s entirely possible this works out cheaper, but that would need to be a pretty significant volume (or a specific use case where a large number of frequent, expensive operations are being performed).

For example, in a blog post from MinIO where they use Presto to achieve comparable performance with Athena, they use 8 x c5n.16xlarge instances for Presto. That equates to ~$40 / hour USD of running costs, plus ~$140 / hour USD for the MinIO server (they are provisioning a lot of storage here though). Assuming you ran this on-demand 24 hours / day, that’d be just over $4k / day. Reservations would make it cheaper, but not enough to put you in comparable territory with Athena / S3.
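The arithmetic behind that daily figure, as a quick Python sketch. The hourly rates are the rough numbers quoted above, not current AWS pricing, and the function name is just for illustration.

```python
HOURS_PER_DAY = 24


def daily_cost(presto_per_hour, minio_per_hour, hours=HOURS_PER_DAY):
    """On-demand cost in USD for running both clusters for `hours` hours."""
    return (presto_per_hour + minio_per_hour) * hours


# Approximate hourly rates quoted in the post (USD).
presto_per_hour = 40.0   # 8 Presto instances
minio_per_hour = 140.0   # MinIO servers, including a lot of storage

print(daily_cost(presto_per_hour, minio_per_hour))  # 4320.0
```

That is (40 + 140) x 24 = $4,320 / day before any reservation discount.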
