On-Premise PostgreSQL storage. Still requires S3?

Hello @dbh,

A few points here.

  1. Right now there is no way to load enriched data from S3 straight into a relational database. The vanilla batch pipeline has an additional Shred step that prepares enriched data for loading into Redshift and Postgres.
  2. Even with the Shred step, Postgres currently lacks support for self-describing JSON: it loads only the atomic.events table, which is most likely less than you want.
  3. S3 is currently hardcoded into RDB Loader, so it simply doesn’t know how to fetch data from other sources. This obviously won’t stay that way forever: we’re planning to add new cloud providers and storage targets, which will inevitably open up opportunities for on-premise solutions too. But considering the previous points, this one is the least of our problems.
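
To make point 2 a bit more concrete, here is a rough sketch of what "self-describing JSON" means and why it needs its own tables. This is not the actual Shred code. The `com.acme` schema, the `table_name` and `shred` helpers, and the simplified naming rule (vendor and name with dots and dashes replaced by underscores, plus the model version) are assumptions for illustration only.

```python
import re

# An Iglu schema URI looks like: iglu:vendor/name/format/MODEL-REVISION-ADDITION
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[a-zA-Z0-9._-]+)/(?P<name>[a-zA-Z0-9_-]+)/"
    r"(?P<format>[a-zA-Z0-9_-]+)/(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


def table_name(schema_uri):
    """Derive a per-schema table name (simplified, hypothetical rule)."""
    m = IGLU_URI.match(schema_uri)
    if m is None:
        raise ValueError(f"not a valid Iglu schema URI: {schema_uri}")
    vendor = re.sub(r"[.-]", "_", m["vendor"]).lower()
    name = m["name"].replace("-", "_").lower()
    return f"{vendor}_{name}_{m['model']}"


def shred(entity):
    """Split one self-describing JSON into (target table, row payload)."""
    return table_name(entity["schema"]), entity["data"]


# Hypothetical custom event attached to an enriched event.
example = {
    "schema": "iglu:com.acme/link_click/jsonschema/1-0-0",
    "data": {"targetUrl": "https://example.com"},
}

table, row = shred(example)
print(table, row)
```

Each distinct schema ends up in its own table next to atomic.events. A loader that only knows about atomic.events, as with Postgres today, has nowhere to put that data.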

All of the above makes loading Postgres from any object storage other than S3 hardly feasible right now. Still, we have seen many efforts (1, 2, 3) on this forum to build an on-premise pipeline using Kafka. I believe people usually end up with Kafka Connect JDBC, which is less persistent than object storage but looks very promising.
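
For anyone going the Kafka route, a JDBC sink connector definition might look roughly like the sketch below. The connector name, topic, database URL and credentials are all hypothetical placeholders. Note also that the JDBC sink expects records that carry a schema (for example Avro), so the enriched TSV would first have to be converted by something of your own; that part is not shown here.

```python
import json


def jdbc_sink_config(name, topic, jdbc_url, user, password):
    """Build a Kafka Connect connector definition for a JDBC sink."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "connection.url": jdbc_url,
            "connection.user": user,
            "connection.password": password,
            # Plain inserts; assumes the target table already exists.
            "insert.mode": "insert",
            "auto.create": "false",
        },
    }


# All values below are placeholders, not real Snowplow settings.
payload = jdbc_sink_config(
    name="enriched-events-postgres",
    topic="enriched-events-structured",
    jdbc_url="jdbc:postgresql://localhost:5432/snowplow",
    user="loader",
    password="change-me",
)

print(json.dumps(payload, indent=2))
```

The printed JSON is what you would POST to the `/connectors` endpoint of the Kafka Connect REST API.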

Hope that helps.
