We’ve had Snowplow set up for a while collecting events; it was set up following the AWS open source tutorial. Recently it started writing a huge amount of data into the database (RDS PostgreSQL). We increased the storage capacity and it used that up as well. Where is the best place to figure out what is happening? You can see below that this started around March 14th.
Hi @Ryan_Jansen, have you started doing any data modeling of the data in Postgres, or have you started tracking substantially more data into the pipeline? Did you introduce new tracking or make any other changes around that date that could be contributing?
I would start by tracking down which schemas / tables in the database are consuming the bulk of the space in your RDS and working out what is filling it up - if you are consuming this much disk space, either your traffic volume has increased substantially or you might have some large output tables from data modeling processes.
Once you know exactly what is filling up the RDS you can start to figure out what the issue might be and go from there.
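For finding the biggest consumers, a query along these lines (standard Postgres catalog functions, nothing Snowplow-specific) will list the largest relations:

```sql
-- Largest relations by total on-disk size (heap + indexes + TOAST).
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;
```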
Thanks for getting back @josh! I was able to find the issue. We had logical replication enabled, and the WAL was filling up because the intermediate SSH tunnel had changed its internal IP and the replication consumer could not connect to the database.
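For anyone hitting the same thing: when a logical replication consumer goes away, its replication slot keeps pinning WAL indefinitely. A quick check (assuming Postgres 10+, using the standard pg_replication_slots view) is something like:

```sql
-- Show each replication slot and how much WAL it is forcing the server to retain.
-- An inactive slot with a large retained_wal value is the usual culprit.
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```

If a slot is truly abandoned, dropping it with pg_drop_replication_slot('slot_name') lets Postgres recycle the retained WAL.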
I appreciate you helping me debug!
Now I need to look into how to backfill missing data.