Postgres Vacuum and AutoVacuum.

Basics of vacuum Postgres maintains multi version consistency by keeping old versions of changes tuples instead of actually deleting them. Eventually, keeping all of those out of date versions becomes big burden in terms  of storage and performance. Eventually, you end up with bloated tableland indexes. If not felt with, eventually they would fill up tour disks but they would probably make the database unusable before then so we have a handy process to clean it all up. That is Vacuum. Postgres Vacuum goes through your tables and indexes an cleans out had tuples – that is tuples that can no longer be needed by a transaction.

Vacuum can be ran with a number of parameters.

Vacuum  – with no parameter will run on every table that the user has access to. Vacuum Analyze – runs a vacuum and then runs an analyst on the table to update the state for the optimizer.

Vacuum on its own will clear out the dead tuples but because of the way a table is written to disk, the free space will not be available to the operating system or other tables. Only the table that already owned the space will be able to resume it. The other option is to run vacuum FULL. This locks the table and rewrites the table to a new file so that no unnecessary space is used. This frees the space for other objects to use. You have to be careful with vacuum full  as locking the table means that nothing else can access it. This could easily cause an outage.

Why do you need to vacuum your Postgres database?

As mentioned above, every update cause s a new row to be written rather than changing the actual data on disk and deleting does not actually remove rows. So over time your tables and indexes become booted. This uses unnecessary disk space but also harms performance as there is more data to be sifted through for each query.

The vacuum process is so important to the Postgres database that you shoudn’tleave it to manual intervention. Postgres has the AutoVacuum process to take care of it most of the time:

The Autovacuum Damon works in the background to vacuum the databases on a server. There are several processes. The Autovacuum Launcher launches auto vacuum processes. It will spin up a new vacuum process once  every autovacuum_naptime seconds. That means that if you have more than one database in a cluster, it will only get vacuumed once every  autovacuum_naptime / num dbs seconds.

When an auto vacuum worker is spawned, it will check each table in its database. The autovacuum_max_workers parameter sets the maximum number of workers that can be running at one time. This means that if there are several large tables, all the workers can get caught up vacuuming those tables.

Backing up and Restoring Postgres using pg_basebackup

There are several great tools available that handle backing up and managing the backups of your PostgresQL database. It is really important to understand the underlying process that these tools use though as well as the standard postgres commands are that you would need to run in case you ever need to do it manually.

There are 2 types of backups that you can take in Posgres, logical and physical. Logical backups are in the form if the SQL statements necessary to recreate the database (not necessarily in a human readable form). The 2 tools to take logical backups are pg_dump and pg_dumpall.

Continue reading “Backing up and Restoring Postgres using pg_basebackup”

What is the pg_clog and the clog

There are several directories named log in a Postgres installation.

You have pg_xlog, pg_log and pg_clog.

These are all important but I’ll talk about the others another time.

Pg_clog is the commit log. It is generally a small folder that you should never have a reason to look at. (note that from version 10 of postgres the pg_clog directory is being renamed to pg_xact I will continue to refer to it as pg_clog in this document but the functioning of both is the same)

Importantly, you can never delete anything from that directory. If you do your database will become unusable and you will need to recreate it from a backup.

Continue reading “What is the pg_clog and the clog”

Setting up a Postgres test cluster in vagrant

Most of the time its fine to test things out on your local machine and local installation of Postgres but often, you want more flexibility, the ability to quickly reset to a known starting point and the ability to try out different and more complex server architectures.

That is where Vagrant comes in. I use it to quickly set up Postgres clusters at different versions and different configurations.

You can find all of these in my repo at

For example here is a set up to allow you to set up a 3 node cluster with 1 leader and 2 followers all on postgres 9.4 and using replication slots:

Continue reading “Setting up a Postgres test cluster in vagrant”