What is pg_clog and the clog?

Several directories in a Postgres installation have "log" in their name.

You have pg_xlog, pg_log and pg_clog.

These are all important, but I’ll talk about the others another time.

pg_clog is the commit log. It is generally a small directory that you should never have a reason to look at. (Note that from version 10 of Postgres the pg_clog directory is renamed to pg_xact. I will continue to refer to it as pg_clog in this document, but both work the same way.)

Importantly, you must never delete anything from that directory. If you do, your database will become unusable and you will need to restore it from a backup.

Any backup must include the pg_clog directory, as it is required for operation.

The pg_clog directory is the on-disk record of the commit log, which is held in shared memory.

In memory, the clog consists of 8 kB pages, each containing an array of transaction statuses. The array is indexed by transaction id: each transaction gets a two-bit status entry at the position given by its id, so the id itself does not need to be stored. The possible states for a transaction in Postgres are in_progress, committed, aborted and sub_committed. The meaning of each is pretty obvious apart from sub_committed, which only applies to subtransactions.

New transactions are added to the end of the array, and when the 8 kB page fills up a new one is added.

Postgres can access the array to determine the status of a given transaction.
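To make the lookup concrete, here is a minimal Python sketch of how a status can be found from a transaction id. The constants (two bits per transaction, four transactions per byte, 8 kB pages) follow the Postgres source file clog.c with the default block size. The function names and the list of bytearrays standing in for shared memory are my own assumptions for illustration, not anything Postgres exposes.

```python
# Sketch only: constants follow clog.c with the default 8 kB block size.
BLCKSZ = 8192
BITS_PER_XACT = 2
XACTS_PER_BYTE = 4
XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE  # 32768 transactions per page

# Status codes as defined in the Postgres source.
STATUS_NAMES = {0: "in_progress", 1: "committed", 2: "aborted", 3: "sub_committed"}


def locate(xid):
    """Return (page number, byte offset in page, bit shift in byte) for a transaction id."""
    page = xid // XACTS_PER_PAGE
    byte = (xid % XACTS_PER_PAGE) // XACTS_PER_BYTE
    shift = (xid % XACTS_PER_BYTE) * BITS_PER_XACT
    return page, byte, shift


def set_status(pages, xid, status):
    """Store a 2-bit status; `pages` is a hypothetical stand-in for the in-memory clog."""
    page, byte, shift = locate(xid)
    while len(pages) <= page:
        # A new zeroed page means every transaction on it starts as in_progress.
        pages.append(bytearray(BLCKSZ))
    cleared = pages[page][byte] & ~(0b11 << shift) & 0xFF
    pages[page][byte] = cleared | (status << shift)


def get_status(pages, xid):
    """Read the 2-bit status for a transaction id and return its name."""
    page, byte, shift = locate(xid)
    return STATUS_NAMES[(pages[page][byte] >> shift) & 0b11]
```

For example, after `pages = []` and `set_status(pages, 100, 1)`, calling `get_status(pages, 100)` returns "committed", while a neighbouring id that was never set still reads as "in_progress". Because the position encodes the id, one 8 kB page covers 32768 transactions.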

Postgres writes the clog to disk in the pg_clog directory whenever Postgres is shut down or a checkpoint is issued. Each file in that directory can be a maximum of 256 kB, so if the current size of the clog in memory is greater than 256 kB then multiple files will be used. (If less than that is in use, Postgres still uses a single file, and only moves on to the next file once that one is full.)
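As a rough illustration of how transactions map onto those files, the sketch below works out which file and byte offset would hold a given transaction id. It assumes the default 8 kB block size, 32 pages per file (which gives the 256 kB limit) and the four-digit hexadecimal file names (0000, 0001, ...) you see in the directory. The function name is hypothetical.

```python
# Sketch only: assumes default build settings (8 kB pages, 32 pages per file).
BLCKSZ = 8192
XACTS_PER_BYTE = 4
XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE  # 32768 transactions per page
PAGES_PER_SEGMENT = 32                    # 32 * 8 kB = 256 kB per file


def segment_for_xid(xid):
    """Return (file name, byte offset within that file) for a transaction id."""
    page = xid // XACTS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    byte_in_page = (xid % XACTS_PER_PAGE) // XACTS_PER_BYTE
    offset = (page % PAGES_PER_SEGMENT) * BLCKSZ + byte_in_page
    # Files are named with the segment number as four uppercase hex digits.
    return "%04X" % segment, offset
```

Under these assumptions each file covers 32 * 32768 = 1,048,576 transactions, so a new file only appears roughly once per million transaction ids. That is why the directory usually stays small.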

Data is only ever appended to pg_clog, so it grows continuously (albeit relatively slowly). Not all of the data in there is required, though (only data on transactions that Postgres may still need information about), so the vacuum process clears out the unneeded data.
