Backing up and Restoring Postgres using pg_basebackup

There are several great tools available that handle backing up and managing the backups of your PostgresQL database. It is really important to understand the underlying process that these tools use though as well as the standard postgres commands are that you would need to run in case you ever need to do it manually.

There are 2 types of backups that you can take in Posgres, logical and physical. Logical backups are in the form if the SQL statements necessary to recreate the database (not necessarily in a human readable form). The 2 tools to take logical backups are pg_dump and pg_dumpall.

Continue reading “Backing up and Restoring Postgres using pg_basebackup”

What is the pg_clog and the clog

There are several directories named log in a Postgres installation.

You have pg_xlog, pg_log and pg_clog.

These are all important but I’ll talk about the others another time.

Pg_clog is the commit log. It is generally a small folder that you should never have a reason to look at. (note that from version 10 of postgres the pg_clog directory is being renamed to pg_xact I will continue to refer to it as pg_clog in this document but the functioning of both is the same)

Importantly, you can never delete anything from that directory. If you do your database will become unusable and you will need to recreate it from a backup.

Continue reading “What is the pg_clog and the clog”

Setting up a Postgres test cluster in vagrant

Most of the time its fine to test things out on your local machine and local installation of Postgres but often, you want more flexibility, the ability to quickly reset to a known starting point and the ability to try out different and more complex server architectures.

That is where Vagrant comes in. I use it to quickly set up Postgres clusters at different versions and different configurations.

You can find all of these in my repo at

For example here is a set up to allow you to set up a 3 node cluster with 1 leader and 2 followers all on postgres 9.4 and using replication slots:

Continue reading “Setting up a Postgres test cluster in vagrant”

What is apache Cassandra?

Cassandra is a fast distributed database.
It has several defining features:

  • Built in high availability. – Any node can handle read and write requests and your data is replicated to x nodes so regardless of which node (or even a data center) goes down, you will still have access to read and write your data.
  • Linear Scalability. – Doubling the number of (identical) nodes should double the write performance. Its basically as simple as that was all nodes can handle all operations and there is no central control.
  • Predictable performance. (i.e. doubling the number of identical nodes should double the write throughput)
  • no single point of failure. -nodes can go down and come back up without the front end application becoming aware of it.
  • Multiple Data Centres catered for and taken advantage of out as standard.
  • Built to run on commodity hardware – so you can run it on lots of $1000 servers rather than 1 or 2 $100000 servers.
  • Easy to manage operationally. – The system is designed to need very little ops input.

Continue reading “What is apache Cassandra?”