Setting up a Postgres test cluster in vagrant


Most of the time it's fine to test things out on your local machine and local installation of Postgres, but often you want more flexibility: the ability to quickly reset to a known starting point and the ability to try out different, more complex server architectures.

That is where Vagrant comes in. I use it to quickly set up Postgres clusters at different versions and different configurations.

You can find all of these in my repo at https://github.com/philmcc/postgres_clusters

For example, here is a setup for a 3-node cluster with one leader and two followers, all on Postgres 9.4 and using replication slots:

https://github.com/philmcc/postgres_clusters/tree/master/postgres94/3_node_master_2_slaves_94_rep_slots

If you want to try it out, all you need to do (once you have Vagrant installed) is:
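Roughly the following, assuming you also have VirtualBox installed and want the 9.4 replication-slot cluster linked above:

    git clone https://github.com/philmcc/postgres_clusters.git
    cd postgres_clusters/postgres94/3_node_master_2_slaves_94_rep_slots
    vagrant up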

Then, if you want to reset the cluster, you just run ‘vagrant destroy’ followed by ‘vagrant up’ again and you have a clean starting point.

You could also try out the same config on Postgres 10 (beta2), which is the latest version right now, using the config here:

https://github.com/philmcc/postgres_clusters/tree/master/postgres10/3_node_master_2_slaves_10_rep_slots

 

But what if the config you want isn't there and you want to set up your own? Well, fortunately it's really easy to do.

Let's take a look at the most basic configuration, one master and one read-only slave, using the new Postgres 10 beta release:

https://github.com/philmcc/postgres_clusters/tree/master/postgres10/2-node-master-slave_10

The first piece is the Vagrantfile. It tells Vagrant which nodes to configure, what networking they need, and which script will set up the software once each VM is up and running.

This is the only important part (that you need to know about):
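The real file lives in the repo; a minimal sketch of the relevant section looks something like this (the base box, the master.sh script name and the slave IP are my assumptions, while the master IP matches the one used to connect later):

    Vagrant.configure("2") do |config|
      config.vm.box = "centos/7"                      # assumed base box

      # Leader node: fixed private IP plus its provisioning script
      config.vm.define "master" do |master|
        master.vm.hostname = "master"
        master.vm.network "private_network", ip: "192.168.4.2"
        master.vm.provision "shell", path: "scripts/master.sh"
      end

      # Follower node: same pattern, different IP and script
      config.vm.define "slave" do |slave|
        slave.vm.hostname = "slave"
        slave.vm.network "private_network", ip: "192.168.4.3"   # assumed IP
        slave.vm.provision "shell", path: "scripts/slave.sh"
      end
    end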

That sets up two servers, one called master and one called slave. If you wanted to add another node, all you would need to do is add another section like the slave one, e.g.:
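Something like this, with a placeholder name and IP:

      # Extra follower: a copy of the slave section with a new name, IP and (optionally) script
      config.vm.define "slave2" do |slave2|
        slave2.vm.hostname = "slave2"
        slave2.vm.network "private_network", ip: "192.168.4.4"   # any unused address
        slave2.vm.provision "shell", path: "scripts/slave.sh"
      end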

Put it before the final end. You just need to change the IP address, the hostname, and the “scripts/slave.sh” path if you want to use a different provisioning script.

 

In the master dir you have the postgresql.conf and pg_hba.conf files for the master (the slave doesn't need them because it will get them from the master when we start it up as a slave). They just allow replication connections in pg_hba.conf and set a couple of parameters that differ from the defaults in postgresql.conf:
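In outline the changes amount to something like this (the exact values and the auth method in the repo may differ):

    # pg_hba.conf: allow replication connections from the private network
    host    replication    postgres    192.168.4.0/24    trust

    # postgresql.conf: non-default settings for streaming replication
    listen_addresses = '*'
    wal_level = replica              # 'hot_standby' on the 9.x clusters
    max_wal_senders = 5
    max_replication_slots = 5
    hot_standby = on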

Most of the defaults in Postgres are pretty good, so for this basic setup we can leave everything else as is.

 

So the only other part of the whole configuration is the provisioning scripts. There is one for the master and one for the slave node, each run once its VM has been set up. The order is: master VM, then the master provisioning script, then the slave VM, then the slave provisioning script.

The two scripts are very similar. They first download the yum repos and install the necessary software:
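For the Postgres 10 version that is roughly the following (the exact PGDG repo RPM name may need updating):

    # Add the PGDG yum repo for Postgres 10, then install the packages
    sudo yum install -y https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm
    sudo yum install -y postgresql10-server postgresql10-contrib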

 

Next, the script creates some directories and links and then initializes a Postgres cluster. Once that is done, Postgres is shut down so that the conf files can be copied over, and then it is started up again:
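On the master that is roughly the following (paths are those used by the PGDG 10 packages; the script in the repo may differ in its details):

    # Initialize the cluster and start it once with the stock config
    sudo /usr/pgsql-10/bin/postgresql-10-setup initdb
    sudo systemctl start postgresql-10

    # Stop it, copy in our conf files from the shared /vagrant folder, then start again
    sudo systemctl stop postgresql-10
    sudo cp /vagrant/master/postgresql.conf /vagrant/master/pg_hba.conf /var/lib/pgsql/10/data/
    sudo chown postgres:postgres /var/lib/pgsql/10/data/postgresql.conf /var/lib/pgsql/10/data/pg_hba.conf
    sudo systemctl start postgresql-10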

 

The slave.sh script is the same up to the point where it installs the software, but the actual initializing of the database is unnecessary, as it is all handled by pg_basebackup, which copies the master database over and sets up replication in one handy command!
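In outline it is something like this (connecting as the postgres user, which the pg_hba.conf above permits for replication; the exact flags in the repo script may differ):

    # Copy the master's data directory and write a recovery.conf in one step
    sudo -u postgres /usr/pgsql-10/bin/pg_basebackup -h 192.168.4.2 -U postgres \
        -D /var/lib/pgsql/10/data -X stream -R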

 

Finally, there is some tidying up: creating a link to the data directory and priming the recovery.conf file with a trigger file in case the slave needs to be easily promoted later.
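Something along these lines (the link target and the trigger file path are just the conventions used here):

    # Convenience symlink to the data directory
    sudo ln -s /var/lib/pgsql/10/data /var/lib/pgsql/data
    # Prime recovery.conf with a trigger file so the slave can be promoted just by touching it
    echo "trigger_file = '/tmp/postgresql.trigger'" | sudo tee -a /var/lib/pgsql/10/data/recovery.conf
    # Start the slave; it connects to the master and begins streaming
    sudo systemctl start postgresql-10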

 

That really is it.

Once vagrant up finishes, you can connect easily with ‘psql -U postgres -h 192.168.4.2’ to test it out and start playing.


Failover testing of a Postgres cluster


Testing failover to one of two slaves and reattaching the remaining nodes to the new master.

Starting config

Post failover config

 

Prerequisites:

You will need Vagrant and VirtualBox to run this tutorial, but that should be all.

 

Process

Turn off the master.
Promote slave1 to master.
Attach slave2 as a follower of slave1.
Attach the old master as a follower of slave1 (without rebuilding it).

 

Steps:

 

Check the cluster state and replication:
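For example, on the current master (the queries are the same on 9.4 and 10; the examples in this post assume the 9.4 replication-slot cluster):

    vagrant ssh master
    sudo -u postgres psql -c "SELECT application_name, state FROM pg_stat_replication;"
    sudo -u postgres psql -c "SELECT slot_name, active FROM pg_replication_slots;"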

 

Stop postgres on the master:
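Assuming the 9.4 service name:

    vagrant ssh master
    sudo service postgresql-9.4 stop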

 

Promote slave1

 

Connect to slave1:
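For example, promoting with pg_ctl (touching the trigger file configured in recovery.conf would work just as well):

    vagrant ssh slave1
    sudo -u postgres /usr/pgsql-9.4/bin/pg_ctl promote -D /var/lib/pgsql/9.4/data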

From your local machine – check the status of slave1:
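For instance (the slave1 address is an assumption; use the IP from your Vagrantfile):

    psql -U postgres -h 192.168.4.3 -c "SELECT pg_is_in_recovery();"
    # 'f' means slave1 is no longer in recovery, i.e. it is now the primary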

Check replication slots:
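Still from the local machine, for example:

    psql -U postgres -h 192.168.4.3 -c "SELECT * FROM pg_replication_slots;"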

There are none there. Let's create some for our other hosts so that we don't lose any WAL files (you may not want to do this on a production system, as it will force the master to keep hold of WAL files until the new slaves can fetch them; it is necessary here, though, so that we can attach the old master straight back in):
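For example, on slave1 (the slot names just have to match the primary_slot_name each follower will use; calling the old master's slot 'master' is my choice here):

    psql -U postgres -h 192.168.4.3 -c "SELECT pg_create_physical_replication_slot('slave2');"
    psql -U postgres -h 192.168.4.3 -c "SELECT pg_create_physical_replication_slot('master');"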

(The old master will still be called master so that we can see what has changed, but it won't be the leader in the new config.)

 

OK, so now we have a new master and the cluster is back up and running. Now it's time to plug in the first follower (slave2) so that we have a standby for failover.

Connect to slave2 and stop postgres:
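Again assuming the 9.4 service name:

    vagrant ssh slave2
    sudo service postgresql-9.4 stop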

 

Edit the recovery.conf file to point to the new leader (slave1) and set recovery_target_timeline = 'latest'. Note that primary_slot_name = 'slave2' controls which replication slot this node will use.
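A sketch of the resulting recovery.conf (the slave1 address and the connection user are assumptions):

    # /var/lib/pgsql/9.4/data/recovery.conf on slave2
    standby_mode = 'on'
    primary_conninfo = 'host=192.168.4.3 port=5432 user=postgres application_name=slave2'
    primary_slot_name = 'slave2'
    recovery_target_timeline = 'latest'
    trigger_file = '/tmp/postgresql.trigger'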

Start postgres back up again:
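On slave2, with the same service name as before:

    sudo service postgresql-9.4 start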

Now check that the new follower is pointing at the leader and successfully getting data:

(on slave1 – the leader)
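For example:

    sudo -u postgres psql -c "SELECT application_name, state FROM pg_stat_replication;"
    sudo -u postgres psql -c "SELECT slot_name, active FROM pg_replication_slots;"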

As you can see, the ‘slave2’ replication slot is now active and connected.
You can further check this by creating something on slave1 and seeing it appear on slave2.

 

Now it's time to put the old master back into use. We don't want to use it as the master again because we already have a leader and switching would mean more downtime, so we will put it in as another follower.

Doing that can be tricky because failing over could have put the cluster on a different timeline. This is easy to get around, though, by telling the old master to use the latest timeline.

Connect to the master node and create a recovery.conf file as above:
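A sketch, mirroring the slave2 one but using the 'master' slot:

    # /var/lib/pgsql/9.4/data/recovery.conf on the old master
    standby_mode = 'on'
    primary_conninfo = 'host=192.168.4.3 port=5432 user=postgres application_name=master'
    primary_slot_name = 'master'
    recovery_target_timeline = 'latest'
    trigger_file = '/tmp/postgresql.trigger'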

Start the instance back up and check that replication is connected and receiving changes:
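For example:

    # On the old master
    sudo service postgresql-9.4 start

    # On slave1 (the leader): both followers should now show up
    sudo -u postgres psql -c "SELECT application_name, state FROM pg_stat_replication;"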

As you can see, everything is connected and changes are being passed on.

All that's left now is a bit of tidying up. The ‘master’ node still has replication slots configured and these need to be removed.
It's simple enough to do:
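On the old master, assuming the leftover slots still carry the original follower names:

    sudo -u postgres psql -c "SELECT pg_drop_replication_slot('slave1');"
    sudo -u postgres psql -c "SELECT pg_drop_replication_slot('slave2');"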