High Availability Options in Oracle vs Postgres

Oracle is the database to beat in terms performance and features or at least is positioned that way. More importantly, if you are thinking of migrating from Oracle to Postgres to save money, you need to know that your new database has at least the same features at the one that you are moving from.

High availability is one of the most important concepts and features for a database system. For most enterprise level applications, downtime has a direct financial cost and the actual loss of some or all of your data would be catastrophic.
You need to know that the system that you are moving to can protect your data as well as the system that you are on at the moment.

With that in mind, I am going to look at the high availability approaches and features of bot Oracle and Postgres. The goal is not to work out which is better where they offer the same features but rather to identify if there re any short comings or features missing ion either database system.

To start off, we need look at what High Availability means. Technopedia defines it as: https://www.techopedia.com/definition/1021/high-availability-ha

High availability refers to systems that are durable and likely to operate continuously without failure for a long time. The term implies that parts of a system have been fully tested and, in many cases, that there are accommodations for failure in the form of redundant components.

So, we are looking at a systems ability to withstand faults and recover. To be clear, there may be some downtime when something serious like database server failure happens but the downtime should be assort as possible and recovery should be predictable and trivial.

Oracle has extensive coverage of High Availability in its documentation. https://docs.oracle.com/cd/B28359_01/server.111/b28281/architectures.htm#CIHDEAGI
They break it down into a large number of possible architectures but the basic products are:
Single Server Clusterware Real Application Clusters (RAC) Data Guard Oracle Streams

Single Server

This is where you have Oracle installed on one server. I don’t consider this a high availability deployment but it is listed on the oracle docs and to be fair, with the options listed it does meet some of the requirements listed in the above definition.
Options in this scenario (and all subsequent ones) are:

Fast Start fault recovery – When a fault is detected, the Oracle database will attempt start up to the last known good point and it will do this automatically.

Automatic Storage Management (ASM) – Oracle will take a disk or set of disks as raw devices and manage them for you. It does this in a similar way to a RAID array so that data is duplicated and if it becomes corrupted in one location it can be recovered.

Recovery Manager (RMAN) – Manages backups and recovery. It provides a point in time recovery solution so that you can recover right up to when a failure happens.
There are ore features listed but these are the main ones.

These are all pretty basic and reproduced in most DBMS. Postgres will try to recover from a failure using the WAL files stored and in fact, you can restore from a base_backup plus your stored WAL files yourself. For more features and an interface similar to RMAN, you could use the BARMAN backup and recovery tool from 2ndQuadrant.

ASM is a good. Feature but it is basically made necessary by a good RAID controller or by running your system on a SAN that managed data redundancy itself. There are other advantages to this such as the easy of scaling storage and dealing with disk failures etc.

Oracle Clusterware

Oracle Clusterware slows to you to perform a failover to a new instance if the primary instance fails. It requires a virtual IP which is reassigned to the new node when the old one fails.

This method does not replicate the data so it relies on both nodes having access to the underlying storage mechanism.

When the new node comes up, it gets the storage mounted and the ip’s pointed to it and the cluster is up and running.

Obviously, this provides no protection if the underlying data storage fails.
It is certainly possible to implement this with Postgres using third party tool. Given that replication technology does not cost you any more with Postgres though, it would almost always be a better option to use something like streaming replication to maintain a full standby server to failover to if the primary fails. This is actually the standard approach for high availability with Postgres and is very easy to set up. It also has the advantage of giving you an extra server to use for reporting or any read queries that you would like to perform.

Real Application Clusters

Oracle RAC is an excellent technology which is hard to reproduce. It is a multi master system and was one of the first multi master database solutions. This system gives excellent high availability because when a node fails, there is at least one other node already up and taking traffic. All you need to do is reroute connections. You are going to lose transactions that were in progress but your application should be able to handle that.

Until fairly recently there was not really a good solution for Postgres to do multi master replication. Today there are several options.

There is an extension called Bi Directional Replication (BDR) from 2ndQuadrant. Version 2 which is not now being worked on is free to use but they have closed source the new version which has a lot of edition features.

BDR3 is a paid for product but is a small fraction of the price of RAC and gives you all of the same features as well as extra ones to do with data partitioning. https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/

That option also gets you support with comparable SLA’s to Oracle.
Another option is through Enterprise DB. Postgres Plus has multi master replication through their replication server. Again, it is a paid for product, but gives you the same features as Oracle RAC at a fraction of the cost. https://www.enterprisedb.com/enterprise-postgres/edb-postgres-replication-server

There are other options available from other vendors with slightly different features. There are a lot of options for you to find the product that best matches your needs.
This is the way it is with Postgres a lot of the time – there is no one size fits all solution, you have options and you need know what you want to achieve and pick the best option based on that.

Oracle RAC with Extended Clusters

This is just RAC where the noses are spot over more than one data center. Its an excellent idea for providing even greater HA and covering the scenario where a datacenter or access to that data center goes down
This is easily matched by Postgres. Streaming replication can easily handle cross datacenter latency to provide a shadow cluster although of course this would not match RAC for the multi Master part.
BDR3 deals with a cross datacenter multi master solution out of the box and would probably me the closest match to this.

Oracle Data Guard.

Oracle Data Guard provides one or more standby databases as well as the ability to fail over to them when an outage on the primary database is detected.

The data can be replicated to the standby severs synchronously or asynchronously and the standby servers can be used for read queries if needed.

There are several configurable options such as logical standby severs and snapshot standbys which will receive redo but not apply it so that you can use that server for testing at a given point.

All of these features except the automatic failover are provided by Postgres’ in built replication technology. Whereas Data Guard is a paid for extra, in Postgres its all free and very easy to set up.

You can even set up pools of standby severs that can automatically dropout if they fall behind using tools like HAproxy.

The automatic failover feature is not included in Postgres by default but ihis is in large part because there are plenty of well established and well tested third party solutions.

One such solution is Patroni: https://github.com/zalando/patroni It is free and can run on a wide variety of platforms. It handles the cluster management and the joining and parting off nodes.

Oracle Streams

Streams is the last option that I will look at here. It basically allows you to stream changes to a replica. This increases the options over a standby replica because it is capable being written to to having different schema from the master.
This can be compared to logical replication in Postgres – a feature that is now built in to Postgres core and very easy to set up.

There are lots of other options listed in the Oracle docs but they are all variations on these technologies ranges in different architectures. Postgres provides options to get eh same results as any of them. Most of the options are free and those that re not are significantly cheaper than in Oracle.