Philip McClarence

Can you improve your Oracle database using Postgres?

25/02/2020 | 07/01/2020 by Philip

Oracle is a great database. It is cutting edge and it has a huge team of developers behind it as well as massive funding.

There are not any areas where it lacks anything major that exists in other comparable databases.

The problem with Oracle is that it is expensive in the first place, and that all of the extras are chargeable and also expensive!

High Availability Options in Oracle vs Postgres

03/01/2020 by Philip

Oracle is the database to beat in terms of performance and features, or at least it is positioned that way. More importantly, if you are thinking of migrating from Oracle to Postgres to save money, you need to know that your new database has at least the same features as the one that you are moving from.

High availability is one of the most important concepts and features for a database system. For most enterprise level applications, downtime has a direct financial cost and the actual loss of some or all of your data would be catastrophic.
You need to know that the system that you are moving to can protect your data as well as the system that you are on at the moment.

Should you migrate to Postgres from Oracle?

03/01/2020 by Philip

The Oracle database has been the gold standard for enterprise applications for a long time now. It has great performance, solid reliability and most of the features that you could want are available. The big problem is that it is expensive. And I mean REALLY expensive. That’s just for the base product as well. All of the extra features that you might want are chargeable extras, which means that when developing for Oracle, you often have to work without some of the more advanced features because they would cost too much.

Postgres Vacuum and AutoVacuum.

03/01/2020 | 02/09/2019 by Philip

Basics of vacuum

Postgres implements multi-version concurrency control (MVCC) by keeping old versions of changed tuples instead of actually deleting them. Eventually, keeping all of those out of date versions becomes a big burden in terms of storage and performance, and you end up with bloated tables and indexes. If not dealt with, they would eventually fill up your disks, but they would probably make the database unusable before then, so we have a handy process to clean it all up. That is Vacuum. Postgres Vacuum goes through your tables and indexes and cleans out dead tuples, that is, tuples that can no longer be needed by any transaction.
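To make the idea of a dead tuple concrete, here is a toy model in Python. It is only a sketch: xmin and xmax are the real Postgres names for the creating and deleting transaction ids, but the TupleVersion class, the vacuum function and the transaction numbers are made up for illustration, every transaction is assumed to have committed, and xid wraparound is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TupleVersion:
    value: str
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that deleted or replaced it, if any


def vacuum(table: List[TupleVersion], oldest_active_xid: int) -> List[TupleVersion]:
    """Drop versions that no running transaction could still see.

    A version is dead once the transaction that replaced it (xmax) is older
    than every transaction that is still running.
    """
    return [t for t in table if t.xmax is None or t.xmax >= oldest_active_xid]


# Two UPDATEs of the same row leave two old versions behind.
table = [
    TupleVersion("v1", xmin=100, xmax=105),
    TupleVersion("v2", xmin=105, xmax=110),
    TupleVersion("v3", xmin=110),
]

# The oldest running transaction is 108: it may still need v2, but nobody can see v1.
cleaned = vacuum(table, oldest_active_xid=108)
print([t.value for t in cleaned])  # ['v2', 'v3']
```

Note that a long running transaction holds back oldest_active_xid, which in this model (and in real life) stops old versions from being cleaned up.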

Backing up and Restoring Postgres using pg_basebackup

02/09/2019 | 30/08/2019 by Philip

There are several great tools available that handle backing up and managing the backups of your PostgreSQL database. It is really important, though, to understand the underlying process that these tools use, as well as the standard Postgres commands that you would need to run if you ever have to do it manually.

There are 2 types of backups that you can take in Postgres: logical and physical. Logical backups are in the form of the SQL statements necessary to recreate the database (not necessarily in a human readable form). The 2 tools to take logical backups are pg_dump and pg_dumpall.
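As a rough sketch of what those commands look like, the Python below builds the argument lists for the two logical tools and for pg_basebackup (the physical one). The database name, file names and target directory are placeholders I have made up, connection options such as host and user are left out, and nothing is actually executed here.

```python
from typing import List


def logical_backup_cmd(dbname: str, outfile: str) -> List[str]:
    """pg_dump of a single database in the compressed 'custom' archive format."""
    return ["pg_dump", "--format=custom", f"--file={outfile}", dbname]


def logical_backup_all_cmd(outfile: str) -> List[str]:
    """pg_dumpall writes plain SQL for every database in the cluster."""
    return ["pg_dumpall", f"--file={outfile}"]


def physical_backup_cmd(target_dir: str) -> List[str]:
    """pg_basebackup copies the cluster's data files, here as gzipped tar archives."""
    return ["pg_basebackup", f"--pgdata={target_dir}", "--format=tar", "--gzip", "--progress"]


# Placeholder names, for illustration only.
for cmd in (
    logical_backup_cmd("appdb", "appdb.dump"),
    logical_backup_all_cmd("cluster.sql"),
    physical_backup_cmd("/backups/base"),
):
    print(" ".join(cmd))
```

To really run one of these you could pass the list to subprocess.run(cmd, check=True) on a machine that has the Postgres client tools installed and can reach the server.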

What is the pg_clog and the clog

30/08/2019 | 29/08/2019 by Philip

There are several directories with log in their name in a Postgres installation.

You have pg_xlog, pg_log and pg_clog.

These are all important but I’ll talk about the others another time.

pg_clog is the commit log. It is generally a small directory that you should never have a reason to look at. (Note that from version 10 of Postgres the pg_clog directory is renamed to pg_xact. I will continue to refer to it as pg_clog in this document, but both function the same way.)

Importantly, you must never delete anything from that directory. If you do, your database will become unusable and you will need to recreate it from a backup.
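To show why the directory stays small and why every file in it matters, here is a small Python sketch of how a transaction id maps to a position in the commit log. It assumes the default build constants as I understand them (8 kB pages, 2 status bits per transaction, 32 pages per file); the clog_location function is my own illustration, not something Postgres exposes.

```python
from typing import Tuple

BLOCK_SIZE = 8192        # default Postgres page size in bytes (assumed)
XACTS_PER_BYTE = 4       # 2 status bits per transaction
PAGES_PER_SEGMENT = 32   # pages stored in each file of the directory

# Each 256 kB file therefore covers 1,048,576 transactions.
XACTS_PER_SEGMENT = BLOCK_SIZE * XACTS_PER_BYTE * PAGES_PER_SEGMENT


def clog_location(xid: int) -> Tuple[str, int, int]:
    """Return (file name, byte offset within the file, bit shift within the byte)."""
    segment = xid // XACTS_PER_SEGMENT
    byte_offset = (xid % XACTS_PER_SEGMENT) // XACTS_PER_BYTE
    bit_shift = (xid % XACTS_PER_BYTE) * 2
    return f"{segment:04X}", byte_offset, bit_shift


print(clog_location(5))          # ('0000', 1, 2)
print(clog_location(1_048_576))  # ('0001', 0, 0)
```

Because the commit status of old transactions is looked up this way when deciding whether a tuple is visible, removing one of these files leaves Postgres unable to tell whether the rows written by those transactions were ever committed.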

Setting up a Postgres test cluster in vagrant

30/08/2019 | 21/07/2017 by Philip

Most of the time it's fine to test things out on your local machine and local installation of Postgres, but often you want more flexibility, the ability to quickly reset to a known starting point and the ability to try out different and more complex server architectures.

That is where Vagrant comes in. I use it to quickly set up Postgres clusters at different versions and different configurations.

You can find all of these in my repo at https://github.com/philmcc/postgres_clusters

For example, here is a setup that gives you a 3 node cluster with 1 leader and 2 followers, all on Postgres 9.4 and using replication slots:

Failover testing of a Postgres cluster

30/08/2019 | 20/07/2017 by Philip

Testing failover to one of 2 slaves and reattaching to the new master.

Starting config

   master
  |      |
slave1 slave2

Post failover config

   slave1
  |      |
slave2 master
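The change between the two diagrams can be written down as a tiny Python model. This is only a sketch of the topology change, not of the actual Postgres promotion steps: the dictionary layout and the failover function are hypothetical, and the node names are the ones from the diagrams above.

```python
from typing import Dict, List


def failover(cluster: Dict[str, List[str]], old_leader: str, new_leader: str) -> Dict[str, List[str]]:
    """Promote one follower and reattach everything else, including the old leader, to it."""
    followers = cluster[old_leader]
    if new_leader not in followers:
        raise ValueError(f"{new_leader} is not a follower of {old_leader}")
    remaining = [node for node in followers if node != new_leader]
    # The old leader rejoins as a follower of the newly promoted node.
    return {new_leader: remaining + [old_leader]}


start = {"master": ["slave1", "slave2"]}
after = failover(start, old_leader="master", new_leader="slave1")
print(after)  # {'slave1': ['slave2', 'master']}
```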

What is Apache Cassandra?

30/08/2019 | 14/04/2016 by Philip

Cassandra is a fast distributed database.
It has several defining features:

Built in high availability. – Any node can handle read and write requests and your data is replicated to x nodes so regardless of which node (or even a data center) goes down, you will still have access to read and write your data.
Linear Scalability. – Doubling the number of (identical) nodes should double the write performance. It's basically as simple as that, as all nodes can handle all operations and there is no central control.

Predictable performance. (i.e. doubling the number of identical nodes should double the write throughput)
No single point of failure. – Nodes can go down and come back up without the front end application becoming aware of it.
Multiple Data Centres catered for and taken advantage of as standard.

Built to run on commodity hardware – so you can run it on lots of $1000 servers rather than 1 or 2 $100000 servers.
Easy to manage operationally. – The system is designed to need very little ops input.
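The first and fourth points, replication to x nodes and no single point of failure, can be illustrated with a toy token ring in Python. This is a simplified sketch and not how Cassandra is actually implemented: the Ring class and node names are made up, MD5 stands in for the real partitioner, and virtual nodes, replication strategies and consistency levels are all ignored.

```python
import hashlib
from bisect import bisect_right
from typing import List, Set


def token(value: str) -> int:
    """Hash a node name or row key onto the ring (MD5 is a stand-in here)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        self.rf = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)

    def replicas(self, key: str) -> List[str]:
        """Walk clockwise from the key's token and take the next rf nodes."""
        tokens = [t for t, _ in self.ring]
        start = bisect_right(tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def available_replicas(self, key: str, down: Set[str]) -> List[str]:
        """Replicas that can still serve the key when some nodes are down."""
        return [n for n in self.replicas(key) if n not in down]


ring = Ring(["node1", "node2", "node3", "node4", "node5"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)
# Even with one owner down, the other copies can still serve reads and writes.
print(ring.available_replicas("user:42", down={owners[0]}))
```

Since there is no central coordinator in this model, adding nodes just adds more tokens to the ring, which is the intuition behind the linear scalability claim.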

Relational Databases and Big Data workloads.

13/04/2016 by Philip

This intro to Cassandra is taken from the DataStax course. I don’t necessarily agree with everything, particularly their take on what a traditional RDBMS can and can’t do, but I have included their view here for completeness.

Cassandra is designed for ‘Big Data’ workloads. In order to understand the characteristics of Big Data, let's first define ‘Small Data’:

This would typically be a volume of data that fits on 1 machine, where an RDBMS is able to handle both the number of operations and the quantity of data. The system will support a number of concurrent users in the hundreds. It fully supports ACID.

When you want to scale such a system, you are going to do it vertically first – with a bigger host, more RAM or processors.

Can Relational databases support big data?