
A PostgreSQL 9.2 Streaming Replication Primer


Posted on 04 Dec 2012 under code
Whether you’re growing tired of your MySQL install or you’re just starting out, PostgreSQL 9.2 is a great release to dive into this extremely exciting project with. Many great changes landed in 9.2, including several replication improvements, which I’ll cover fairly thoroughly in this article.
When I first investigated replication, I really liked streaming replication: with its ability to cascade slaves, run hot standbys, and take online backups, it provided the base for a highly available and efficient database cluster.
I will make the assumption that you’re building this on an Ubuntu box. I used Ubuntu 12.04 (Precise Pangolin).
For your convenience, a table of contents:
  • Initial Master Setup
  • Configuration on Master
  • Setting Up An Initial Slave
  • Starting Replication On The Slave
  • Checking For Replication Issues
  • Adding A Second Slave
  • That’s A Wrap
So, let’s go.

Initial Master Setup

First, you’ll need to install the PostgreSQL 9.2 packages. These don’t exist in the default Ubuntu 12.04 repositories, but we can use official packages available at apt.postgresql.org. Create a new file with whatever name you want under /etc/apt/sources.list.d/ and add the following inside:
deb http://apt.postgresql.org/pub/repos/apt/ precise-pgdg main
Then you’ll need to fetch the apt-key, and finally, do an update and install.
$ wget --quiet -O - http://apt.postgresql.org/pub/repos/apt/ACCC4CF8.asc | sudo apt-key add -
$ sudo aptitude update
$ sudo aptitude install postgresql-9.2 postgresql-contrib-9.2
Once installed, drop the default cluster and recreate it with an explicit locale. I found the default (en_CA.utf8 in our case) was not what I wanted.
$ sudo pg_dropcluster --stop 9.2 main
$ sudo pg_createcluster --locale=en_US.utf8 --start 9.2 -d /var/lib/postgresql/9.2/main main
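As a quick sanity check, pg_lsclusters (from Ubuntu’s postgresql-common package, installed alongside PostgreSQL) lists your clusters and their status. You should see the 9.2/main cluster online on port 5432.
$ pg_lsclusters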
Next, we’ll want to create some database users.
$ sudo -u postgres psql -d template1 -c "ALTER USER postgres WITH ENCRYPTED PASSWORD 'replaceme';"
$ sudo -u postgres psql -c "CREATE USER pg_myapp WITH ENCRYPTED PASSWORD 'replaceme';"
$ sudo -u postgres psql -c "CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'replaceme';"
Fairly straightforward. pg_myapp can be a user with a name relevant to your project. The replicator user is what we’ll use to do replication; having a dedicated user allows for more specific permission settings later on. You should only do this on the master, as we’ll eventually copy everything down to a slave.
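If you’d like to double-check your work, list the roles. You should see postgres, pg_myapp, and replicator, with the latter carrying the Replication attribute.
$ sudo -u postgres psql -c "\du"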

Configuration on Master

We’re going to set up the master so that it ships enough information to its slave to let the slave act as a hot standby. Additionally, we’ll archive WAL files to the slave in case we need point-in-time recovery, and it helps streaming replication catch up!
Set these within postgresql.conf:
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32

archive_mode = on
archive_command = 'rsync -aq %p postgres@slave1.example.internal:/var/lib/postgresql/9.2/archive/%f'
archive_timeout = 3600
A wildcard on listen_addresses? In most cases this should be fine, as we’ll use pg_hba.conf for security. But be aware of what you’re doing!
I’m not going to go into huge detail about what these settings mean, as the official documentation explains them really well. I recommend reading it.
Our archive command assumes that the archive directory exists on the slave (we’ll create it later) and that the master can rsync to the slave over SSH, which the sketch below sets up.
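Because archive_command runs rsync as the postgres user, the master’s postgres user needs passwordless SSH access to the slave. Here’s a sketch of one way to arrange that; it assumes OpenSSH and that key-based logins are permitted for the slave’s postgres account.
# on master: create a key for the postgres user and copy it to the slave
$ sudo -u postgres mkdir -p /var/lib/postgresql/.ssh
$ sudo -u postgres ssh-keygen -t rsa -N "" -f /var/lib/postgresql/.ssh/id_rsa
$ sudo -u postgres ssh-copy-id postgres@slave1.example.internal
# confirm a non-interactive connection works
$ sudo -u postgres ssh postgres@slave1.example.internal true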
You won’t believe me, but we’re pretty close to being done with the master. You’ll want to set up some permissions within pg_hba.conf. I added the following lines to allow local replication (via pg_basebackup), slave replication, and app connectivity.
host    all          pg_myapp    127.0.0.0/24    md5
host    replication  replicator  127.0.0.0/24    trust
local   replication  postgres                    trust
Replace 127.0.0.0/24 with the subnet of your master/slave machines, or simply use explicit IP addresses, depending on your setup.
Restart PostgreSQL on this master machine and we can move onto our standby server.
$ sudo service postgresql restart
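Once it’s back up, a quick way to confirm the settings took effect (this should report hot_standby):
$ sudo -u postgres psql -c "SHOW wal_level;"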

Setting Up An Initial Slave

I’m going to make the assumption that this slave is on a different box, so go ahead and install the same Ubuntu packages as you did for the master. Initially, we’ll set this slave up as if it were the only slave. Further into the article, I’ll go into detail on adding a second slave.
Now again, just as with the master, you’ll need to modify postgresql.conf. The configuration will be nearly identical. We want the slave to match the master as closely as possible, so that in the event of a failover, it can easily become the new master.
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
hot_standby = on
The difference is hot_standby = on, which allows this machine to serve read-only queries while it is in recovery. Since the rest of its configuration matches the master, it is also ready to replace the master at any time.
You’ll also want to give this slave the same pg_hba.conf permissions as we did the master. Here they are again:
host    all          pg_myapp    127.0.0.0/24    md5
host    replication  replicator  127.0.0.0/24    trust
local   replication  postgres                    trust
Finally, a new, very important file is introduced: recovery.conf. It is necessary in order to replicate from the master. Create it under /etc/postgresql/9.2/main/ like so:
standby_mode = 'on' # enables stand-by (readonly) mode

# Connect to the master postgres server using the replicator user we created.
primary_conninfo = 'host=pgmaster.mysite.internal port=5432 user=replicator'

# Specifies a trigger file whose presence should cause streaming replication to
# end (i.e., failover).
trigger_file = '/tmp/pg_failover_trigger'

# Shell command to retrieve an archived segment of the WAL file series.
# Required for archive recovery if streaming replication falls too far behind.
restore_command = 'cp /var/lib/postgresql/9.2/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/9.2/archive/ %r'
This is a lot to take in, but it’s fairly straightforward. Key things include:
  • Replace the values in primary_conninfo with the host name relevant to your master. Make sure this machine can connect to the address you use.
  • Ensure the archive directory under /var/lib/postgresql/9.2/archive/ is created. This is a choice we make, so PostgreSQL won’t create it for us; see the command just below.
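Creating the archive directory is a one-liner. On the slave:
$ sudo -u postgres mkdir -p /var/lib/postgresql/9.2/archive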
Save that file. We’ll actually use it later: PostgreSQL looks for recovery.conf within /var/lib/postgresql/9.2/main/, rather than where we placed it. But before we can move it there, we need to set up the essentials for replication on the slave.

Starting Replication On The Slave

Now that the slave is configured, it’s necessary to get the contents of the master onto it so that replication can sync up. This is done using a handy command named pg_basebackup, which essentially copies the entire data directory of your cluster into a location of your choice. This is also what’s known as an online backup, since the master can stay up while the command copies all the data.
To start, stop PostgreSQL on your slave. We don’t want any data flowing while we do this.
$ sudo service postgresql stop
Then, switch to the master server and execute pg_basebackup
# on master machine
$ pg_basebackup -U postgres -D - -P -Ft | bzip2 > /var/tmp/pg_basebackup.tar.bz2
$ scp /var/tmp/pg_basebackup.tar.bz2 postgres@pgslave.mysite.internal:/var/tmp/
I love pg_basebackup because it allows us to pipe the output any way we want. bzip2 is always a nice option, but it’s your choice. After backing up to a temporary directory, I send the backup straight to a temporary directory on the slave.
Now, back on the slave machine, let’s wipe our cluster’s data directory before importing the backup:
# on slave machine
$ cd /var/lib/postgresql/9.2/main/
$ rm -rf *
$ tar -xjvf /var/tmp/pg_basebackup.tar.bz2
Make sure that the contents of this directory are owned by the postgres user and group. If not, then:
$ sudo chown -R postgres:postgres /var/lib/postgresql/9.2/main
Now, copy the recovery.conf file we made earlier into the data directory.
$ sudo -u postgres cp /etc/postgresql/9.2/main/recovery.conf \
    /var/lib/postgresql/9.2/main/recovery.conf
Finally, start the slave up again. If all went well, you should receive no errors:
$ sudo service postgresql start
This is a good time to tail the cluster log to see if all is swell.
$ tail -f /var/log/postgresql/postgresql-9.2-main.log
You should see a couple of lines similar to:
LOG:  entering standby mode
LOG:  streaming replication successfully connected to primary
Excellent! You now have a master/slave setup with streaming replication. In essence, what happened was pretty simple. We took the entire data directory of the master and gave it to the slave. That directory includes the WAL files that let the master and slave agree on the point in time they are at.
So how can we test this? Well, assuming the log file said everything is ok, let’s try some queries!
# on master
$ sudo -u postgres psql -c "CREATE TABLE fake_customers (name VARCHAR(32));"
Now, go onto the slave. This table should exist.
$ sudo -u postgres psql -c "SELECT name FROM fake_customers;"
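You can also ask the master directly who is streaming from it. The pg_stat_replication view (available since 9.1) shows one row per connected standby:
# on master
$ sudo -u postgres psql -c "SELECT client_addr, state, sent_location, replay_location FROM pg_stat_replication;"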
Congratulations! You’ve set up master/slave replication. Wasn’t that simple? Now, let’s move on to keeping an eye on this replication, and eventually adding a second slave (totally optional, of course).

Checking For Replication Issues

We definitely need some way to monitor our postgres instances to ensure they’re actually doing what we expect. There are a variety of solutions out there, including a PostgreSQL Munin plugin, but I’d like to touch a bit on the XLOG location and how to find the offset between your servers, eventually reporting it to some backend.
XLOG records are generated on the master and shipped to the slaves. By gathering data from the master and slave, we can compare some differences and get an idea of how far behind our slaves are in terms of replication.
The first thing your script needs to do is get the XLOG location of the master.
# run on master.
$ psql -c "SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS offset;"
Store this offset into a variable and then get the receive and replay locations from each slave.
# on each slave
$ psql -c "SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0') AS receive, \
                pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0') AS replay;"
Then, a simple subtraction:
receive_delta = master_offset - slave_receive_offset
replay_delta = master_offset - slave_replay_offset
Finally, you’ll have a number that looks very arbitrary. Great?
The number returned is essentially the offset in bytes between the master and slave(s). The difference will vary between applications, so in the end, it comes down to knowing your application’s environment and your configuration, and keeping an eye on the result over time. Eventually, you’ll be able to figure out what a “critical” difference is for your application.
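To make this concrete, here’s a minimal sketch of such a check as a shell script. The host names are illustrative, and it assumes psql can connect to both machines without a password; you may need additional pg_hba.conf entries in your setup.
#!/bin/bash
MASTER=pgmaster.mysite.internal
SLAVE=pgslave.mysite.internal

# Current XLOG write location on the master, as a byte offset.
master_offset=$(psql -h "$MASTER" -U postgres -At \
    -c "SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0');")

# Last received and replayed locations on the slave, as byte offsets.
slave_receive=$(psql -h "$SLAVE" -U postgres -At \
    -c "SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0');")
slave_replay=$(psql -h "$SLAVE" -U postgres -At \
    -c "SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0');")

echo "receive delta: $((master_offset - slave_receive)) bytes"
echo "replay delta:  $((master_offset - slave_replay)) bytes"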
If you’d like a fully functional script that does this check, I posted one on GitHub.

Adding A Second Slave

We’re now in a situation where we have a master and a slave, with streaming replication as well as WAL shipping. At this point, you can be pretty happy. You have a failover slave that is ready to become your master machine at the touch of a trigger (quite literally, as it turns out).
However, let’s say we want to add another slave. Initially, I imagined that simply adding a second slave, setting up an NFS mount that the master ships to, and those slaves recover from would be pretty ideal, but I found a few issues with that:
  • NFS mounts, at least for us, were notorious for being unreliable in a high-availability situation.
  • We use pg_archivecleanup on each slave to clean up the WAL logs, but if an NFS mount is set up and shared, when do you know you can actually clean up a file? What if the other slave still needs it?
With mostly the latter in mind, I came to the conclusion that the ideal choice was cascading replication. The ability to do this was introduced in PostgreSQL 9.2, and it makes adding additional slaves really simple.
First, how would this setup look? (repl. stands for replication)
         +-----------+        +-----------+        +-----------+
         |  master   |        |   slave   |        |   slave   |
         |-----------|        |-----------|        |-----------|
         |           |        | (primary) |        |           |
         |           | repl.  |           | repl.  |           |
         |           +-------->           +-------->           |
         |           |        |           |        |           |
         |           |        |           |        |           |
         |           |        |           |        |           |
         +-----------+        +-----------+        +-----------+
Simply put, our first slave becomes more of a primary slave that sends information to other slaves. The second slave never talks directly to the master. The two initial issues we faced are now gone, and we have an even more powerful setup. The changes to your existing slave are minimal.
First, we need to tell the primary slave to archive to the secondary slave. Go back to the first slave machine and edit postgresql.conf:
archive_command = 'rsync -aq %p postgres@slave2.example.internal:/var/lib/postgresql/9.2/archive/%f'
archive_timeout = 3600
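As with the master’s archive_command, this assumes the archive directory exists on the second slave and that the first slave’s postgres user has passwordless SSH access to it, set up the same way we did earlier.
# on the second slave
$ sudo -u postgres mkdir -p /var/lib/postgresql/9.2/archive
# on the primary slave, confirm the postgres user can reach it
$ sudo -u postgres ssh postgres@slave2.example.internal true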
Now, onto the second slave. Provision the box as you did the initial slave, setting up the postgresql.conf, pg_hba.conf, and recovery.conf files as before, with one minor adjustment in recovery.conf:
# Connect to the primary slave postgres user using the replicator user.
primary_conninfo = 'host=pgslave1.mysite.internal port=5432 user=replicator'
Instead of connecting to the master for replication, this machine will connect to the first slave.
Now, run pg_basebackup just as you did for the first slave, treating the primary slave as this machine’s master. Move recovery.conf into the /var/lib/postgresql/9.2/main directory and start postgres up.
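Concretely, that looks something like this; the second slave’s host name is illustrative, and note that PostgreSQL 9.2 is the first release that allows taking a base backup from a standby.
# on the primary slave
$ pg_basebackup -U postgres -D - -P -Ft | bzip2 > /var/tmp/pg_basebackup.tar.bz2
$ scp /var/tmp/pg_basebackup.tar.bz2 postgres@pgslave2.mysite.internal:/var/tmp/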
You should see in the logs that your secondary slave has connected to your primary slave. With a second slave, you now have greater recovery flexibility and a great server to run backups, checks, and so on from.

That’s A Wrap

And that’s it. Try running some queries that alter data on your master and watch them flow to your primary slave and down to your secondary. Exciting stuff.
Before going into production, you can try triggering a failure of the master. Play with the trigger_file you defined and see the results. There are some interesting quirks that come up when you do these failovers, many of which are being addressed in the next version of PostgreSQL.
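For example, to promote the first slave, create the trigger file it watches and follow the log while it exits recovery. Once promoted, PostgreSQL renames recovery.conf to recovery.done and the machine starts accepting writes.
# on the primary slave
$ sudo -u postgres touch /tmp/pg_failover_trigger
$ tail -f /var/log/postgresql/postgresql-9.2-main.log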
I hope you found this article useful and that you can use it as a reference when setting up your own instances.
