Log shipping?!

PostgreSQL offers support for "shipping" it's WAL, the Write Ahead Log, where the changes of every transaction are recorded, to other database systems. The other database system then reads the changes from the WAL file and applies the changes to it's local data store.
Log shipping has the drawback that the slave servers can't be used for queries as long as they are replicating data and cannot be put back in replication after they've been taken online. Additionally the replication isn't very granular, PostgreSQL natively itself will accept only completed WAL files.
On the other hand this mechanism is very efficient and very reliable since the WAL is at the core of normal PostgreSQL operation.

The WAL

The WAL files of a PostgreSQL database can be found under $PGDATA/pg_xlog, in Debian $PGDATA is usually /var/lib/postgresql/<VERSION>/<CLUSTERNAME>. Every WAL segment is 16MiB in size (compile-time default) and it's name consists of three separate counters:

Naming

If we take the name "00000001000000030000008E" it tells us that the timeline of the file is "1", that it belongs to the logical log file (logid) "3" and that it's the 142th (0x8E) segment of the given logfile.
The segment counter increments with every segment switch, the logical log file is incremented (and the segment counter reset to 0) whenever a new segment would overflow the 32bit address space (or "4GiB") of a logical logfile. With a standard segment size of 16MiB this happens every 255 segments.

Switching segments

A WAL segment gets switched when one of the following things happen:

it's full (16MiB worth of changes have been written)
archive_timeout is exceeded
pg_switch_xlog is called

Replicating

The mechanism used for reading in WAL files on a slave server is very close to the mechanism that is used when PostgreSQL recovers from an unclean shutdown:
The daemon doesn't know in what state the heap files (tables, indexes, etc.) are and therefore consults the WAL, where changes of every transaction are written to, replaying every transaction since the last CHECKPOINT.
Because the same code-infrastructure is used, the replaying of WAL files is called "recovery mode".

Shipping the files

PostgreSQL has an archive_command parameter which can be used to configure a command which gets called after every segment switch. This makes it easy to copy completed WAL segments from the master server to a remote system with various mechanisms, e.g. nfs, scp, rsync, etc.

Recovering

To configure a server for recovery you need to place a file named "recovery.conf" into it's $PGDATA directory. A sample recovery.conf might look something like this:

restore_command = '/usr/lib/pgsql/9.2/bin/pg_standby -l -t /var/lib/pgsql/logshop.trigger /srv/logship-archive %f %p'
log_restartpoints = 'true'
# For PITR
#recovery_target_time = '2013-12-21 20:00:00'

Preparing the environments should be rather easy, first make sure, that your machines are setup correctly.

When both machines are running, run the following command:

Database Administration

Pages

Monday, December 23, 2013

PostgreSQL replication solutions: Log shipping with pg_standby

Log shipping?!

The WAL

Naming

Switching segments

Replicating

Shipping the files

Recovering

No comments:

Post a Comment

Translate Language

Learn Database Guide