Folks,
I wanted to share the testing emails below from Dhaval Jaiswal. He's
been testing 9.3's streaming-only cascading replication, and so far it
works as advertised. What he found in his tests was:
a) he could not remaster to a former replica which was behind the replica
he was trying to remaster
b) when servers were correctly caught up, remastering worked correctly
So, all good so far.
Text follows
======================
TEST 1: remastering failure due to picking the wrong replica
I have tested the scenario below for cascading replication on the
PostgreSQL 9.3 beta version.
A
B.....................E
C...D
1) A is the master,
B & E are pointing to A,
C & D are pointing to B.
Tested scenarios are as follows:
a) When (A) failed, we were able to promote B or E as the master. If B was
promoted, C & D continued to talk to B as usual. If E was promoted, I
changed recovery.conf on C & D, replacing the port and IP so that they
point to E. After restarting, C & D started to talk to E.
b) When (B) failed, I changed recovery.conf on C & D, replacing the port
and IP so that they point to E. After restarting, C & D started to talk to
E. In the end, A is still the master, E points to A, and C & D point to E.
Now, in scenario a), when we promote B as the master after A fails, C & D
continue to talk to B. However, when I change recovery.conf on E to use the
port and IP of B, E throws the following errors:
cp: cannot stat `/usr/local/arch/00000002.history': No such file or directory
cp: cannot stat `/usr/local/arch/00000003.history': No such file or directory
LOG: entering standby mode
cp: cannot stat `/usr/local/arch/00000002.history': No such file or directory
cp: cannot stat `/usr/local/arch/000000020000000000000027': No such file or directory
cp: cannot stat `/usr/local/arch/000000010000000000000027': No such file or directory
cp: cannot stat `/usr/local/arch/00000002.history': No such file or directory
FATAL: requested timeline 2 is not a child of this server's history
DETAIL: Latest checkpoint is at 0/272DE57C on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/272DC548
LOG: startup process (PID 6155) exited with exit code 1
LOG: aborting startup due to startup process failure
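The DETAIL line above compares two WAL positions: E's latest checkpoint
(0/272DE57C on timeline 1) and the point where B's timeline 2 forked off
(0/272DC548). E's checkpoint is past the fork point, which is why E
cannot follow B. A minimal Python sketch of that comparison follows. The
helper names parse_lsn and can_follow are hypothetical illustrations, not
PostgreSQL functions, and the rule is a simplification of the server's
actual check.

```python
def parse_lsn(text):
    """Convert a WAL location such as '0/272DE57C' to an integer byte position.

    Assumes the text format shown in the logs: two hexadecimal halves
    separated by a slash (high 32 bits / low 32 bits).
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def can_follow(latest_checkpoint, fork_point):
    """Return True if a standby on the old timeline could switch to the new one.

    Simplified rule: the standby's latest checkpoint must not be past the
    position where the new timeline forked off.
    """
    return parse_lsn(latest_checkpoint) <= parse_lsn(fork_point)


# Values taken from the DETAIL line of the error above.
print(can_follow("0/272DE57C", "0/272DC548"))  # False
```

With these values the check fails, which matches the FATAL error: the
promoted server was behind the replica being repointed to it.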
======================
TEST 2: Remastering success
The structure would be:
A (Master)
(Slave1) B........................................E (Slave2)
(Slave3) C.....D (Slave4)
(1) Stopped node (A).
(2) Following are the snapshots of slave1 & slave2 after stopping node (A):
slave 1
postgres=# select pg_last_xact_replay_timestamp();
pg_last_xact_replay_timestamp
----------------------------------
2013-06-26 12:13:54.056954+05:30 ---------------> timing
(1 row)
postgres=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
0/3E000084 ----------------> received wal
(1 row)
slave 2
postgres=# select pg_last_xact_replay_timestamp();
pg_last_xact_replay_timestamp
----------------------------------
2013-06-26 12:13:54.056954+05:30 ---------------> timing
(1 row)
postgres=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
0/3E000084 ----------------> received wal
(1 row)
(3) Following are the logs on slave1 while node (A) was stopped:
FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5432?
(4) Following are the logs on slave2 while node (A) was stopped:
FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5432?
(5) Below are the logs of slave1 when it was promoted to master:
LOG: received promote request
LOG: redo done at 0/3E000024
LOG: selected new timeline ID: 2
LOG: archive recovery complete
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
(6) Below are the logs of slave2 after its recovery.conf was changed to
point to slave1 and it was restarted:
LOG: database system was shut down in recovery at 2013-06-26 12:28:49 IST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/3E000084
LOG: invalid record length at 0/3E000084
LOG: database system is ready to accept read only connections
LOG: fetching timeline history file for timeline 2 from primary server
LOG: started streaming WAL from primary at 0/3E000000 on timeline 1
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1 at 0/3E000084
LOG: new target timeline is 2
LOG: restarted WAL streaming at 0/3E000000 on timeline 2
LOG: redo starts at 0/3E000084
At this point, slave2 has successfully connected to the new master and
started working again.
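The check in step (2), confirming that both standbys had received the same
WAL location before one was promoted, can be sketched in Python as follows.
This is only an illustration: parse_lsn and most_advanced are hypothetical
helper names, and the input is assumed to be the text returned by
pg_last_xlog_receive_location() on each standby.

```python
def parse_lsn(text):
    """Convert a WAL location such as '0/3E000084' to an integer byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def most_advanced(received):
    """Return the sorted names of the standbys with the highest received location."""
    best = max(parse_lsn(lsn) for lsn in received.values())
    return sorted(name for name, lsn in received.items() if parse_lsn(lsn) == best)


# Values reported by both standbys in step (2).
received = {"slave1": "0/3E000084", "slave2": "0/3E000084"}
print(most_advanced(received))  # ['slave1', 'slave2']
```

Here both standbys are equally caught up, so neither is behind the other
and slave2 could follow slave1 onto timeline 2 after the promotion.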