Paste P8001

(An Untitled Masterwork)

Authored by Marostegui on Jan 17 2019, 7:01 AM.
./switchover.py --skip-slave-move db1075 db1078
Starting preflight checks...
* Original read only values are as expected (master: read_only=0, slave: read_only=1)
* The host to fail over is a direct replica of the master
* Replication is up and running between the 2 hosts
* The replication lag is acceptable: 0 (lower than the configured or default timeout)
* The master is not a replica of any other host
----- OUTPUT of '/bin/ps --no-hea...pid,args -C perl' -----
9225 /usr/bin/perl /usr/local/bin/pt-heartbeat-wikimedia --defaults-file=/dev/null --user=root --host=localhost -D heartbeat --shard=s3 --datacenter=eqiad --update --replace --interval=1 --set-vars=binlog_format=STATEMENT -S /run/mysqld/mysqld.sock --daemonize --pid /var/run/pt-heartbeat.pid
================
PASS: |██████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00, 2.98hosts/s]
FAIL: | | 0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: '/bin/ps --no-hea...pid,args -C perl'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Stopping heartbeat pid 9225 at db1075.eqiad.wmnet:3306/(none)
----- OUTPUT of '/bin/kill 9225' -----
================
PASS: |██████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00, 3.20hosts/s]
FAIL: | | 0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: '/bin/kill 9225'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Setting up original master as read-only
Slave caught up to the master after waiting 0.007711648941040039 seconds
Servers sync at master: db1075-bin.003638:797446292 slave: db1078-bin.000165:412441960
Stopping original master->slave replication
Setting up replica as read-write
All commands where successful, current status: original master read_only: 1 / original slave read_only: 0
Trying to invert replication direction
Starting heartbeat section s3 at db1078.eqiad.wmnet
----- OUTPUT of '/usr/bin/nohup /...d &> /dev/null &' -----
================
PASS: |██████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00, 3.28hosts/s]
FAIL: | | 0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: '/usr/bin/nohup /...d &> /dev/null &'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
----- OUTPUT of '/bin/ps --no-hea...pid,args -C perl' -----
16847 /usr/bin/perl /usr/local/bin/pt-heartbeat-wikimedia --defaults-file=/dev/null --user=root --host=localhost -D heartbeat --shard=s3 --datacenter=eqiad --update --replace --interval=1 --set-vars=binlog_format=STATEMENT -S /run/mysqld/mysqld.sock --daemonize --pid /var/run/pt-heartbeat.pid
================
PASS: |██████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00, 3.37hosts/s]
FAIL: | | 0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: '/bin/ps --no-hea...pid,args -C perl'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Detected heartbeat at db1078.eqiad.wmnet running with PID 16847
Verifying everything went as expected...
SUCCESS: Master switch completed successfully
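
The "Servers sync at" line in the output above records the binlog coordinates at which both hosts were in sync just before replication was stopped. As a minimal sketch, assuming the line always has the shape seen in this run, those coordinates could be pulled out of a saved log like this. The names `parse_sync_line` and `BinlogPosition` are hypothetical helpers for illustration and are not part of switchover.py.

```python
import re
from typing import NamedTuple, Tuple


class BinlogPosition(NamedTuple):
    """A binlog file name and byte offset (hypothetical helper type)."""
    file: str
    pos: int


# Assumed line shape, taken from this paste only:
#   Servers sync at master: <file>:<pos> slave: <file>:<pos>
SYNC_RE = re.compile(
    r"Servers sync at master: (?P<mfile>\S+):(?P<mpos>\d+) "
    r"slave: (?P<sfile>\S+):(?P<spos>\d+)"
)


def parse_sync_line(line: str) -> Tuple[BinlogPosition, BinlogPosition]:
    """Return (master, slave) binlog coordinates from a sync line."""
    match = SYNC_RE.search(line)
    if match is None:
        raise ValueError("not a 'Servers sync at' line: %r" % line)
    master = BinlogPosition(match["mfile"], int(match["mpos"]))
    slave = BinlogPosition(match["sfile"], int(match["spos"]))
    return master, slave


if __name__ == "__main__":
    line = ("Servers sync at master: db1075-bin.003638:797446292 "
            "slave: db1078-bin.000165:412441960")
    master, slave = parse_sync_line(line)
    print(master.file, master.pos)
    print(slave.file, slave.pos)
```

Keeping the two positions as separate named tuples makes it harder to mix up the old master's coordinates with the new master's when noting them down for a possible rollback.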
