The current m3 master may have issues (T375593), so we need to switch over to a new master.
Databases on m3: phabricator
When: Thursday 5th at 06:00 AM UTC
Impact: Writes will be disabled for around 1 minute.
Failover process
OLD MASTER: db1159
NEW MASTER: db1213
- Check configuration differences between new and old master
$ pt-config-diff h=db1159.eqiad.wmnet,F=/root/.my.cnf h=db1213.eqiad.wmnet,F=/root/.my.cnf
- Silence alerts on all hosts: sudo cookbook sre.hosts.downtime --hours 1 -r "m3 master switchover T381365" 'A:db-section-m3'
- Topology changes: move everything under db1213
db-switchover --timeout=15 --only-slave-move db1159.eqiad.wmnet db1213.eqiad.wmnet
- Disable puppet on db1213 and db1159
sudo cumin 'db1159* or db1213*' 'disable-puppet "primary switchover T381365"'
- Merge gerrit: TO-DO
- Run puppet on dbproxy1026 and dbproxy1020 and check the config
run-puppet-agent && cat /etc/haproxy/conf.d/db-master.cfg
- Start the failover: !log Failover m3 from db1159 to db1213 - T381365
- Set phabricator in RO:
ssh phab1004
sudo /srv/phab/phabricator/bin/config set cluster.read-only true
# restart database server
sudo /srv/phab/phabricator/bin/config set cluster.read-only false
- DB switchover
root@cumin1001:~/wmfmariadbpy/wmfmariadbpy# db-switchover --skip-slave-move db1159 db1213
- Reload haproxies
dbproxy1026: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
dbproxy1020: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
- Kill connections on the old master (db1159)
pt-kill --print --kill --victims all --match-all F=/dev/null,S=/run/mysqld/mysqld.sock
- Restart puppet on old and new masters (for heartbeat): db1213 and db1159
sudo cumin 'db1213* or db1159*' 'run-puppet-agent -e "primary switchover T381365"'
- Check services affected: phabricator
- Clean the orchestrator heartbeat table to remove the old master's entry, otherwise Orchestrator will show lag:
delete from heartbeat where server_id=171966512;
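
For the "Reload haproxies" step, the raw `show stat` output is a wide CSV that is hard to read at a glance. The following is a minimal sketch, not part of the official procedure, that reduces it to proxy name, server name and status. The function name `haproxy_status` is hypothetical, and the sketch assumes the output begins with the usual `# pxname,svname,...` header line containing a `status` column.

```shell
# Print proxy name, server name and status from HAProxy "show stat" CSV on stdin.
# The status column is located by header name rather than by a fixed position.
haproxy_status() {
  awk -F',' '
    /^# / {
      sub(/^# /, "", $0)                 # strip the "# " prefix of the header line
      n = split($0, h, ",")
      for (i = 1; i <= n; i++) if (h[i] == "status") col = i
      next
    }
    col && NF >= col { print $1, $2, $col }   # skip blank or short lines
  '
}

# Usage on a proxy host (socket path taken from the reload step):
# echo "show stat" | socat /run/haproxy/haproxy.sock stdio | haproxy_status
```

After the reload, the new master (db1213) would be expected to show as UP on both proxies; what the old master shows depends on the merged proxy configuration.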