db1195 needs a reboot
When: TBD
Impact: Read only for a few seconds on the services below:
Services running on m2:
- otrs
- debmonitor
- xhgui
- recommendationapi
- iegreview
- sockpuppet
- mwaddlink
Switchover steps:
OLD MASTER: db1195
NEW MASTER: db1228
- Check configuration differences between new and old master
pt-config-diff h=db1195.eqiad.wmnet,F=/root/.my.cnf h=db1228.eqiad.wmnet,F=/root/.my.cnf
- Silence alerts on all hosts
- Topology changes: move everything under db1228
db-switchover --timeout=15 --only-slave-move db1195.eqiad.wmnet db1228.eqiad.wmnet
- Disable puppet on db1195 and db1228
sudo cumin 'db1195* or db1228*' 'disable-puppet "primary switchover T368494"'
- Merge gerrit: https://gerrit.wikimedia.org/r/1050814
- Run puppet on dbproxy1023 and dbproxy1025 and check the config
run-puppet-agent && cat /etc/haproxy/conf.d/db-master.cfg
- Start the failover
!log Failover m2 from db1195 to db1228 - T368494
root@cumin1001:~/wmfmariadbpy/wmfmariadbpy# db-switchover --skip-slave-move db1195 db1228
- Reload haproxies
dbproxy1023: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
dbproxy1025: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
- Kill connections on the old master (db1195)
pt-kill --print --kill --victims all --match-all F=/dev/null,S=/run/mysqld/mysqld.sock
- Re-enable and run puppet on old and new masters (for heartbeat): db1195 and db1228
sudo cumin 'db1195* or db1228*' 'run-puppet-agent -e "primary switchover T368494"'
- Check affected services (otrs, debmonitor, etc.)
- Clean orchestrator heartbeat to remove the old master's entry:
- sudo db-mysql db1228 heartbeat -e "delete from heartbeat where file like 'db1195%';"
- Update/resolve phabricator ticket about failover
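
Optional helper for the "Reload haproxies" step: the raw "show stat" output is a wide CSV, which makes it easy to misread which backend is up. The sketch below condenses it to one line per proxy/server pair with its status. The function name haproxy_status_summary is hypothetical and not part of any existing tooling; it assumes the standard HAProxy CSV layout, where the header line starts with "# pxname" and contains a "status" column, and it looks that column up by name instead of relying on a fixed position.

```shell
# Hypothetical helper (not part of wmfmariadbpy or HAProxy).
# Reads HAProxy "show stat" CSV on stdin and prints: <proxy> <server> <status>
haproxy_status_summary() {
    awk -F, '
        # Header line: strip the leading "# " and find the "status" column by name.
        /^# / {
            sub(/^# /, "")
            for (i = 1; i <= NF; i++) if ($i == "status") col = i
            next
        }
        # Data lines: print proxy name, server name and status.
        # Blank lines and lines before the header are skipped.
        col && NF >= col { print $1, $2, $col }
    '
}
```

On each dbproxy host it could be appended to the existing command, for example: echo "show stat" | socat /run/haproxy/haproxy.sock stdio | haproxy_status_summary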