db1159 needs to be reimaged to Bullseye.
Let's promote db1128 to m1 master in its place.
When: Thursday 27th at 10AM UTC
Impact: Read-only for a few seconds on the services listed below.
Services running on m1:
- bacula
- cas (and cas staging)
- backups
- etherpad
- librenms
- pki
- rt
Switchover steps:
OLD MASTER: db1159
NEW MASTER: db1128
- Check configuration differences between new and old master
pt-config-diff h=db1159.eqiad.wmnet,F=/root/.my.cnf h=db1128.eqiad.wmnet,F=/root/.my.cnf
- Silence alerts on all hosts
- Topology changes: move everything under db1128
db-switchover --timeout=1 --only-slave-move db1159.eqiad.wmnet db1128.eqiad.wmnet
- Disable puppet on db1159 and db1128
puppet agent --disable "switchover to db1128"
- Merge gerrit: https://gerrit.wikimedia.org/r/c/operations/puppet/+/757389
- Run puppet on dbproxy1012 and dbproxy1014 and check the config
puppet agent -tv && cat /etc/haproxy/conf.d/db-master.cfg
- Start the failover
!log Failover m1 from db1159 to db1128 - T299624
root@cumin1001:~/wmfmariadbpy/wmfmariadbpy# db-switchover --skip-slave-move db1159 db1128
- Reload haproxies
dbproxy1012: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
dbproxy1014: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
- Kill connections on the old master (db1159)
pt-kill --print --kill --victims all --match-all F=/dev/null,S=/run/mysqld/mysqld.sock
- Re-enable and run puppet on the old and new masters (for heartbeat): db1128 and db1159
puppet agent --enable && run-puppet-agent
- Check services affected (librenms, racktables, etherpad...)
- Clean up the orchestrator heartbeat to remove the old master's entry.
- Merge: https://gerrit.wikimedia.org/r/755960
- Create floating ticket for db1159 to be moved to m2: T300243
- Update/resolve phabricator ticket about failover
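
The command-line steps above can be sketched as a dry-run helper that only prints the commands, in order, for a given pair of hosts. It runs nothing. The function name print_switchover_plan, its arguments, and the default domain are hypothetical and exist only for illustration; every printed command is copied from the checklist. The printed commands still run on different hosts (cumin, the two masters, the dbproxies), and the gerrit merges, alert silencing and service checks remain manual.

```shell
#!/bin/sh
# Dry-run sketch: prints the switchover commands in checklist order.
# It does NOT execute any of them.
# print_switchover_plan and its arguments are hypothetical names.
print_switchover_plan() {
    old="$1"                      # old master short hostname, e.g. db1159
    new="$2"                      # new master short hostname, e.g. db1128
    domain="${3:-eqiad.wmnet}"    # assumed default, taken from the checklist

    # Refuse missing or identical hosts so a typo is caught early.
    if [ -z "$old" ] || [ -z "$new" ] || [ "$old" = "$new" ]; then
        echo "usage: print_switchover_plan OLD_MASTER NEW_MASTER [DOMAIN]" >&2
        return 1
    fi

    # Unquoted heredoc: variables are expanded, everything else is literal.
    cat <<EOF
pt-config-diff h=${old}.${domain},F=/root/.my.cnf h=${new}.${domain},F=/root/.my.cnf
db-switchover --timeout=1 --only-slave-move ${old}.${domain} ${new}.${domain}
puppet agent --disable "switchover to ${new}"
db-switchover --skip-slave-move ${old} ${new}
systemctl reload haproxy
pt-kill --print --kill --victims all --match-all F=/dev/null,S=/run/mysqld/mysqld.sock
puppet agent --enable && run-puppet-agent
EOF
}

print_switchover_plan db1159 db1128
```

Printing the plan first makes it easy to review the host names before anything is pasted into a root shell.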