db1159 has been cloned from db1117:3321.
db1080 needs to be decommissioned.
Let's give db1159 (running 10.4.18) a full week to make sure it is OK, and then schedule a day to promote it to master.
Databases running on m1 master:
bacula9 cas cas_staging dbbackups etherpadlite librenms pki racktables rddmarc rt
Pre steps:
- Upgrade all m1 hosts to 10.4.18:
  - db2132
  - db2078
  - db1117
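One way to confirm the upgrade pre-step is to compare the version string each host reports against the target. The following is only a minimal sketch: it assumes the version strings have already been collected by some other means (for example by querying `@@version` on each host), and the helper name `check_version` is hypothetical, not part of any existing tooling.

```shell
# Hypothetical helper: succeed only if the reported version is the target
# itself or the target followed by a "-suffix" (MariaDB usually reports
# strings such as "10.4.18-MariaDB-log").
check_version() {
    target="$1"
    reported="$2"
    case "$reported" in
        "$target"|"$target"-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage would look like `check_version 10.4.18 "$reported" || echo "host not on target version"`, where `$reported` holds the string obtained from the host.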
Switchover steps:
OLD MASTER: db1080
NEW MASTER: db1159
- Check configuration differences between new and old master
pt-config-diff h=db1159.eqiad.wmnet,F=/root/.my.cnf h=db1080.eqiad.wmnet,F=/root/.my.cnf
- Silence alerts on all hosts
- Topology changes: move everything under db1159
db-switchover --timeout=1 --only-slave-move db1080.eqiad.wmnet db1159.eqiad.wmnet
- Disable puppet on db1080 and db1159
puppet agent --disable "switchover to db1159"
- Merge gerrit: https://gerrit.wikimedia.org/r/c/operations/puppet/+/678801
- Run puppet on dbproxy1012 and dbproxy1014 and check the config
puppet agent -tv && cat /etc/haproxy/conf.d/db-master.cfg
- Start the failover
!log Failover m1 from db1080 to db1159 - T276448
root@cumin1001:~/wmfmariadbpy/wmfmariadbpy# db-switchover --skip-slave-move db1080 db1159
- Reload haproxies
dbproxy1012: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
dbproxy1014: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
- Kill connections on the old master (db1080)
pt-kill --print --kill --victims all --match-all F=/dev/null,S=/run/mysqld/mysql.sock
- Re-enable and run puppet on the old and new masters (for heartbeat): db1080 and db1159
puppet agent --enable && puppet agent -tv
- Check services affected (librenms, racktables, etherpad...)
- Change events for query killer:
events_coredb_master.sql on the new master db1159
events_coredb_slave.sql on the new slave db1080
- Clean orchestrator heartbeat to remove the old master's entry.
- Create decommissioning ticket for db1080: T280121
- Update/resolve the Phabricator ticket about the failover: https://phabricator.wikimedia.org/T276448
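For the "Reload haproxies" step, the raw `show stat` output is wide CSV and hard to read by eye. The sketch below reduces it to proxy name, server name and status. It assumes the standard HAProxy CSV layout, in which `status` is the 18th field; the helper name `haproxy_status` is hypothetical.

```shell
# Hypothetical helper: read HAProxy "show stat" CSV on stdin and print
# "<proxy> <server> <status>" per row. Assumes status is the 18th CSV
# field; the header line (starting with "#") and blank lines are skipped.
haproxy_status() {
    awk -F, '!/^#/ && NF >= 18 { print $1, $2, $18 }'
}
```

On a proxy host it could be chained after the existing command, e.g. `echo "show stat" | socat /run/haproxy/haproxy.sock stdio | haproxy_status`, to check which backend is reported as UP after the reload.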