db1176 is in row A, where a switch maintenance with hard downtime is scheduled.
When: Monday 13th Feb - 11AM UTC
Impact: Read-only for a few seconds on the services below:
Services running on m1:
* bacula
* cas (and cas staging)
* backups
* etherpad
* librenms
* pki
* rt
Switchover steps:
OLD MASTER: db1176
NEW MASTER: db1164
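The two host names and the task ID recur in most of the commands below. As a rough sketch (the helper name `print_switchover_commands` is hypothetical and not part of wmfmariadbpy), the host-dependent commands can be generated from those three values so they stay consistent. It only prints text and executes nothing; the `eqiad.wmnet` default is taken from this task.

```shell
# Hypothetical helper: prints the host-dependent commands of this checklist.
# It only echoes strings; no command is run.
print_switchover_commands() {
    local old="$1" new="$2" task="$3"
    local domain="${4:-eqiad.wmnet}"   # assumption: both hosts live in the same domain
    printf '%s\n' \
        "pt-config-diff h=${old}.${domain},F=/root/.my.cnf h=${new}.${domain},F=/root/.my.cnf" \
        "db-switchover --timeout=1 --only-slave-move ${old}.${domain} ${new}.${domain}" \
        "sudo cumin '${old}* or ${new}*' 'disable-puppet \"primary switchover ${task}\"'" \
        "db-switchover --skip-slave-move ${old} ${new}" \
        "sudo cumin '${old}* or ${new}*' 'run-puppet-agent -e \"primary switchover ${task}\"'"
}
```

For this task the call would be `print_switchover_commands db1176 db1164 T329259`; the output is meant to be reviewed and pasted step by step, not piped into a shell.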
[x] Check configuration differences between new and old master
`$ pt-config-diff h=db1176.eqiad.wmnet,F=/root/.my.cnf h=db1164.eqiad.wmnet,F=/root/.my.cnf`
[] Silence alerts on all hosts
[x] Topology changes: move everything under db1164
`db-switchover --timeout=1 --only-slave-move db1176.eqiad.wmnet db1164.eqiad.wmnet`
[] Disable puppet on db1164 and db1176
`sudo cumin 'db1176* or db1164*' 'disable-puppet "primary switchover T329259"'`
[] Merge gerrit: https://gerrit.wikimedia.org/r/c/operations/puppet/+/888359
[] Run puppet on dbproxy1012 and dbproxy1014 and check the config
`run-puppet-agent && cat /etc/haproxy/conf.d/db-master.cfg`
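Instead of reading the `cat` output by eye, the config check could be scripted. This is a minimal sketch, assuming `db-master.cfg` uses standard HAProxy `server <name> <address> ...` lines and marks the standby with the `backup` keyword; the function name `active_servers` is made up for illustration.

```shell
# Hypothetical helper: list servers in an HAProxy config that are NOT
# flagged as "backup", i.e. the ones that receive traffic by default.
# Assumption: one "server <name> <addr> [options...]" entry per line.
active_servers() {
    local cfg="$1"
    awk '
        $1 == "server" {
            is_backup = 0
            for (i = 3; i <= NF; i++) if ($i == "backup") is_backup = 1
            if (!is_backup) print $2
        }
    ' "$cfg"
}
```

After the puppet run, `active_servers /etc/haproxy/conf.d/db-master.cfg` would be expected to print the new master (db1164) on both proxies, if the file follows the assumed layout.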
[] Start the failover
`!log Failover m1 from db1176 to db1164 - T329259`
```
root@cumin1001:~/wmfmariadbpy/wmfmariadbpy# db-switchover --skip-slave-move db1176 db1164
```
[] Reload haproxies
```
dbproxy1012: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
dbproxy1014: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
```
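The raw `show stat` output is a wide CSV, which is hard to scan during the read-only window. A possible filter, sketched below, reduces it to proxy, server, and status. It finds the `status` column from the CSV header row instead of hard-coding its position; the function name `haproxy_status` is hypothetical.

```shell
# Hypothetical filter: reduce HAProxy "show stat" CSV to "proxy server status".
# The header row starts with "# " and names each column, so the status
# column is located by name. FRONTEND/BACKEND summary rows are skipped.
haproxy_status() {
    awk -F',' '
        /^# / {
            sub(/^# /, "")                      # drop the comment marker, fields are re-split
            for (i = 1; i <= NF; i++) if ($i == "status") col = i
            next
        }
        col && NF >= col && $2 != "FRONTEND" && $2 != "BACKEND" {
            print $1, $2, $col
        }
    '
}
```

It would be used as `echo "show stat" | socat /run/haproxy/haproxy.sock stdio | haproxy_status` on each dbproxy host after the reload.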
[] Kill connections on the old master (db1176)
`pt-kill --print --kill --victims all --match-all F=/dev/null,S=/run/mysqld/mysqld.sock`
[] Restart puppet on old and new masters (for heartbeat): db1164 and db1176
`sudo cumin 'db1176* or db1164*' 'run-puppet-agent -e "primary switchover T329259"'`
[] Check services affected (librenms, rt, etherpad...)
[] Clean orchestrator heartbeat to remove the old master's entry: `sudo db-mysql db1164 heartbeat -e "delete from heartbeat where file like 'db1176%';"`
[] Merge backup change: https://gerrit.wikimedia.org/r/c/operations/puppet/+/887885/
[] Create floating ticket for db1176 to be moved to m5:
[] Update/resolve Phabricator ticket about failover