Databases on m1:
+--------------------+
| Database           |
+--------------------+
| bacula9            |
| cas                |
| cas_staging        |
| dbbackups          |
| etherpadlite       |
| heartbeat          |
| information_schema |
| librenms           |
| mysql              |
| percona            |
| performance_schema |
| pki                |
| racktables         |
| rddmarc            |
| rt                 |
| sys                |
+--------------------+
16 rows in set (0.002 sec)
When: Soon
Impact: Writes will be disabled for around 1 minute.
Failover process
OLD MASTER: db1128
NEW MASTER: db1164
- Check configuration differences between the new and old masters:
$ pt-config-diff h=db1128.eqiad.wmnet,F=/root/.my.cnf h=db1164.eqiad.wmnet,F=/root/.my.cnf
- Silence alerts on all hosts
- Add db1164 as a secondary on the proxies: merge and deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/799915
- Topology changes: move everything under db1164
db-switchover --timeout=15 --only-slave-move db1128.eqiad.wmnet db1164.eqiad.wmnet
- Disable puppet on db1128 and db1164:
disable-puppet "switchover T309296"
- Merge gerrit: https://gerrit.wikimedia.org/r/c/operations/puppet/+/799901
- Run puppet on dbproxy1012 and dbproxy1014 and check the config
run-puppet-agent && cat /etc/haproxy/conf.d/db-master.cfg
- Start the failover: !log Failover m1 from db1128 to db1164 - T309296
- DB switchover
root@cumin1001:~/wmfmariadbpy/wmfmariadbpy# db-switchover --read-only-master --skip-slave-move db1128 db1164
- Reload haproxies (dbproxy1012 is the active one)
dbproxy1012: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
dbproxy1014: systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
- Kill connections on the old master (db1128):
pt-kill --print --kill --victims all --match-all F=/dev/null,S=/run/mysqld/mysqld.sock
- Re-enable and run puppet on the old and new masters (db1128 and db1164), needed for heartbeat:
puppet agent --enable && puppet agent -tv
- Check affected services
- Clean the heartbeat table used by Orchestrator to remove the old master's row; otherwise Orchestrator will show lag:
delete from heartbeat where server_id=171966562;
- If everything looks good, move backups to use the new host: merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/799894
- Close this ticket and create a ticket to move db1128 to s1 (T309303)
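For the "check the config" step on dbproxy1012 and dbproxy1014, a small filter can list the `server` lines of the generated HAProxy file and flag which ones carry the `backup` keyword, which makes it easier to see at a glance which host is active. This is a minimal sketch, assuming each backend server is declared on a single line in standard HAProxy `server <name> <address> [options]` syntax; the function name `haproxy_servers` is hypothetical, and the real contents of `db-master.cfg` are not reproduced in this ticket.

```shell
# Hypothetical helper: summarise HAProxy "server" lines as "<name> <address> <role>",
# where role is "backup" if the line carries the backup keyword, else "active".
# Assumption: one server per line, standard "server <name> <address> [options]" syntax.
haproxy_servers() {
    awk '
        # Only look at server declarations; commented-out lines start with "#".
        $1 == "server" {
            role = "active"
            # Options start at field 4, after the name and the address.
            for (i = 4; i <= NF; i++) {
                if ($i == "backup") { role = "backup" }
            }
            print $2, $3, role
        }
    ' "$@"   # reads the files given as arguments, or stdin if none
}
```

Usage on each proxy, once the function is defined: `run-puppet-agent && haproxy_servers /etc/haproxy/conf.d/db-master.cfg`.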
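For the "reload haproxies" step, the raw `show stat` output is a wide CSV that is hard to read in a terminal. The sketch below trims it to proxy name, server name and status. It is a minimal sketch, assuming the first line is the usual `# pxname,svname,...` header; columns are located by name rather than by position so the filter does not depend on the HAProxy version's column order. The function name `haproxy_status` is hypothetical.

```shell
# Hypothetical helper: reduce HAProxy "show stat" CSV to "<proxy> <server> <status>".
# Assumption: the header line starts with "# " and names the pxname, svname
# and status columns.
haproxy_status() {
    awk -F, '
        /^#/ {
            # Strip the "# " prefix (this re-splits $0), then map column names to indexes.
            sub(/^# */, "")
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        NF == 0 { next }   # skip the trailing blank line
        {
            sv = $(col["svname"])
            # Keep only per-server rows, not the aggregate frontend/backend rows.
            if (sv == "FRONTEND" || sv == "BACKEND") next
            print $(col["pxname"]), sv, $(col["status"])
        }
    '
}
```

Usage on each proxy after the reload: `echo "show stat" | socat /run/haproxy/haproxy.sock stdio | haproxy_status`.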
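For the Orchestrator heartbeat cleanup step, it can help to preview the old master's row before deleting it, and to refuse a malformed `server_id` so a typo cannot widen the `DELETE`. This is a minimal sketch that only prints SQL for review; it assumes the table is `heartbeat.heartbeat` with `server_id` and `ts` columns (the usual pt-heartbeat layout, matching the `heartbeat` database listed above), which the ticket does not state explicitly. The function name `heartbeat_cleanup_sql` is hypothetical, and the statements should still be run manually on the appropriate host.

```shell
# Hypothetical helper: print a preview SELECT and the DELETE for one server_id.
# Assumption: pt-heartbeat style table heartbeat.heartbeat(server_id, ts, ...).
heartbeat_cleanup_sql() {
    id=$1
    # Accept digits only, so nothing else can end up in the WHERE clause.
    case $id in
        ''|*[!0-9]*) echo "server_id must contain digits only" >&2; return 1 ;;
    esac
    printf 'SELECT server_id, ts FROM heartbeat.heartbeat WHERE server_id = %s;\n' "$id"
    printf 'DELETE FROM heartbeat.heartbeat WHERE server_id = %s;\n' "$id"
}
```

For this switchover the call would be `heartbeat_cleanup_sql 171966562`; review the output before passing it to a `mysql` client.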