This will allow es1020 to be reimaged to Debian Buster.
Following documentation from MariaDB#External_store_section_failover_checklist
We'll move writes to es5 while we do the switchover.
Steps:
- Check out wmfmariadbpy on cumin1001: cd ~; git clone https://gerrit.wikimedia.org/r/operations/software/wmfmariadbpy
- Check out operations/software on cumin1001: cd ~; git clone https://gerrit.wikimedia.org/r/operations/software.git
- Check current topology: sudo PYTHONPATH=~/wmfmariadbpy ~/wmfmariadbpy/wmfmariadbpy/replication_tree.py es1020
- Compare old and new master: pt-config-diff h=es1020.eqiad.wmnet,F=/root/.my.cnf h=es1021.eqiad.wmnet,F=/root/.my.cnf
- Downtime alerts for all es4 hosts
- Set es1021 (new master) to weight 50:
dbctl instance es1021 set-weight 50
dbctl config commit -m "Set es1021 to weight 50 T257847"
- Move all slaves below es1021: sudo PYTHONPATH=~/wmfmariadbpy ~/wmfmariadbpy/wmfmariadbpy/switchover.py --timeout=15 --only-slave-move es1020.eqiad.wmnet es1021.eqiad.wmnet
- Confirm the topology change: sudo PYTHONPATH=~/wmfmariadbpy ~/wmfmariadbpy/wmfmariadbpy/replication_tree.py es1020
- Disable puppet on es1020 and es1021: cumin 'es102[0-1].eqiad.wmnet' "puppet agent --disable 'switchover to es1021'"
- Merge puppet CR to change es4 master: https://gerrit.wikimedia.org/r/c/operations/puppet/+/612551
- Start the failover: !log Starting es4 failover from es1020 to es1021 T257847
- Merge mediawiki-config CR to disable es4 writes: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/612559
- Deploy above MW change from deploy1001: cd /srv/mediawiki-staging/; git status; git fetch; git rebase; scap sync-file wmf-config/db-eqiad.php "Disable writes to es4 T257847"
- Check that es4 is indeed read-only (only heartbeat update statements in mysqlbinlog)
- Do the switchover:
sudo PYTHONPATH=~/wmfmariadbpy ~/wmfmariadbpy/wmfmariadbpy/switchover.py --skip-slave-move es1020.eqiad.wmnet es1021.eqiad.wmnet
echo "=====> es1020"
sudo -i mysql.py -h es1020 -e "show slave status\G"
echo "=====> es1021"
sudo -i mysql.py -h es1021 -e "show slave status\G"
- Confirm the topology change: sudo PYTHONPATH=~/wmfmariadbpy ~/wmfmariadbpy/wmfmariadbpy/replication_tree.py es1021
- Promote es1021 to master in etcd, leave es1020 (old master) with weight 0:
dbctl --scope eqiad section es4 set-master es1021
dbctl config commit -m "Promote es1021 to es4 master T257847"
- Re-enable and run puppet on both nodes: cumin 'es102[0-1].eqiad.wmnet' "run-puppet-agent -e 'switchover to es1021'"
- Re-enable es4 on MW: REVERT https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/612559
- Deploy above MW change from deploy1001: cd /srv/mediawiki-staging/; git status; git fetch; git rebase; scap sync-file wmf-config/db-eqiad.php "Re-enable writes to es4 T257847"
- Change events for query killer:
sudo -i mysql.py -h es1020 < ~/software/dbtools/events_coredb_slave.sql
sudo -i mysql.py -h es1021 < ~/software/dbtools/events_coredb_master.sql
- Update DNS: https://gerrit.wikimedia.org/r/c/operations/dns/+/612560
- Clear candidate-master status from es1021 in dbctl
- Set candidate-master status for es1020 in dbctl
- Resolve this task
Date & time: 2020-07-21 (Tuesday) at 07:00 AM UTC
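
Sketch for the read-only check step (only heartbeat updates should appear in mysqlbinlog): a small filter that prints any write statement in decoded binlog output that does not mention the heartbeat table. The function name non_heartbeat_writes is hypothetical, and it assumes the heartbeat table has "heartbeat" in its name and that the binlog has been decoded to text first; treat it as an aid for eyeballing the output, not as part of the documented procedure.

```shell
#!/bin/sh
# Hypothetical helper: reads decoded binlog text on stdin and prints every
# write statement that does not touch a heartbeat table.
# Assumption: heartbeat writes contain the word "heartbeat".
non_heartbeat_writes() {
    # Keep lines that start a write statement, either plain SQL or the
    # "### "-prefixed pseudo-SQL that mysqlbinlog -v prints for row events.
    grep -iE '^(### )?(INSERT|UPDATE|DELETE|REPLACE)' \
        | grep -viE 'heartbeat' || true   # no match is the good case, not an error
}
```

Possible usage on the current master, with the binlog path left as a placeholder to fill in: sudo mysqlbinlog --base64-output=decode-rows -vv <binlog file> | non_heartbeat_writes. Empty output over a window after the MW deploy suggests that writes to es4 have stopped.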