As part of upgrading s2 to Debian Buster/MariaDB 10.4, we need to switch the master to db2104.
When: Tue 10th Aug at 05:00 AM UTC.
Checklist:
- Create a task to communicate the chosen date and send an announcement to the community: T287449
NEW master: db2104
OLD master: db2107
- Check configuration differences between new and old master:
sudo pt-config-diff h=db2104.codfw.wmnet,F=/root/.my.cnf h=db2107.codfw.wmnet,F=/root/.my.cnf
Failover prep:
- Silence alerts on all hosts:
sudo cookbook sre.hosts.downtime --hours 1 -r "Master switchover s2 T287454" 'A:db-section-s2'
- Set NEW master with weight 0
sudo dbctl instance db2104 set-weight 0
sudo dbctl config commit -m "Set db2104 with weight 0 T287454"
- Topology changes, move all replicas under NEW master
sudo db-switchover --timeout=15 --only-slave-move db2107.codfw.wmnet db2104.codfw.wmnet
- Disable puppet on both nodes
sudo cumin 'db2104* or db2107*' 'disable-puppet "master switchover T287454"'
- Merge gerrit puppet change to promote NEW master: https://gerrit.wikimedia.org/r/711114
Failover:
- Log the failover:
!log Starting s2 codfw failover from db2107 to db2104 - T287454
- Set section read-only:
sudo dbctl --scope codfw section s2 ro "Maintenance until 05:15 UTC - T287454"
sudo dbctl config commit -m "Set s2 codfw as read-only for maintenance - T287454"
- Check s2 is indeed read-only
- Switch masters:
sudo DEBUG=1 db-switchover --skip-slave-move db2107 db2104
echo "===== db2107 (OLD)"; sudo mysql.py -h db2107 -e 'show slave status\G'
echo "===== db2104 (NEW)"; sudo mysql.py -h db2104 -e 'show slave status\G'
- Promote NEW master in dbctl, and remove read-only
sudo dbctl --scope codfw section s2 set-master db2104
sudo dbctl --scope codfw section s2 rw
sudo dbctl config commit -m "Promote db2104 to s2 master and set section read-write T287454"
- Restart puppet on both hosts (for heartbeat):
sudo cumin 'db2104* or db2107*' 'run-puppet-agent -e "master switchover T287454"'
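Optional sketch for the "Switch masters" step above: instead of reading the full `show slave status\G` output by eye, the two thread-state fields can be checked with a small filter. The function name `replication_running` is hypothetical and not part of the existing tooling. It only assumes the standard `Slave_IO_Running` / `Slave_SQL_Running` fields of the `\G` output, and it does not compare replication coordinates.

```shell
# Hypothetical helper (not part of db-switchover or mysql.py).
# Reads `show slave status\G` output on stdin and succeeds (exit 0)
# only if both replication threads report "Yes".
replication_running() {
    awk -F': ' '
        # Field names are right-aligned, so match on the end of the name.
        # The trailing $ keeps Slave_SQL_Running_State from matching.
        $1 ~ /Slave_IO_Running$/  { io = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        # Empty input (no replication configured) also counts as failure.
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

Possible usage, assuming mysql.py writes the usual `\G` output to stdout: `sudo mysql.py -h db2107 -e 'show slave status\G' | replication_running && echo "db2107 replication threads running"`. This is a convenience check only and does not replace reviewing the full output.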
Clean up tasks:
- Change events for query killer:
events_coredb_master.sql on the new master db2104
events_coredb_slave.sql on the new slave db2107
- Update DNS: https://gerrit.wikimedia.org/r/c/operations/dns/+/710517
- Update candidate master dbctl notes
sudo dbctl instance db2107 set-candidate-master --section s2 true
sudo dbctl instance db2104 set-candidate-master --section s2 false
- Check tendril was updated
- Check zarcillo was updated
- Depool OLD master, as it's running 10.1, replicating from a 10.4 master
sudo dbctl instance db2107 depool
sudo dbctl config commit -m "Depool db2107 until it's reimaged to buster T287454"
- Update/resolve this ticket.