Continuing with the misc sections upgrade, m5 needs to be upgraded to Buster and MariaDB 10.4.
Hosts:
- db2135
- db2078
- db1133 (to be replaced by db1128 once it is removed as master from m3 T259589)
- db1128
- db1117
This will require a master failover once db1128 is provisioned.
m5 doesn't use the proxies, so we'll need to do a DNS failover, similar to what we did last time: T229657
Failover procedure:
Old master: db1133
New master: db1128
Decrease TTL a few days before the switchover
- @Marostegui to change m5's master alias TTL from 5M to 1M https://gerrit.wikimedia.org/r/c/operations/dns/+/622266/
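The reason for lowering the TTL ahead of time can be shown with a little arithmetic: a resolver that cached the old record just before the change may keep serving it for up to one TTL. The sketch below is illustrative only. The date is a placeholder (the task does not state one), it assumes the record changes at exactly 14:00 UTC, and it ignores clients that cache beyond the TTL.

```python
from datetime import datetime, timedelta, timezone


def stale_until(change_time: datetime, ttl_seconds: int) -> datetime:
    """Latest time a TTL-respecting resolver can still serve the old record."""
    return change_time + timedelta(seconds=ttl_seconds)


# Placeholder date; only the 14:00 UTC time comes from the plan.
change = datetime(2020, 1, 1, 14, 0, tzinfo=timezone.utc)

print(stale_until(change, 300).time())  # with the 5M TTL
print(stale_until(change, 60).time())   # with the 1M TTL
```

With the TTL at 1M, the window in which clients may still reach db1133 through a cached record shrinks from five minutes to one.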
Pre-failover steps a few minutes before 14:00 UTC
- @Marostegui to silence alerts on m5 hosts
- @Marostegui to change replication and get everything to replicate from db1128
- @Marostegui to pool db1128 with weight 0 on the wikitech section (see T260324#6429410) so it can later be set as master: dbctl instance db1128 edit, then dbctl config commit -m "Pool db1128 with weight 0 T260324". https://gerrit.wikimedia.org/r/623748 https://phabricator.wikimedia.org/P12431
- @Marostegui to disable puppet on db1133 and db1128, then merge the DNS change for the m5-master alias and the puppet change for site.pp and the dbproxy1021/dbproxy1017 config (even though the proxies are not used): https://gerrit.wikimedia.org/r/c/operations/puppet/+/623757 https://gerrit.wikimedia.org/r/c/operations/dns/+/623759/
Failover at 14:00 UTC
- @Marostegui to log on -operations that the failover is starting
- @Marostegui to set wikitech read-only: dbctl --scope eqiad section wikitech ro "Maintenance on wikitech T260324" && dbctl config commit -m "Set wikitech as read-only for maintenance T260324"
- @Marostegui to perform the failover at the MySQL level (at this point db1133 will become read-only)
- @Marostegui to pool db1128 in dbctl first (see T260324#6429410)
- @Marostegui to change the master on MW: dbctl --scope eqiad section wikitech set-master db1128 ; dbctl config commit -m "Promote db1128 to wikitech master T260324"
- @Marostegui to kill connections on db1133
- @Marostegui to set wikitech back to RW: dbctl --scope eqiad section wikitech rw && dbctl config commit -m "Set wikitech back to RW after maintenance T260324"
- @Marostegui to run authdns-update to deploy the DNS change
- @Marostegui to reload dbproxy1021 and dbproxy1017
- @Andrew or someone from cloud-services-team to verify everything starts connecting to db1128 as the m5-master record gets changed from db1133 to db1128 and restart services if needed.
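One way to confirm the alias switch from a client's point of view is to compare what the alias and the new master resolve to. This is a minimal sketch, not part of the agreed procedure: the function name is made up, the hostnames and addresses in the demo are placeholders from the documentation range, and real record names would need to be substituted. The resolver is injectable so the logic can be exercised without touching DNS.

```python
import socket
from typing import Callable


def alias_points_to(alias: str, expected_host: str,
                    resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True when alias and expected_host resolve to the same IPv4 address."""
    return resolve(alias) == resolve(expected_host)


# Demo with a fake resolver; names and addresses are placeholders.
fake_dns = {
    "m5-master.example": "192.0.2.10",
    "db1128.example": "192.0.2.10",
    "db1133.example": "192.0.2.20",
}
print(alias_points_to("m5-master.example", "db1128.example", fake_dns.__getitem__))
print(alias_points_to("m5-master.example", "db1133.example", fake_dns.__getitem__))
```

Called without the third argument, the function uses the local resolver, so it reflects whatever that client currently has cached. A False result shortly after the change may only mean the old record has not yet expired.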
Failover cleanup steps
- @Marostegui to re-enable and run puppet on db1133 and db1128
- @Marostegui to depool db1133 from wikitech: dbctl instance db1133 depool ; dbctl config commit -m "Depool db1133 from wikitech T260324"
- @Marostegui to change m5's master alias TTL from 1M to 5M
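For review before the window, the dbctl invocations spread across the failover and cleanup steps can be collected in one place. The helper below is a hypothetical convenience, not an existing tool: it only builds the command strings already quoted in this task and runs nothing. The steps in between (pooling db1128, the MySQL-level failover, killing connections, DNS and proxy changes) are deliberately left out.

```python
from typing import List


def dbctl_failover_commands(old_master: str, new_master: str,
                            section: str = "wikitech",
                            task: str = "T260324",
                            scope: str = "eqiad") -> List[str]:
    """Build, in order, the dbctl commands quoted in the plan (dry run only)."""
    sec = f"dbctl --scope {scope} section {section}"

    def commit(message: str) -> str:
        # Every commit message in the plan ends with the task ID.
        return f'dbctl config commit -m "{message} {task}"'

    return [
        f'{sec} ro "Maintenance on {section} {task}"',
        commit(f"Set {section} as read-only for maintenance"),
        f"{sec} set-master {new_master}",
        commit(f"Promote {new_master} to {section} master"),
        f"{sec} rw",
        commit(f"Set {section} back to RW after maintenance"),
        f"dbctl instance {old_master} depool",
        commit(f"Depool {old_master} from {section}"),
    ]


for command in dbctl_failover_commands("db1133", "db1128"):
    print(command)
```

Printing the list gives a single block to paste into the task or compare against the checklist, which makes a typo in a hostname or task ID easier to spot.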