db1073 is the current primary master for m5 which holds the following databases:
```
root@db1073.eqiad.wmnet[(none)]> show databases;
+------------------------+
| Database |
+------------------------+
| designate |
| designate_pool_manager |
| glance |
| keystone |
| labsdbaccounts |
| labspuppet |
| labswiki |
| labtestwiki |
| neutron |
| nova |
| nova_api |
| nova_api_eqiad1 |
| nova_eqiad1 |
| performance_schema |
| striker |
| test_labsdbaccounts |
| testreduce_0715 |
| testreduce_vd |
+------------------------+
22 rows in set (0.00 sec)
```
Apart from the cloud ones, it also holds wikitech (labswiki database).
db1073 is very old, out of warranty, and has 2 disks in predictive failure. This host is also scheduled for decommissioning: {T217396}
I would like to fail it over to db1133, a newer and more powerful host.
The procedure would be to set db1073 to read-only, promote db1133, and set db1133 to be writable; db1073 will remain read-only. Those MySQL operations should only take a few seconds.
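The MySQL-level part of that procedure could be laid out roughly as follows. This is a minimal sketch, not the actual switchover tooling: the function name `failover_sql_plan` is hypothetical, it only prints statements instead of connecting to any server, and it assumes that "promote" amounts to `STOP SLAVE` plus toggling `read_only` on the new master.

```shell
#!/bin/bash
# Sketch only: print the SQL statements in the order described above.
# Nothing is executed against a database; the function name is hypothetical.
failover_sql_plan() {
    local old_master="$1" new_master="$2"
    cat <<EOF
-- on ${old_master}: stop accepting writes first
SET GLOBAL read_only = 1;
-- on ${new_master}: after it has applied everything from ${old_master} (assumed check, not shown)
STOP SLAVE;
SET GLOBAL read_only = 0;
EOF
}
```

Calling `failover_sql_plan db1073 db1133` prints the statements grouped per host, old master first, so that writes are never accepted on both hosts at once.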
However, we need to make sure the services start using db1133.
For the cloud services that use it: m5 currently doesn't use a proxy, so clients reach the master through the `m5-master` DNS alias:
```
# host m5-master
m5-master.eqiad.wmnet is an alias for db1073.eqiad.wmnet.
db1073.eqiad.wmnet has address 10.64.16.79
```
Even though the proxy isn't in use, we also have to change its configuration to reflect that db1133 is the master.
For the `m5-master` alias itself, we'd need to do a DNS switch.
Currently its TTL is 5M, so I think we should decrease it to 1M; otherwise clients could keep resolving to db1073 for up to 5 minutes before they fully start using db1133.
**Update 8th August: TTL changed:** https://gerrit.wikimedia.org/r/529065
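To confirm where the alias points before and after the DNS switch, something like the following could be used. It is a minimal sketch that assumes `host` prints the "is an alias for" line in the format shown above; the helper names `alias_target` and `m5_master_is` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `host` output on stdin and print the CNAME target,
# without the trailing dot. Assumes the "X is an alias for Y." line format.
alias_target() {
    awk '/ is an alias for / { sub(/\.$/, "", $NF); print $NF; exit }'
}

# Hypothetical helper: succeed only if m5-master currently resolves
# to the expected host, e.g. `m5_master_is db1133.eqiad.wmnet`.
m5_master_is() {
    local expected="$1"
    [ "$(host m5-master | alias_target)" = "$expected" ]
}
```

Running `m5_master_is db1133.eqiad.wmnet` in a loop after the `authdns-update` would show when a given resolver has picked up the new record.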
For wikitech, we just need to use the new `dbctl` tool to promote db1133 to master (after pooling it with weight 0, which can be done a day in advance). So the commands would be:
```
dbctl --scope eqiad section wikitech set-master db1133
dbctl config commit
```
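Since every `dbctl` change here is followed by a `dbctl config commit`, the pair could be wrapped so that the commit only happens when the change itself succeeded. A minimal sketch: the function name `dbctl_change_and_commit` and the `DBCTL` override variable are hypothetical, and only the `dbctl` arguments already shown in this task are assumed to exist.

```shell
#!/bin/bash
# Hypothetical wrapper around the change-then-commit pattern.
# $1 is the commit message; the remaining arguments are passed to dbctl unchanged.
# DBCTL is an assumed override for the binary (e.g. DBCTL=echo for a dry run).
dbctl_change_and_commit() {
    local message="$1"
    shift
    # Only commit if the change itself succeeded.
    "${DBCTL:-dbctl}" "$@" && "${DBCTL:-dbctl}" config commit -m "$message"
}
```

For example: `dbctl_change_and_commit "Promote db1133 to wikitech master T229657" --scope eqiad section wikitech set-master db1133`. With `DBCTL=echo` set beforehand, the two invocations are printed instead of being run.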
When:
**Tuesday 3rd Sept at 13:00 UTC**
I think the total read-only time would be around 5 minutes. Reads won't be affected, as db1073 will be up at all times.
I would like to coordinate with #cloud-services-team to find a proper date and time to do this operation and communicate it on wikitech-l and on other channels you might consider necessary.
Also CCing @CDanis and @Volans as this would be the first time we'd use `dbctl` to set a master and it would be nice to have one of them online just in case :)
**Procedure**:
Old master: db1073
New master: db1133
Pre-failover steps a few minutes before 13:00 UTC
[x] @Marostegui to silence alerts on m5 hosts
[x] @Marostegui to change replication and get everything to replicate from db1133
[x] @Marostegui to pool db1133 with weight 0 in the wikitech section via `dbctl instance db1133 edit` and then `dbctl config commit -m "Pool db1133 with weight 0 T229657"` so it can later be set as master.
[x] @Marostegui to disable puppet on db1073 and db1133 and merge: https://gerrit.wikimedia.org/r/#/c/operations/dns/+/529333/ https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/529331/
Failover at 13:00 UTC
[x] @Marostegui to log on -operations that the failover is starting
[x] @Marostegui to set read-only
```
dbctl --scope eqiad section wikitech ro "Maintenance on wikitech T229657" && dbctl config commit -m "Set wikitech as read-only for maintenance T229657"
```
[x] @Marostegui to perform the failover at the MySQL level (at this point db1073 will become read-only)
[x] @Marostegui to change the master on MW: `dbctl --scope eqiad section wikitech set-master db1133 ; dbctl config commit -m "Promote db1133 to wikitech master T229657"`
[x] @Marostegui to kill connections on db1073
[x] @Marostegui to set wikitech back to RW: `dbctl --scope eqiad section wikitech rw && dbctl config commit -m "Set wikitech back to RW after maintenance T229657"`
[x] @Marostegui to `authdns-update` the DNS change
[x] @Marostegui to reload dbproxy1005 proxy
[ ] @JHedden to verify everything starts connecting to db1133 as the m5-master record gets changed from db1073 to db1133 and restart services if needed.
Failover clean up steps
[ ] @Marostegui to merge https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/534144/
[x] @Marostegui to re-enable and run puppet on db1073 and db1133
[x] @Marostegui to change query killers for db1073 and db1133.
[x] @Marostegui to depool db1073 from wikitech: `dbctl instance db1073 depool ; dbctl config commit -m "Depool db1073 from wikitech T229657"`