
Beta cluster master switchover to deployment-db07
Closed, Resolved, Public

Description

We now have Buster/MariaDB 10.4 replicas (see T276968: deployment-db05 needs replacing following disk corruption).

After the new replicas have been working for a bit without problems, we should switch db07 to be the database master.

Thanks for the help, everyone. I would still like to move off db06 at the end of this process if possible, since we have to finish the Buster upgrade at some point anyhow. If we can get both db07 and db08 to reach the same point in db06's binlog, can we simply:

  • STOP SLAVE on db07 and db08
  • FLUSH TABLES WITH READ LOCK on db07, then double-check SHOW SLAVE STATUS to verify it is at the same position as db08
  • CHANGE MASTER on db08 to replicate from db07
  • START SLAVE on db08
  • configure mediawiki-config to use db07 as master (read load 0) and db08 as replica

Does that sound right?

Forgot the UNLOCK TABLES on db07 :)
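The proposed order, including the missing UNLOCK TABLES, can be sketched as a small Python plan generator. This is a minimal sketch under stated assumptions, not the procedure that was actually run: it assumes each replica's position relative to db06 is taken from `SHOW SLAVE STATUS` (`Relay_Master_Log_File`, `Exec_Master_Log_Pos`) and that db07 has binary logging enabled so its own coordinates can be read from `SHOW MASTER STATUS` (`File`, `Position`). The names `SlavePos` and `plan_switchover`, the replication user, and the binlog coordinates are hypothetical, and the mediawiki-config step is not modeled. One detail worth spelling out: db08 has to be pointed at db07's own binlog coordinates, not at the db06 position the two replicas agreed on.

```python
from typing import NamedTuple


class SlavePos(NamedTuple):
    """A replica's execution position relative to its current master (db06)."""
    file: str  # Relay_Master_Log_File from SHOW SLAVE STATUS
    pos: int   # Exec_Master_Log_Pos from SHOW SLAVE STATUS


def plan_switchover(db07_slave: SlavePos, db08_slave: SlavePos,
                    db07_binlog_file: str, db07_binlog_pos: int,
                    new_master: str = "deployment-db07",
                    repl_user: str = "repl") -> list[tuple[str, str]]:
    """Return (host, SQL) pairs for promoting db07 and re-pointing db08.

    Hypothetical helper: repl_user is a placeholder and credentials are omitted.
    """
    # Both replicas must have executed db06's binlog up to the same event;
    # otherwise db08 would skip or repeat transactions after the switch.
    if db07_slave != db08_slave:
        raise ValueError(f"replicas diverge: db07={db07_slave} db08={db08_slave}")

    # db08 follows db07's *own* binlog, read from SHOW MASTER STATUS on db07.
    change_master = (
        f"CHANGE MASTER TO MASTER_HOST='{new_master}', "
        f"MASTER_USER='{repl_user}', "
        f"MASTER_LOG_FILE='{db07_binlog_file}', "
        f"MASTER_LOG_POS={db07_binlog_pos}"
    )
    return [
        ("deployment-db07", "STOP SLAVE"),
        ("deployment-db08", "STOP SLAVE"),
        ("deployment-db07", "FLUSH TABLES WITH READ LOCK"),
        ("deployment-db08", change_master),
        ("deployment-db08", "START SLAVE"),
        # The step the original list forgot: release db07's global read lock.
        ("deployment-db07", "UNLOCK TABLES"),
    ]
```

In practice db07's coordinates would be read after FLUSH TABLES WITH READ LOCK, with its replication already stopped, so they cannot move while db08 is re-pointed. The global read lock belongs to the client session that took it, so UNLOCK TABLES has to be issued from that same session.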

Event Timeline

This is scheduled for Mar 11, 13:30 UTC. Beta will be read-only during the switchover.

Change 670803 had a related patch set uploaded (by Majavah; owner: Majavah):
[operations/mediawiki-config@master] betacluster: promote db07 as database master

https://gerrit.wikimedia.org/r/670803

Change 670804 had a related patch set uploaded (by Majavah; owner: Majavah):
[operations/mediawiki-config@master] betacluster: read only for db master switchover

https://gerrit.wikimedia.org/r/670804

Mentioned in SAL (#wikimedia-releng) [2021-03-11T13:32:50Z] <Majavah> set deployment-db06 as read only T277070

Mentioned in SAL (#wikimedia-releng) [2021-03-11T13:34:41Z] <Majavah> stop and reset slave on deployment-db07 T277070

Mentioned in SAL (#wikimedia-releng) [2021-03-11T13:37:47Z] <Majavah> make deployment-db06 and deployment-db08 be replicas of deployment-db07 T277070

Change 670803 merged by jenkins-bot:
[operations/mediawiki-config@master] betacluster: promote db07 as db master, decom db06

https://gerrit.wikimedia.org/r/670803

Mentioned in SAL (#wikimedia-releng) [2021-03-11T13:48:01Z] <Majavah> stop mariadb to ensure reads have stopped on deployment-db06 T277070

Mentioned in SAL (#wikimedia-releng) [2021-03-11T13:48:37Z] <Majavah> set deployment-db07 as r/w T277070

Mentioned in SAL (#wikimedia-releng) [2021-03-11T13:51:06Z] <Majavah> shut down deployment-db06, now unused T277070

taavi claimed this task.

This was done with about 15 minutes of read-only time.
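The figure can be checked against the SAL timestamps above: the read-only window runs from db06 being set read-only (13:32:50) to db07 being set read/write (13:48:37). A small Python sketch of that arithmetic; the dictionary labels are paraphrased from the SAL entries and the helper names are illustrative only.

```python
from datetime import datetime

# SAL timestamps from this task (UTC); labels paraphrase the SAL messages.
SAL = {
    "db06 read only": "2021-03-11T13:32:50Z",
    "db07 stop and reset slave": "2021-03-11T13:34:41Z",
    "db06 and db08 replicate from db07": "2021-03-11T13:37:47Z",
    "db06 mariadb stopped": "2021-03-11T13:48:01Z",
    "db07 read/write": "2021-03-11T13:48:37Z",
}


def parse(ts: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so parse the fixed SAL format explicitly.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def read_only_window(sal: dict[str, str]) -> int:
    """Seconds between the old master going read-only and the new one taking writes."""
    start = parse(sal["db06 read only"])
    end = parse(sal["db07 read/write"])
    return int((end - start).total_seconds())


seconds = read_only_window(SAL)
print(f"{seconds // 60} min {seconds % 60} s")  # 15 min 47 s
```

The same timestamps also show the ordering that mattered: db07 only became writable after MariaDB on db06 had been stopped, so the two hosts never accepted writes at the same time.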

Change 670804 abandoned by Majavah:
[operations/mediawiki-config@master] betacluster: read only for db master switchover

Reason:
not needed

https://gerrit.wikimedia.org/r/670804