Failover s8 (wikidatawiki) db primary master db1071 to db1104 (read-only required)
Closed, Resolved, Public

Description

db1071 is an old, out-of-warranty host that needs to be decommissioned.
In addition, the normalization of wikidatawiki.wb_terms will require a master with more throughput and more disk space for the next phases of the wb_terms table redesign (T221764).

We want to fail over from db1071 to db1104.

When: Tuesday 30th July, 05:00 AM-05:30 AM UTC
Impact: Writes to wikidatawiki will be blocked. Reads will remain unaffected.

We do not expect to use the full 30-minute window; if everything goes as expected, the failover should take around 3-4 minutes.

Event Timeline

Marostegui triaged this task as Medium priority. Jul 2 2019, 8:14 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2019-07-02T09:34:30Z] <marostegui> Upgrade mysql on 2080 db2081 db2083 - T227062

Mentioned in SAL (#wikimedia-operations) [2019-07-02T09:39:11Z] <marostegui> Upgrade db2094 (codfw sanitarium) T227062

All of codfw is now running 10.1.39 (the version the new master will run). I will continue with the eqiad upgrades now.

Change 520673 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1101

https://gerrit.wikimedia.org/r/520673

Change 520673 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1101

https://gerrit.wikimedia.org/r/520673

Change 520732 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1109 for upgrade

https://gerrit.wikimedia.org/r/520732

Change 520732 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1109 for upgrade

https://gerrit.wikimedia.org/r/520732

Change 520739 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool db1109 after maintenance with low weight

https://gerrit.wikimedia.org/r/520739

Change 520739 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Repool db1109 after maintenance with low weight

https://gerrit.wikimedia.org/r/520739

Change 520830 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1104

https://gerrit.wikimedia.org/r/520830

Change 520830 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1104

https://gerrit.wikimedia.org/r/520830

Change 520842 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1109: Convert it to candidate master

https://gerrit.wikimedia.org/r/520842

Change 520842 merged by Marostegui:
[operations/puppet@production] db1109: Convert it to candidate master

https://gerrit.wikimedia.org/r/520842

Change 521203 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1109

https://gerrit.wikimedia.org/r/521203

Change 521203 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1109

https://gerrit.wikimedia.org/r/521203

Mentioned in SAL (#wikimedia-operations) [2019-07-08T05:45:06Z] <marostegui> Restart MySQL on db1109 to pick up STATEMENT as binlog format - T227062

I have restarted db1109 to pick up STATEMENT as the binlog format. db1109 will become the candidate master once db1104 (the current candidate master) is promoted to master.

Change 524411 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Promote db1104 to s8 master

https://gerrit.wikimedia.org/r/524411

Change 524412 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Pool db1109 into API

https://gerrit.wikimedia.org/r/524412

Change 524412 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Pool db1109 into API

https://gerrit.wikimedia.org/r/524412

Change 525513 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Set s8 (wikidata) into read-only

https://gerrit.wikimedia.org/r/525513

Change 526009 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1104

https://gerrit.wikimedia.org/r/526009

Change 526009 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1104

https://gerrit.wikimedia.org/r/526009

Change 526011 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Promote db1104 to s8 master

https://gerrit.wikimedia.org/r/526011

Change 526013 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Update CNAME for s8 master

https://gerrit.wikimedia.org/r/526013

Mentioned in SAL (#wikimedia-operations) [2019-07-30T04:15:19Z] <marostegui> Start pre-steps for s8 primary master failover - T227062

Change 524411 merged by Marostegui:
[operations/puppet@production] mariadb: Promote db1104 to s8 master

https://gerrit.wikimedia.org/r/524411

Change 525513 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Set s8 (wikidata) into read-only

https://gerrit.wikimedia.org/r/525513

Change 526011 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Promote db1104 to s8 master

https://gerrit.wikimedia.org/r/526011

Mentioned in SAL (#wikimedia-operations) [2019-07-30T05:00:17Z] <marostegui> Starting s8 failover from db1071 to db1104 - T227062

Mentioned in SAL (#wikimedia-operations) [2019-07-30T05:00:51Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Set s8 on read-only T227062 (duration: 00m 26s)

Mentioned in SAL (#wikimedia-operations) [2019-07-30T05:01:42Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Switchover s8 master eqiad from db1071 to db1104 T227062 (duration: 00m 24s)

Mentioned in SAL (#wikimedia-operations) [2019-07-30T05:02:22Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove s8 ready only T227062 (duration: 00m 24s)
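
The three sync entries above share a regular shape: a UTC timestamp, an actor, a message, and a trailing sync duration. As a minimal sketch, assuming only the line format visible in this task, the following Python pulls those fields out and measures the gap between consecutive steps. The `parse_sal_entry` helper and its regular expression are illustrative and are not part of any Wikimedia tooling.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern, derived only from the SAL lines quoted in this task.
SAL_PATTERN = re.compile(
    r"\[(?P<ts>[^\]]+)\] <(?P<actor>[^>]+)> (?P<message>.*?)"
    r"(?: \(duration: (?P<min>\d+)m (?P<sec>\d+)s\))?$"
)


def parse_sal_entry(line):
    """Split one SAL line into timestamp, actor, message and sync duration."""
    match = SAL_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a SAL entry: {line!r}")
    timestamp = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%SZ")
    sync_seconds = None
    if match["min"] is not None:
        sync_seconds = int(match["min"]) * 60 + int(match["sec"])
    return {
        "timestamp": timestamp.replace(tzinfo=timezone.utc),
        "actor": match["actor"],
        "message": match["message"],
        "sync_seconds": sync_seconds,  # None when the entry has no duration
    }


lines = [
    "[2019-07-30T05:00:51Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Set s8 on read-only T227062 (duration: 00m 26s)",
    "[2019-07-30T05:01:42Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Switchover s8 master eqiad from db1071 to db1104 T227062 (duration: 00m 24s)",
    "[2019-07-30T05:02:22Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove s8 ready only T227062 (duration: 00m 24s)",
]
entries = [parse_sal_entry(line) for line in lines]

# Seconds elapsed between each logged step and the next one.
gaps = [
    int((later["timestamp"] - earlier["timestamp"]).total_seconds())
    for earlier, later in zip(entries, entries[1:])
]
print(gaps)
```

The log messages are kept exactly as recorded, so the parser makes no attempt to normalise their wording.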

Change 526013 merged by Marostegui:
[operations/dns@master] wmnet: Update CNAME for s8 master

https://gerrit.wikimedia.org/r/526013

The failover completed successfully.
read-only start: 05:00:50 UTC
read-only stop: 05:02:21 UTC

Total read-only time: 01:31 (1 minute 31 seconds)
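
The read-only total can be rechecked from the start and stop times reported above. This is a small sketch of that arithmetic, not the procedure used during the failover; `read_only_duration` and `format_mmss` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def read_only_duration(start: str, stop: str) -> timedelta:
    """Return the time between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    return datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)


def format_mmss(delta: timedelta) -> str:
    """Render a duration as zero-padded minutes and seconds."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes:02d}:{seconds:02d}"


duration = read_only_duration("05:00:50", "05:02:21")
print(format_mmss(duration))  # prints 01:31
```

The result is 91 seconds, well inside the announced 30-minute window.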

Everything looks good, so I am resolving this.

Mentioned in SAL (#wikimedia-operations) [2019-09-10T05:00:24Z] <marostegui> Starting s8 failover from db1104 to db1109 - T227062

Mentioned in SAL (#wikimedia-operations) [2019-09-10T05:02:14Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Promote db1109 to s8 master and remove read-only from s8 T227062', diff saved to https://phabricator.wikimedia.org/P9070 and previous config saved to /var/cache/conftool/dbconfig/20190910-050213-marostegui.json