
Replace db2044 (m2 codfw master) with db2067
Closed, Resolved, Public

Description

db2044 is currently the m2 codfw master. This host has a broken disk and has had many disk failures in the past, so it will be decommissioned.
Let's replace it with db2067.

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2019-08-19T05:46:07Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2067, will be moved to m1 T230705', diff saved to https://phabricator.wikimedia.org/P8930 and previous config saved to /var/cache/conftool/dbconfig/20190819-054606-marostegui.json

Change 530789 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2067 from config

https://gerrit.wikimedia.org/r/530789

Change 530789 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2067 from config

https://gerrit.wikimedia.org/r/530789

Mentioned in SAL (#wikimedia-operations) [2019-08-19T05:50:17Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Remove db2067 from config T230705 (duration: 00m 50s)

Mentioned in SAL (#wikimedia-operations) [2019-08-19T05:51:09Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove db2067 from config T230705 (duration: 00m 47s)

Change 530790 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db2067 to m2, decomm db2063

https://gerrit.wikimedia.org/r/530790

Change 530790 merged by Marostegui:
[operations/puppet@production] mariadb: Move db2067 to m2, decomm db2063

https://gerrit.wikimedia.org/r/530790

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2067.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201908190711_marostegui_144027.log.

Marostegui triaged this task as Medium priority. Aug 19 2019, 7:11 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Completed auto-reimage of hosts:

['db2067.codfw.wmnet']

Of which those FAILED:

['db2067.codfw.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2067.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201908190807_marostegui_154648.log.

Completed auto-reimage of hosts:

['db2067.codfw.wmnet']

Of which those FAILED:

['db2067.codfw.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2067.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201908190907_marostegui_167018.log.

Completed auto-reimage of hosts:

['db2067.codfw.wmnet']

and were ALL successful.

db2067 is now replicating from db2044.
I am going to give it a few hours before promoting it to the m2 codfw master.
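Before promoting a replica, one would normally confirm that both replication threads are running and that it is not lagging. Here is a minimal sketch of such a check. It assumes one row of MariaDB's `SHOW SLAVE STATUS` has already been fetched into a Python dict. The function name `ready_to_promote` and the zero-lag default are illustrative only and are not part of any WMF tooling:

```python
from typing import Mapping, Optional


def ready_to_promote(
    slave_status: Mapping[str, Optional[str]], max_lag_seconds: int = 0
) -> bool:
    """Return True if a replica looks healthy enough to become the master.

    Hypothetical helper: `slave_status` is assumed to be one row of
    SHOW SLAVE STATUS, already fetched into a dict of column -> string value.
    """
    io_running = slave_status.get("Slave_IO_Running") == "Yes"
    sql_running = slave_status.get("Slave_SQL_Running") == "Yes"
    lag = slave_status.get("Seconds_Behind_Master")
    # Seconds_Behind_Master is NULL when the SQL thread is not running or the
    # I/O thread is not connected, so a missing/NULL value means "not ready".
    if lag is None or lag == "NULL":
        return False
    return io_running and sql_running and int(lag) <= max_lag_seconds
```

A single call only reflects one moment in time. Waiting a few hours, as planned here, shows whether the replica stays healthy under real traffic.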

Change 530839 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Do not reimage db2067

https://gerrit.wikimedia.org/r/530839

Change 530839 merged by Marostegui:
[operations/puppet@production] install_server: Do not reimage db2067

https://gerrit.wikimedia.org/r/530839

Change 530880 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2067: Enable notifications

https://gerrit.wikimedia.org/r/530880

Change 530880 merged by Marostegui:
[operations/puppet@production] db2067: Enable notifications

https://gerrit.wikimedia.org/r/530880

Change 531025 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbproxy2002: Promote db2067 to m2 codfw master

https://gerrit.wikimedia.org/r/531025

Mentioned in SAL (#wikimedia-operations) [2019-08-20T05:18:33Z] <marostegui> Switchover m2 codfw master, db2044 -> db2067 T230705

Change 531025 merged by Marostegui:
[operations/puppet@production] dbproxy2002: Promote db2067 to m2 codfw master

https://gerrit.wikimedia.org/r/531025

Mentioned in SAL (#wikimedia-operations) [2019-08-20T05:24:50Z] <marostegui> Reload haproxy on dbproxy2002 T230705
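The switchover boils down to rewiring the replication topology and then pointing the proxy at the new master. Here is a minimal sketch of that ordering on an in-memory model. The `Host` class and `switchover` function are hypothetical. They do not talk to MariaDB or haproxy. The sketch assumes the new master already replicates from the old one with no lag. It leaves `read_only` unchanged on the new master, since whether to disable it depends on whether the datacenter takes writes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    read_only: bool = True
    primary: Optional[str] = None  # host this one replicates from; None = master


def switchover(hosts: Dict[str, Host], old: str, new: str) -> List[str]:
    """Make `new` the master in place of `old`; return the steps performed."""
    if hosts[new].primary != old:
        raise ValueError(f"{new} must replicate from {old} before switchover")
    steps: List[str] = []
    # 1. Stop writes on the old master so it generates no new events.
    hosts[old].read_only = True
    steps.append(f"set read_only=ON on {old}")
    # 2. Detach the new master from its upstream (in practice, once caught up).
    hosts[new].primary = None
    steps.append(f"stop replication on {new}")
    # 3. Re-point the old master and every other replica of it at the new master.
    for host in hosts.values():
        if host.name != new and (host.name == old or host.primary == old):
            host.primary = new
            steps.append(f"replicate {host.name} from {new}")
    # 4. Send client traffic to the new master (e.g. haproxy on the dbproxy).
    steps.append(f"point proxy at {new} and reload haproxy")
    return steps
```

Suppose we start from an assumed topology where both db2067 and db2078:3322 replicate from db2044. Calling `switchover(hosts, "db2044", "db2067")` leaves db2044 and db2078:3322 replicating from db2067.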

db2067 is the new m2 codfw master:

./replication_tree.py db2067.codfw.wmnet
db2067, version: 10.1.39, up: 18h, RO: ON, binlog: MIXED, lag: 0, processes: 16, latency: 0.0429
+ db2044, version: 10.1.39, up: 27d, RO: ON, binlog: MIXED, lag: 0, processes: 9, latency: 0.0411
+ db2078:3322, version: 10.1.39, up: 89d, RO: ON, binlog: MIXED, lag: 0, processes: 13, latency: 0.0428