db2044 is currently the m2 codfw master. This host has a broken disk and has had many disk failures in the past, so it will be decommissioned.
Let's replace it with db2067.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | None | T208323 Predictive failures on disk S.M.A.R.T. status
Resolved | | Marostegui | T230705 Replace db2044 (m2 codfw master) with db2067
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2019-08-19T05:46:07Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2067, will be moved to m1 T230705', diff saved to https://phabricator.wikimedia.org/P8930 and previous config saved to /var/cache/conftool/dbconfig/20190819-054606-marostegui.json
Change 530789 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2067 from config
Change 530789 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2067 from config
Mentioned in SAL (#wikimedia-operations) [2019-08-19T05:50:17Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Remove db2067 from config T230705 (duration: 00m 50s)
Mentioned in SAL (#wikimedia-operations) [2019-08-19T05:51:09Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove db2067 from config T230705 (duration: 00m 47s)
Change 530790 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db2067 to m2, decomm db2063
Change 530790 merged by Marostegui:
[operations/puppet@production] mariadb: Move db2067 to m2, decomm db2063
Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:
['db2067.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201908190711_marostegui_144027.log.
Completed auto-reimage of hosts:
['db2067.codfw.wmnet']
Of which those FAILED:
['db2067.codfw.wmnet']
Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:
['db2067.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201908190807_marostegui_154648.log.
Completed auto-reimage of hosts:
['db2067.codfw.wmnet']
Of which those FAILED:
['db2067.codfw.wmnet']
Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:
['db2067.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201908190907_marostegui_167018.log.
db2067 is now replicating from db2044.
I am going to give it a few hours before promoting it to m2 codfw master.
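A minimal sketch of the kind of check one might run during that waiting period, assuming the row returned by MariaDB's SHOW SLAVE STATUS on db2067 has already been fetched into a dict. The function name `ready_for_promotion` and the zero-lag threshold are illustrative assumptions, not the procedure actually used on this task.

```python
from typing import Any, Mapping


def ready_for_promotion(status: Mapping[str, Any], max_lag: int = 0) -> bool:
    """Return True if a replica looks healthy enough to be promoted.

    `status` is assumed to be one row of SHOW SLAVE STATUS as a mapping.
    The helper itself is hypothetical; only the column names come from MariaDB.
    """
    # Both replication threads must be running.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False

    # Seconds_Behind_Master is NULL (None) when replication is broken.
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False
    return int(lag) <= max_lag


healthy = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 0,
}
broken = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "No",
    "Seconds_Behind_Master": None,
}
print(ready_for_promotion(healthy))
print(ready_for_promotion(broken))
```

A check like this only covers replication health. It says nothing about data consistency between the old and new master, which would need to be verified separately.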
Change 530839 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Do not reimage db2067
Change 530839 merged by Marostegui:
[operations/puppet@production] install_server: Do not reimage db2067
Change 530880 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2067: Enable notifications
Change 530880 merged by Marostegui:
[operations/puppet@production] db2067: Enable notifications
Change 531025 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbproxy2002: Promote db2067 to m2 codfw master
Mentioned in SAL (#wikimedia-operations) [2019-08-20T05:18:33Z] <marostegui> Switchover m2 codfw master, db2044 -> db2067 T230705
Change 531025 merged by Marostegui:
[operations/puppet@production] dbproxy2002: Promote db2067 to m2 codfw master
Mentioned in SAL (#wikimedia-operations) [2019-08-20T05:24:50Z] <marostegui> Reload haproxy on dbproxy2002 T230705
db2067 is the new m2 codfw master:
./replication_tree.py db2067.codfw.wmnet
db2067, version: 10.1.39, up: 18h, RO: ON, binlog: MIXED, lag: 0, processes: 16, latency: 0.0429
+ db2044, version: 10.1.39, up: 27d, RO: ON, binlog: MIXED, lag: 0, processes: 9, latency: 0.0411
+ db2078:3322, version: 10.1.39, up: 89d, RO: ON, binlog: MIXED, lag: 0, processes: 13, latency: 0.0428
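The per-host fields in that output can be checked mechanically. The following is a rough sketch that parses host entries of the form shown above and reports any host whose lag is non-zero. The helpers `parse_host_line` and `lagging_hosts` are hypothetical and are not part of replication_tree.py; the field layout is assumed from this single sample.

```python
def parse_host_line(line):
    """Split 'host, key: value, key: value, ...' into (host, fields).

    A leading '+' (used for replicas in the tree output) is ignored.
    """
    line = line.strip().lstrip("+").strip()
    host, *pairs = [part.strip() for part in line.split(",")]
    fields = {}
    for pair in pairs:
        # Split on the first ':' only, so values keep any later colons.
        key, _, value = pair.partition(":")
        fields[key.strip()] = value.strip()
    return host, fields


def lagging_hosts(lines):
    """Return the hosts whose 'lag' field is anything other than '0'."""
    result = []
    for line in lines:
        host, fields = parse_host_line(line)
        if fields.get("lag") != "0":
            result.append(host)
    return result


# Host entries copied from the output above.
sample = [
    "db2067, version: 10.1.39, up: 18h, RO: ON, binlog: MIXED, lag: 0, processes: 16, latency: 0.0429",
    "+ db2044, version: 10.1.39, up: 27d, RO: ON, binlog: MIXED, lag: 0, processes: 9, latency: 0.0411",
    "+ db2078:3322, version: 10.1.39, up: 89d, RO: ON, binlog: MIXED, lag: 0, processes: 13, latency: 0.0428",
]
print(lagging_hosts(sample))
```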