WARNING: Slot 0: OK: 1I:1:1, 1I:1:10, 1I:1:11, 1I:1:12, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9 - Controller: OK - Cache: Permanently Disabled - Battery/Capacitor: Failed (Replace Batteries)
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | None | T208323 Predictive failures on disk S.M.A.R.T. status
Resolved | | Marostegui | T228258 Decommission db2043-db2070
Declined | | None | T227862 (OoW) db2045 failed battery
Event Timeline
Based on https://noc.wikimedia.org/conf/highlight.php?file=db-codfw.php&1 and T184888, I will switch over the codfw master to db2069.
Mentioned in SAL (#wikimedia-operations) [2019-07-12T10:24:22Z] <jynus> switchover x1 codfw master from db2045 to db2069 T227862
Everything went well except:
Updating tendril... [WARNING] Old master not found on tendril server list
Updating zarcillo... [WARNING] Old master not found on zarcillo master list
Change 522403 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Promote db2069 to be the new x1 codfw master
Change 522403 merged by Jcrespo:
[operations/puppet@production] mariadb: Promote db2069 to be the new x1 codfw master
Change 522409 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Promote db2069 to be the new x1 codfw master
Change 522409 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Promote db2069 to be the new x1 codfw master
root@db1115.eqiad.wmnet[zarcillo]> update masters set instance='db2069' where section='x1' and dc='codfw';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

root@db1115.eqiad.wmnet[zarcillo]> update masters set instance='db1120' where section='x1' and dc='eqiad';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
Tendril is expected to fail: it has no concept of a "primary master" versus a datacenter master, so the update won't work for --replicating-master runs.
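For reference, the manual zarcillo fix can be sketched as a lookup keyed by section and datacenter. This is only an illustration, not the switchover script: it uses an in-memory SQLite database as a stand-in for the real zarcillo database, the table layout is assumed from the three columns visible in the queries (`instance`, `section`, `dc`), the helper names are hypothetical, and the previous eqiad value is a made-up placeholder since the ticket does not state it.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the zarcillo `masters` table.

    Assumption: only the three columns seen in the ticket's queries exist.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE masters ("
        " section TEXT NOT NULL,"
        " dc TEXT NOT NULL,"
        " instance TEXT NOT NULL,"
        " PRIMARY KEY (section, dc))"
    )
    conn.executemany(
        "INSERT INTO masters (section, dc, instance) VALUES (?, ?, ?)",
        [
            ("x1", "codfw", "db2045"),
            # Placeholder: the old eqiad master is not named in the ticket.
            ("x1", "eqiad", "old-eqiad-master"),
        ],
    )
    conn.commit()
    return conn


def set_master(conn, section, dc, instance):
    """Point the (section, dc) row at a new master.

    Returns the number of rows the UPDATE touched, so a caller can tell
    a successful change (1) from a missing row (0).
    """
    cur = conn.execute(
        "UPDATE masters SET instance = ? WHERE section = ? AND dc = ?",
        (instance, section, dc),
    )
    conn.commit()
    return cur.rowcount


def get_master(conn, section, dc):
    """Return the recorded master for (section, dc), or None if absent."""
    row = conn.execute(
        "SELECT instance FROM masters WHERE section = ? AND dc = ?",
        (section, dc),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = make_demo_db()
    changed = set_master(conn, "x1", "codfw", "db2069")
    print(changed, get_master(conn, "x1", "codfw"))
```

Because each datacenter has its own row per section, the codfw master can be changed without touching the eqiad one, which is the distinction the comment above says Tendril lacks. Checking the returned row count plays the same role as the "Rows matched: 1 Changed: 1" lines in the manual session.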
Mentioned in SAL (#wikimedia-operations) [2019-07-15T20:02:10Z] <jynus> reducing consistency of db2045 to avoid lag at T227862
Change 523880 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Clarify db2045 status
Change 523880 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Clarify db2045 status
Mentioned in SAL (#wikimedia-operations) [2019-07-17T09:21:15Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Depool and clarify db2045 status T227862 (duration: 00m 55s)
No point in spending time on this old host; I will start its decommissioning process.
Going to close this ticket as I have created the decommission one: T228281: decommission db2045.codfw.wmnet