
db1052 (s1 master) disks with lots of predictive failure errors
Closed, Resolved, Public

Description

The s1 master, db1052, has lots of errors on two disks (T190035#4061483). We should probably fail them manually, one at a time
(https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Disks_about_to_fail), at some point and coordinate with @Cmjohnson to get them replaced.

Slot Number: 2
Media Error Count: 35
Other Error Count: 2
Predictive Failure Count: 325
Drive has flagged a S.M.A.R.T alert : Yes



Slot Number: 8
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 419
Drive has flagged a S.M.A.R.T alert : Yes
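
A minimal sketch of how disks like the two above could be picked out of the controller output automatically. It assumes the input has the same field layout as the excerpts in this task (Slot Number, then Media Error Count, then Predictive Failure Count); the function name `flag_failing_disks` and the threshold argument are hypothetical and not part of any existing tooling.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): reads megacli-style physical
# drive text on stdin and prints one line per slot whose
# "Predictive Failure Count" is above the given threshold (default 0).
# Assumes "Slot Number" and "Media Error Count" appear before
# "Predictive Failure Count" for each drive, as in the output in this task.
flag_failing_disks() {
    threshold="${1:-0}"
    awk -F': *' -v limit="$threshold" '
        /^Slot Number/              { slot = $2 + 0 }
        /^Media Error Count/        { media = $2 + 0 }
        /^Predictive Failure Count/ {
            if ($2 + 0 > limit + 0)
                printf "slot %d: predictive=%d media=%d\n", slot, $2, media
        }
    '
}

# Possible usage on the host (requires megacli and root):
#   megacli -LDPDInfo -aAll | flag_failing_disks
```

Parsing by field name rather than by line position keeps the sketch tolerant of extra lines in the full output, but it has only been reasoned about against the excerpts shown here.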

Event Timeline

Marostegui triaged this task as Medium priority. Mar 21 2018, 3:57 PM
Marostegui created this task.
Marostegui renamed this task from "db1052 (s1 master) disk with lots of predictive failure errors" to "db1052 (s1 master) disks with lots of predictive failure errors". Mar 21 2018, 4:04 PM

Replaced the disk at slot 2. I will wait for the rebuild to complete before swapping slot 8.

Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Rebuild
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up

The disk at slot 8 has been swapped and is rebuilding.

Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Rebuild
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
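
Waiting for one rebuild to finish before swapping the next disk comes down to checking that no drive reports a state other than "Online". A rough sketch of that check, assuming input lines of the form `Firmware state: ...` as shown above; the function name `count_not_online` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): reads "Firmware state: ..."
# lines on stdin and reports how many drives are in any state that does
# not start with "Online" (for example "Rebuild").
count_not_online() {
    awk -F': *' '
        /^Firmware state/ {
            total++
            if ($2 !~ /^Online/) pending++
        }
        END { printf "%d of %d drives not online\n", pending, total }
    '
}

# Possible usage on the host (requires megacli and root), assuming the
# command output contains one "Firmware state" line per physical drive:
#   megacli -LDPDInfo -aAll | count_not_online
```

Once this reports 0 drives not online, the array should be back in the state where the second swap was started in this task.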

Marostegui assigned this task to Cmjohnson.

All good now!

root@db1052:~# megacli -LDPDInfo -aAll | egrep -i "slot|error|failure count|s.m.a.r.t"
Slot Number: 0
Media Error Count: 106
Other Error Count: 1
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 2
Media Error Count: 0
Other Error Count: 1
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 3
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 4
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 5
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 6
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 8
Media Error Count: 0
Other Error Count: 1
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 9
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 10
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Slot Number: 11
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
root@db1052:~# megacli -LDPDInfo -aAll

Adapter #0

Number of Virtual Disks: 1
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 3.271 TB
Sector Size         : 512
Mirror Data         : 3.271 TB
State               : Optimal
Strip Size          : 256 KB