Bad disk on db1065
Closed, Resolved · Public

Description

There is a disk with a SMART alert on db1065 (m2 master).

It looks like disk #1:

Enclosure Device ID: 32
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: 1
Device Id: 1
WWN: 5000C5004797CDB8
Sequence Number: 12
Media Error Count: 7588
Other Error Count: 0
Predictive Failure Count: 2
Last Predictive Failure Event Seq Number: 62914
PD Type: SAS

Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: ES64
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5004797cdb9
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS     ES646SL2Y79Y
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :45C (113.00 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : Yes

@Cmjohnson can we get it replaced? Please ping us before replacing it so we can set it to OFFLINE manually, just to be sure.
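
For reference, the fields that point at this disk (Predictive Failure Count and the S.M.A.R.T alert flag) can be picked out of the controller output with a small script. This is only a rough sketch, assuming the text has the same `Key: value` layout as the output pasted in the description and that each drive's section starts with an `Enclosure Device ID` line; the function names `parse_physical_drives` and `suspect_drives` are made up for this example.

```python
def parse_physical_drives(text):
    """Split megacli-style physical-drive output into one dict per drive.

    Assumption: every drive section begins with an 'Enclosure Device ID' line.
    """
    drives = []
    current = None
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # 'DiskGroup: 0, Span: 0, Arm: 1' stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def suspect_drives(drives):
    """Return (enclosure, slot) for drives with predictive failures or a SMART alert."""
    suspects = []
    for drive in drives:
        predictive = int(drive.get("Predictive Failure Count", "0"))
        smart_alert = drive.get("Drive has flagged a S.M.A.R.T alert", "No") == "Yes"
        if predictive > 0 or smart_alert:
            suspects.append((drive.get("Enclosure Device ID"), drive.get("Slot Number")))
    return suspects
```

Fed the output above, this would report the drive as enclosure `32`, slot `1`, which is the `32:1` address used in the later commands.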

Event Timeline

Restricted Application added a project: Operations. Jun 9 2018, 8:04 AM
Restricted Application added a subscriber: Aklapper.
Marostegui triaged this task as Normal priority. Jun 9 2018, 8:04 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2018-06-11T15:31:14Z] <marostegui> Set offline disk 32:1 on db1065 - T196806

Mentioned in SAL (#wikimedia-operations) [2018-06-11T15:31:52Z] <marostegui> Set offline disk 32:3 on db1063 - T196806

Disk replaced by @Cmjohnson and now rebuilding:

root@db1065:~# megacli -PDRbld -ShowProg -PhysDrv [32:1] -aALL

Rebuild Progress on Device at Enclosure 32, Slot 1 Completed 1% in 1 Minutes.
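
If the rebuild needs to be watched from a script, the progress line above is easy to parse. A minimal sketch, assuming the line keeps exactly the wording shown here; `parse_rebuild_progress` is a hypothetical helper, not part of megacli.

```python
import re

# Matches the progress line printed by the rebuild command above.
REBUILD_RE = re.compile(
    r"Rebuild Progress on Device at Enclosure (\d+), Slot (\d+) "
    r"Completed (\d+)% in (\d+) Minutes"
)


def parse_rebuild_progress(line):
    """Return enclosure, slot, percent and minutes as ints, or None if no match."""
    match = REBUILD_RE.search(line)
    if match is None:
        return None
    enclosure, slot, percent, minutes = (int(group) for group in match.groups())
    return {
        "enclosure": enclosure,
        "slot": slot,
        "percent": percent,
        "minutes": minutes,
    }
```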
Cmjohnson moved this task from Backlog to Blocked on the ops-eqiad board. Jun 11 2018, 3:41 PM

The disk finished its rebuild, but unfortunately it has lots of errors and a SMART alert too, so we need a new one :(

Predictive Failure Count: 1
Last Predictive Failure Event Seq Number: 64257
PD Type: SAS
Drive has flagged a S.M.A.R.T alert : Yes

Mentioned in SAL (#wikimedia-operations) [2018-06-11T16:42:03Z] <marostegui> Set disk 32:1 offline on db1065 to get a new one - T196806

Marostegui closed this task as Resolved. Jun 12 2018, 5:27 PM

The new disk worked fine, thanks!!

root@db1065:~# megacli -LDPDInfo -aAll

Adapter #0

Number of Virtual Disks: 1
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 3.271 TB
Sector Size         : 512
Mirror Data         : 3.271 TB
State               : Optimal
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
Drive has flagged a S.M.A.R.T alert : No
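
The same check (virtual drive `State` is `Optimal` and no drive reports a S.M.A.R.T alert) could be done in a script rather than by eye. This is a sketch only, assuming output shaped like the lines above; `array_is_healthy` is a hypothetical name.

```python
def array_is_healthy(text):
    """True if the virtual drive is Optimal and every reported drive has no SMART alert.

    Assumption: 'State' and 'Drive has flagged a S.M.A.R.T alert' appear as
    'Key : value' lines, as in the output above.
    """
    state = None
    alerts = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "State":
            state = value
        elif key == "Drive has flagged a S.M.A.R.T alert":
            alerts.append(value)
    # Require at least one drive line, so empty input is not reported as healthy.
    return state == "Optimal" and bool(alerts) and all(a == "No" for a in alerts)
```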
Vvjjkkii renamed this task from Bad disk on db1065 to hcbaaaaaaa. Jul 1 2018, 1:04 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii removed Cmjohnson as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
Marostegui renamed this task from hcbaaaaaaa to Bad disk on db1065. Jul 2 2018, 5:10 AM
Marostegui closed this task as Resolved.
Marostegui lowered the priority of this task from High to Normal.
Marostegui assigned this task to Cmjohnson.
Marostegui updated the task description.