
barium has a failed HDD
Closed, ResolvedPublic

Description

Notification Type: PROBLEM

Service: check_raid
Host: barium
Address: 10.64.40.109
State: CRITICAL

Date/Time: Wed Mar 25 16:00:11 UTC 2015

Additional Info:

CRITICAL: MegaSAS 2 logical, 4 physical: a0/v1 (2 disk array) degraded
Love, Icinga
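The alert text packs several facts into one line. Below is a minimal sketch of pulling them out, for example to route degraded-array alerts automatically. The regular expression is inferred from this single message only, and reading "a0/v1" as adapter 0, virtual drive 1 is an assumption; `parse_alert` and `RaidSummary` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class RaidSummary(NamedTuple):
    logical: int   # number of logical (virtual) drives reported
    physical: int  # number of physical drives reported
    adapter: int   # assumed: the "aN" part of "aN/vM"
    volume: int    # assumed: the "vM" part of "aN/vM"
    members: int   # disks in the affected array
    state: str     # e.g. "degraded"


# Pattern inferred from the one alert in this ticket; other check_raid
# versions or controllers may word their output differently.
ALERT_RE = re.compile(
    r"MegaSAS (?P<logical>\d+) logical, (?P<physical>\d+) physical: "
    r"a(?P<adapter>\d+)/v(?P<volume>\d+) \((?P<members>\d+) disk array\) "
    r"(?P<state>\w+)"
)


def parse_alert(text: str) -> Optional[RaidSummary]:
    """Extract the affected-volume details from a check_raid alert line."""
    m = ALERT_RE.search(text)
    if m is None:
        return None
    g = m.groupdict()
    return RaidSummary(
        logical=int(g["logical"]),
        physical=int(g["physical"]),
        adapter=int(g["adapter"]),
        volume=int(g["volume"]),
        members=int(g["members"]),
        state=g["state"],
    )


alert = "CRITICAL: MegaSAS 2 logical, 4 physical: a0/v1 (2 disk array) degraded"
print(parse_alert(alert))
```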

Event Timeline

Jgreen set the priority of this task to Needs Triage.
Jgreen updated the task description.
Jgreen subscribed.
Restricted Application added a subscriber: Aklapper.
Jgreen triaged this task as High priority. Mar 25 2015, 4:06 PM
Jgreen set Security to None.

Enclosure Device ID: N/A
Slot Number: 3
Drive's position: DiskGroup: 1, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 3
WWN: 5000c5004f2eb5e4
Sequence Number: 3
Media Error Count: 147
Other Error Count: 86
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Sector Size: 0
Firmware state: Failed
Device Firmware Level: CC24
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221107000000
Connected Port Number: 3(path0)
Inquiry Data: Z1F1VWKYST3000DM001-1CH166 CC24
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature : N/A
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

Exit Code: 0x00
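The drive listing above is a flat run of "Key: Value" lines, so spotting the bad disk can be scripted. This is a minimal sketch assuming that layout; the ticket does not show which command produced the output. The functions `parse_pd_record` and `drive_problems` are hypothetical, and flagging any nonzero error counter is an arbitrary threshold chosen for illustration.

```python
from typing import Dict, List


def parse_pd_record(text: str) -> Dict[str, str]:
    """Turn 'Key: Value' lines from one drive listing into a dict.

    Splits on the first ':' only, so values such as
    'DiskGroup: 1, Span: 0, Arm: 1' survive intact. Keys are stripped,
    which also turns 'Drive Temperature :' into 'Drive Temperature'.
    """
    record: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and lines without a colon
        record[key.strip()] = value.strip()
    return record


def drive_problems(record: Dict[str, str]) -> List[str]:
    """List reasons this drive looks unhealthy (empty list = looks fine)."""
    problems = []
    state = record.get("Firmware state", "")
    if not state.startswith("Online"):
        problems.append(f"firmware state is {state!r}")
    # Assumption: any nonzero counter is worth reporting.
    for counter in ("Media Error Count", "Other Error Count",
                    "Predictive Failure Count"):
        count = int(record.get(counter, "0"))
        if count > 0:
            problems.append(f"{counter} = {count}")
    return problems


# Excerpt of the failed drive's listing from this ticket.
failed = """\
Slot Number: 3
Drive's position: DiskGroup: 1, Span: 0, Arm: 1
Media Error Count: 147
Other Error Count: 86
Predictive Failure Count: 0
Firmware state: Failed
"""
rec = parse_pd_record(failed)
print(rec["Slot Number"], drive_problems(rec))
```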

New disk ordered.

WO6747230 Under Review 55H13W1 WIKIMEDIA FOUNDATION, INC Failed Hard Drive slot 3 4/1/2015 11:21 AM

Received the new disk. This will require downtime. Sent an email to FR-ALL for Tuesday 4/7 at 9:30 EST.

The disk that failed was an add-on of a 3TB disk and not covered under warranty. We do not have any 3TB disks on-site to swap out and will need to order more.


https://rt.wikimedia.org/Ticket/Display.html?id=9295

This will be replaced 4/14 at 10am EST.

Updated Icinga with the downtime: 4/14, 14:00-14:15.
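For reference, a fixed host downtime like this one can be expressed as a single Icinga 1.x / Nagios external command, `SCHEDULE_HOST_DOWNTIME`. The sketch below only builds the command line; it assumes 14:00 is UTC (10am Eastern), and the author and comment strings are hypothetical placeholders. The line would be written to Icinga's external command file, whose path depends on the installation and is not given in this ticket.

```python
from datetime import datetime, timedelta, timezone


def schedule_host_downtime(host: str, start: datetime, minutes: int,
                           author: str, comment: str) -> str:
    """Build a fixed SCHEDULE_HOST_DOWNTIME external command line.

    Field order: host;start;end;fixed;trigger_id;duration;author;comment
    """
    end = start + timedelta(minutes=minutes)
    now = int(datetime.now(timezone.utc).timestamp())
    fields = [
        host,
        str(int(start.timestamp())),
        str(int(end.timestamp())),
        "1",                # fixed: downtime runs exactly from start to end
        "0",                # trigger_id 0: not triggered by another downtime
        str(minutes * 60),  # duration in seconds (used for flexible downtime)
        author,
        comment,
    ]
    return f"[{now}] SCHEDULE_HOST_DOWNTIME;" + ";".join(fields)


# Assumption: "1400" in the ticket means 14:00 UTC.
start = datetime(2015, 4, 14, 14, 0, tzinfo=timezone.utc)
# Hypothetical author and comment values.
cmd = schedule_host_downtime("barium", start, 15, "ops",
                             "replace failed disk in slot 3")
print(cmd)
```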

Returned the disk Dell sent me.

Tracking numbers
FEDEX 9611918 2393026 47861526

cmjohnson, can you include a round of package/kernel updates (a "dist-upgrade") when you do this, for T95887?

New disk is online.

Enclosure Device ID: N/A
Slot Number: 3
Drive's position: DiskGroup: 1, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 3
WWN: 5000c500794334bb
Sequence Number: 5
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Sector Size: 0
Firmware state: Online, Spun Up
Device Firmware Level: CC25
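The three size lines are identical for the old and new disk, and the hex sector counts can be checked by hand. A minimal sketch, assuming "Sector Size: 0" means the default 512-byte sector and that MegaCli's "TB" means 2^40 bytes. Under those assumptions all three figures fall between 2.728 and 2.730 TiB. The raw count, 5,860,533,168 sectors, is the usual LBA count of a 3 TB drive. Non-coerced is 1,048,576 sectors (512 MiB) below raw, and the coerced size is a round multiple of 0x100000 sectors.

```python
SECTOR_BYTES = 512  # assumption: "Sector Size: 0" = default 512-byte sectors
TIB = 2 ** 40       # assumption: MegaCli's "TB" is a binary terabyte

# Sector counts copied from the listings in this ticket.
sizes = {
    "Raw": 0x15D50A3B0,
    "Non Coerced": 0x15D40A3B0,
    "Coerced": 0x15D400000,
}

for name, sectors in sizes.items():
    nbytes = sectors * SECTOR_BYTES
    print(f"{name:12} {sectors:>13,} sectors = {nbytes:>17,} bytes"
          f" = {nbytes / TIB:.4f} TiB")

# Gaps between the three figures, in sectors and whole MiB.
reserved = sizes["Raw"] - sizes["Non Coerced"]
coerced_away = sizes["Non Coerced"] - sizes["Coerced"]
print("raw - non-coerced:", reserved, "sectors,",
      reserved * SECTOR_BYTES // 2**20, "MiB")
print("non-coerced - coerced:", coerced_away, "sectors,",
      coerced_away * SECTOR_BYTES // 2**20, "MiB")
print("coerced is a multiple of 0x100000 sectors:",
      sizes["Coerced"] % 0x100000 == 0)
```

Because the replacement reports the same coerced size as the drive it replaced, it is not smaller than the array member it stands in for.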

Package updates were successful; resolving this ticket.

Reopening because the RAID is degraded (still? again?). It appears that one of the disks is still offline, but I'm not sure whether it's the one that physically failed.
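One way to answer that question is to compare the WWN of whichever drive is not online against the two WWNs recorded in this ticket: 5000c5004f2eb5e4 (the failed disk) and 5000c500794334bb (its replacement). The two listings also differ in "Drive's position" (Arm: 1 before the swap, Arm: 0 after), which may be worth checking too. This is a minimal sketch; `classify_offline` is a hypothetical helper, and the example drive list is made up for illustration, not real output from barium.

```python
from typing import Dict, List

# WWNs copied from the two drive listings in this ticket.
FAILED_WWN = "5000c5004f2eb5e4"       # slot 3 before the swap (Failed)
REPLACEMENT_WWN = "5000c500794334bb"  # slot 3 after the swap (Online, Spun Up)


def classify_offline(drives: List[Dict[str, str]]) -> List[str]:
    """Describe every drive whose firmware state is not Online."""
    report = []
    for d in drives:
        state = d.get("Firmware state", "")
        if state.startswith("Online"):
            continue  # healthy member, nothing to report
        wwn = d.get("WWN", "").lower()
        if wwn == FAILED_WWN:
            who = "the original failed disk"
        elif wwn == REPLACEMENT_WWN:
            who = "the replacement disk"
        else:
            who = "a different disk"
        report.append(f"slot {d.get('Slot Number', '?')} ({wwn}): {state} -> {who}")
    return report


# Hypothetical input: one entry per drive, keyed like the listings above.
drives = [
    {"Slot Number": "3", "WWN": "5000c500794334bb", "Firmware state": "Online, Spun Up"},
    {"Slot Number": "2", "WWN": "0000000000000000", "Firmware state": "Offline"},
]
for line in classify_offline(drives):
    print(line)
```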