
stat1002 broken disk causing degraded RAID array
Closed, Resolved (Public)

Description

Hi!

I just saw the following Icinga message:

<icinga-wm> PROBLEM - RAID on stat1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded)

This host uses an LVM VG on a single physical device, but from what I can see that device is backed by hardware RAID:

elukey@stat1002:~$ sudo megacli -AdpAllInfo -aALL

Device Present
================
Virtual Drives    : 1
  Degraded        : 1
  Offline         : 0
Physical Devices  : 14
  Disks           : 12
  Critical Disks  : 0
  Failed Disks    : 1
Enclosure Device ID: 32
Slot Number: 11
Drive's position: DiskGroup: 0, Span: 0, Arm: 11
Enclosure position: N/A
Device Id: 11
WWN: 50014ee25d78a974
Sequence Number: 3
Media Error Count: 1
Other Error Count: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 931.512 GB [0x74706db0 Sectors]
Non Coerced Size: 931.012 GB [0x74606db0 Sectors]
Coerced Size: 931.0 GB [0x74600000 Sectors]
Sector Size:  0
Firmware state: Failed
elukey@stat1002:~$ sudo megacli -LDInfo -Lall -aALL

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
Size                : 9.091 TB
Sector Size         : 512
Parity Size         : 1.818 TB
State               : Partially Degraded
Strip Size          : 256 KB
Number Of Drives    : 12
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
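
As a sanity check on the numbers above, the reported sizes are consistent with a 12-disk RAID 6 array in which two disks' worth of space holds parity. The sketch below assumes 512-byte sectors (the virtual drive reports 512, while the physical drive output shows 0) and assumes that megacli's "GB" and "TB" are binary units. The helper name `raid6_sizes` is hypothetical.

```python
# Values copied from the megacli output in this task.
COERCED_SECTORS = 0x74600000  # "Coerced Size" of one disk, in sectors
SECTOR_BYTES = 512            # assumption: 512-byte sectors, as reported for the virtual drive
DRIVES = 12                   # "Number Of Drives"

GIB = 2**30
TIB = 2**40


def raid6_sizes(sectors, sector_bytes, drives, parity_drives=2):
    """Return (bytes per disk, usable bytes, parity bytes) for a single-span RAID 6 array."""
    disk_bytes = sectors * sector_bytes
    usable_bytes = disk_bytes * (drives - parity_drives)
    parity_bytes = disk_bytes * parity_drives
    return disk_bytes, usable_bytes, parity_bytes


disk_bytes, usable_bytes, parity_bytes = raid6_sizes(COERCED_SECTORS, SECTOR_BYTES, DRIVES)
print(f"per disk: {disk_bytes / GIB:.1f} GiB")
print(f"usable:   {usable_bytes / TIB:.2f} TiB")
print(f"parity:   {parity_bytes / TIB:.3f} TiB")
```

Under those assumptions the per-disk size comes out at 931.0 GiB, the usable size at roughly 9.09 TiB and the parity size at roughly 1.818 TiB, in line with the "Size" and "Parity Size" fields. With one of twelve disks failed, a RAID 6 array can still tolerate one more disk failure, which matches the "Partially Degraded" state.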

Credit to @Volans for the help. Can we replace the disk ASAP?

Thanks!

Luca

Event Timeline

The disk has been replaced and is back online.
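
For future checks of this kind, the "Device Present" summary from `megacli -AdpAllInfo -aALL` can be parsed mechanically. The following is only a rough sketch: it works on the text pasted in the description, and the function names `parse_summary` and `array_is_healthy` are hypothetical, not part of any existing tooling.

```python
import re

# "Device Present" summary copied from the task description.
SAMPLE = """\
Device Present
================
Virtual Drives    : 1
  Degraded        : 1
  Offline         : 0
Physical Devices  : 14
  Disks           : 12
  Critical Disks  : 0
  Failed Disks    : 1
"""

# Counters that should all be zero on a healthy array.
PROBLEM_KEYS = ("Degraded", "Offline", "Critical Disks", "Failed Disks")


def parse_summary(text):
    """Collect every 'Name : <integer>' line into a dict of counters."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z ]+?)\s*:\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def array_is_healthy(counts):
    """True only if every problem counter is present and equal to zero."""
    return all(counts.get(key) == 0 for key in PROBLEM_KEYS)


counts = parse_summary(SAMPLE)
print(counts["Failed Disks"], array_is_healthy(counts))
```

A missing counter is treated as unhealthy on purpose, so that truncated or unexpected output does not read as a clean result.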