
Degraded RAID on ms-be2024
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host ms-be2024. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 sda1[0](F) sdb1[1]
      58559488 blocks super 1.2 [2/1] [_U]
      
md1 : active (auto-read-only) raid1 sdb2[1] sda2[0]
      976320 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>
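
For readers unfamiliar with the mdstat format, the snapshot above can be read programmatically. The following is a minimal sketch, not the actual Nagios/Icinga check: the function names `parse_mdstat` and `degraded` are hypothetical, and the parser assumes only the layout seen in this snapshot (a header line per array followed by a status line containing `[expected/up]`). A member flagged `(F)` is failed, and an array with fewer members up than expected is degraded.

```python
import re

# Snapshot copied from this task; in practice the text would come from /proc/mdstat.
MDSTAT = """\
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1[0](F) sdb1[1]
      58559488 blocks super 1.2 [2/1] [_U]

md1 : active (auto-read-only) raid1 sdb2[1] sda2[0]
      976320 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array: {"failed": [devices], "expected": int, "up": int}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+) : (.*)$", line)
        if header:
            current = header.group(1)
            # Members look like sda1[0] or sda1[0](F); (F) marks a failed member.
            members = re.findall(r"(\w+)\[\d+\](\(F\))?", header.group(2))
            arrays[current] = {
                "failed": [dev for dev, flag in members if flag],
                "expected": None,
                "up": None,
            }
            continue
        # Status line carries [expected/up], e.g. [2/1] for one missing member.
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if status and current:
            arrays[current]["expected"] = int(status.group(1))
            arrays[current]["up"] = int(status.group(2))
    return arrays


def degraded(arrays):
    """Names of arrays with fewer members up than expected."""
    return sorted(
        name
        for name, info in arrays.items()
        if info["up"] is not None and info["up"] < info["expected"]
    )


if __name__ == "__main__":
    parsed = parse_mdstat(MDSTAT)
    print(degraded(parsed), parsed["md0"]["failed"])
```

On this snapshot the sketch flags md0 only, with sda1 as the failed member; md1 still has both members up.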

Event Timeline

fgiunchedi added a subscriber: Papaul.

I don't seem to be able to log in on ms-be2024 at all; from the console it looks like both SSDs are considered offline:

[2506558.978048] sd 0:1:0:0: rejecting I/O to offline device
[2506559.007424] sd 0:1:0:1: rejecting I/O to offline device
[2506559.041158] sd 0:1:0:1: rejecting I/O to offline device
[2506559.080451] sd 0:1:0:0: rejecting I/O to offline device
[2506559.111785] sd 0:1:0:1: rejecting I/O to offline device
[2506559.142638] sd 0:1:0:1: rejecting I/O to offline device

@Papaul feel free to reboot/diagnose the machine at any time. If both SSDs are really busted, so be it :(
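
The console messages above can be tallied per SCSI address to see how many distinct devices the kernel is rejecting I/O for. This is a rough sketch for illustration only: `offline_devices` is a hypothetical helper, and the sample text is just the six lines quoted in this task. The address after `sd` is the usual host:channel:target:lun tuple; which physical SSD each address maps to is not established here.

```python
import re
from collections import Counter

# Console lines copied from the comment above.
CONSOLE = """\
[2506558.978048] sd 0:1:0:0: rejecting I/O to offline device
[2506559.007424] sd 0:1:0:1: rejecting I/O to offline device
[2506559.041158] sd 0:1:0:1: rejecting I/O to offline device
[2506559.080451] sd 0:1:0:0: rejecting I/O to offline device
[2506559.111785] sd 0:1:0:1: rejecting I/O to offline device
[2506559.142638] sd 0:1:0:1: rejecting I/O to offline device
"""

OFFLINE = re.compile(r"sd (\d+:\d+:\d+:\d+): rejecting I/O to offline device")


def offline_devices(log_text):
    """Count 'rejecting I/O to offline device' messages per SCSI address."""
    return Counter(match.group(1) for match in OFFLINE.finditer(log_text))


if __name__ == "__main__":
    for address, count in sorted(offline_devices(CONSOLE).items()):
        print(address, count)
```

Two distinct addresses appear in the excerpt (0:1:0:0 and 0:1:0:1), which is consistent with both devices being reported offline.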

@fgiunchedi no sign of disk errors on my end. I rebooted the system and it looks like it is back up. But I really don't trust HP, so I will leave the task open and monitor the server.

Thanks @Papaul !

I also can't find anything obviously wrong after a reboot, tentatively resolving :(

Reopening as per request

Papaul triaged this task as Medium priority. Sep 5 2017, 3:02 PM

This server looks good so far, so resolving the task for now.