Degraded RAID on ganeti2013
Closed, Resolved · Public
Assigned To

Authored By

	ops-monitoring-bot
	Nov 16 2022, 4:53 PM

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host ganeti2013. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid5 sdc1[2] sda1[0] sdd1[3] sdb1[1]
      1456128 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      
md1 : active raid5 sda2[0] sdd2[3] sdb2[1]
      117086208 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
      
md2 : active raid5 sdd3[3] sda3[0] sdb3[1]
      2225184768 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
      bitmap: 1/6 pages [4KB], 65536KB chunk

unused devices: <none>
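
For reference, the snapshot above can be read mechanically. The following is a minimal sketch, not the actual Icinga plugin: it assumes the mdstat layout shown in this task, and the function name `degraded_arrays` and the `MDSTAT` sample string are made up for illustration. It reports which arrays have fewer active members than expected and which slots are empty.

```python
import re

# Sample text copied from the snapshot in this task (array and status lines only).
MDSTAT = """\
md0 : active raid5 sdc1[2] sda1[0] sdd1[3] sdb1[1]
      1456128 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
md1 : active raid5 sda2[0] sdd2[3] sdb2[1]
      117086208 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
md2 : active raid5 sdd3[3] sda3[0] sdb3[1]
      2225184768 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, missing_slots)} for degraded arrays.

    Hypothetical helper for illustration; it only understands the
    "[expected/active] [UU_U]" status format seen in the snapshot.
    """
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array header line, e.g. "md1 : active raid5 sda2[0] ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line, e.g. "... algorithm 2 [4/3] [UU_U]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            slots = status.group(3)
            if active < expected:
                # "_" marks a slot with no working member
                missing = [i for i, c in enumerate(slots) if c == "_"]
                result[current] = (expected, active, missing)
    return result


print(degraded_arrays(MDSTAT))
# {'md1': (4, 3, [2]), 'md2': (4, 3, [2])}
```

In this snapshot md1 and md2 are each missing slot 2. In md0, which is still complete, slot 2 is held by sdc1, so the missing members would plausibly be sdc2 and sdc3. The active counts (4 + 3 + 3) also add up to the "Active: 10" in the CRITICAL line.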

Related Objects

Mentioned In: T323220: Broken disk on ganeti2013

Event Timeline

ops-monitoring-bot created this task. Nov 16 2022, 4:53 PM

Restricted Application added a subscriber: Aklapper. Nov 16 2022, 4:53 PM

Dzahn mentioned this in T323220: Broken disk on ganeti2013. Nov 16 2022, 9:58 PM

Papaul moved this task from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board. Nov 17 2022, 1:58 AM

VM kubestagetcd2002.codfw.wmnet switching disk type to drbd

VM kubestagetcd2002.codfw.wmnet switching disk type to plain

The server can be taken down for troubleshooting at any time; I removed it from active service. I saw kernel messages on the console pointing to a broken /dev/sdc.

I realise the server has been out of warranty for some months now, but could we either use a disk from a decommissioned server (if we have one) or buy a replacement?

Dzahn merged a task: T323220: Broken disk on ganeti2013. Nov 21 2022, 6:40 PM

Dzahn subscribed.

Marostegui assigned this task to Papaul. Nov 28 2022, 9:19 AM

@MoritzMuehlenhoff, unfortunately this server is out of warranty.