
Degraded RAID on elastic1046
Open, Normal, Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host elastic1046. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Personalities : [raid1] [raid0] [linear] [multipath] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid0 sda2[0] sdb2[1]
      1503967232 blocks super 1.2 512k chunks
      
md0 : active raid1 sda1[0] sdb1[1](F)
      29279232 blocks super 1.2 [2/1] [U_]
      
unused devices: <none>
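The snapshot shows where the degradation is. md0 is a two-member raid1 with sdb1 flagged (F). Its status [2/1] [U_] means two members are expected and one is working: each character in [U_] is a member slot, where U is up and _ is down. The hypothetical helpers parse_mdstat and degraded below are a minimal Python sketch of how to pick that failed member out of the text. They assume only the simple mdstat layout shown in this snapshot, not every format /proc/mdstat can produce. They are not the logic of get-raid-status-md, whose source is not shown here.

```python
import re

# Sample input: the mdstat snapshot from this task, copied verbatim.
MDSTAT = """\
Personalities : [raid1] [raid0] [linear] [multipath] [raid6] [raid5] [raid4] [raid10]
md1 : active raid0 sda2[0] sdb2[1]
      1503967232 blocks super 1.2 512k chunks

md0 : active raid1 sda1[0] sdb1[1](F)
      29279232 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

# Assumption: array lines look like "mdN : <state> <level> <members...>".
DEVICE_RE = re.compile(r"^(md\d+) : (\S+) (\S+) (.*)$")
# A member such as "sdb1[1]" or "sdb1[1](F)"; the (F) suffix marks a failed member.
MEMBER_RE = re.compile(r"(\w+)\[\d+\](\(F\))?")
# The "[expected/working]" counter printed for redundant levels such as raid1.
COUNT_RE = re.compile(r"\[(\d+)/(\d+)\]")


def parse_mdstat(text):
    """Hypothetical helper: map each md array to its state, level, failed members and counts."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            name, state, level, members = m.groups()
            failed = [dev for dev, flag in MEMBER_RE.findall(members) if flag]
            current = {"state": state, "level": level, "failed": failed,
                       "expected": None, "working": None}
            arrays[name] = current
            continue
        if current is not None:
            c = COUNT_RE.search(line)
            if c:
                current["expected"], current["working"] = map(int, c.groups())
    return arrays


def degraded(arrays):
    """Hypothetical helper: names of arrays with a failed member or fewer working than expected."""
    return sorted(
        name for name, a in arrays.items()
        if a["failed"] or (a["expected"] is not None and a["working"] < a["expected"])
    )


if __name__ == "__main__":
    arrays = parse_mdstat(MDSTAT)
    for name in degraded(arrays):
        print(name, arrays[name]["level"], "failed:", arrays[name]["failed"])
```

On this snapshot the sketch reports only md0 with sdb1 failed. md1 is raid0, which keeps no redundancy and prints no [n/m] counter, so mdstat gives nothing for the sketch to compare there.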

Related Objects

Status: Open; Assigned: Cmjohnson

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2019-07-22T07:54:48Z] <elukey> sudo -i depool on elastic1046 - broken disk (srv partition not available) - T228606

elukey added a subscriber: elukey (Mon, Jul 22, 7:55 AM)
elukey@elastic1046:~$ sudo -i depool
Depooling all services on elastic1046.eqiad.wmnet
eqiad/elasticsearch/elasticsearch/elastic1046.eqiad.wmnet: pooled changed yes => no
eqiad/elasticsearch/elasticsearch-psi-ssl/elastic1046.eqiad.wmnet: pooled changed yes => no
eqiad/elasticsearch/elasticsearch-ssl/elastic1046.eqiad.wmnet: pooled changed yes => no
elukey triaged this task as Normal priority (Mon, Jul 22, 8:05 AM)
This comment was removed by Cmjohnson.
Volans added a subscriber: Gehel (Tue, Jul 23, 6:43 AM)
Cmjohnson reassigned this task from Cmjohnson to wiki_willy (Wed, Jul 24, 5:39 PM)
Cmjohnson added a subscriber: wiki_willy.

This server is out of warranty (the warranty ended in April 2019). @wiki_willy, escalating to you to decide on the disks.

@elukey - since elastic1046 is out of warranty, even if only by a few months, we'll still have to purchase a new disk for this server. Just double-checking that's the route you want to take before we place the order.

Thanks,
Willy

elukey added a subscriber: dcausse (Thu, Jul 25, 8:25 AM)

Adding @dcausse to the conversation since @Gehel is on holiday. I would simply buy the disk now, but I'm not sure whether elastic1046 is scheduled to be refreshed soon.

The host is not scheduled for replacement; @wiki_willy, please proceed with ordering the disk :)

jijiki added a subscriber: jijiki (Thu, Jul 25, 11:56 AM)

Thanks @elukey. Subtask #T229017 has been opened to order the replacement drive through procurement. Assigning this task back to @Cmjohnson for when the disk arrives onsite.

Thanks,
Willy