
Upgrade firmware on ms-be1021 (Was: Degraded RAID on ms-be1021)
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host ms-be1021. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      976320 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sda1[0] sdb1[1](F)
      58559488 blocks super 1.2 [2/1] [U_]
      
unused devices: <none>
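
For reference, the counts in the CRITICAL line can be reproduced from the snapshot above. The following is a minimal sketch, not the actual get-raid-status-md plugin: the function name `summarize_mdstat` and the `SAMPLE` constant are hypothetical, and it assumes that "Working" means active plus spare members and that an underscore in the `[U_]` status map marks a degraded array.

```python
import re

# Snapshot text copied from the task description above.
SAMPLE = """\
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      976320 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1](F)
      58559488 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

# Member entries look like "sdb1[1]" with an optional flag such as "(F)" or "(S)".
DEVICE_RE = re.compile(r"(\w+)\[\d+\](\([A-Z]\))?")
# Status map such as "[UU]" or "[U_]"; "[2/1]" does not match this pattern.
STATUS_RE = re.compile(r"\[([U_]+)\]")
ARRAY_RE = re.compile(r"^(md\d+) : (.*)$")


def summarize_mdstat(text):
    """Count active, failed and spare members across all md arrays."""
    active = failed = spare = 0
    degraded = []
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            for _name, flag in DEVICE_RE.findall(header.group(2)):
                if flag == "(F)":
                    failed += 1
                elif flag == "(S)":
                    spare += 1
                else:
                    # Simplification: any other flag is treated as active.
                    active += 1
        elif current:
            status = STATUS_RE.search(line)
            if status and "_" in status.group(1):
                degraded.append(current)
    return {
        "state": "degraded" if degraded else "optimal",
        "active": active,
        "working": active + spare,  # assumption: working = active + spare
        "failed": failed,
        "spare": spare,
        "degraded_arrays": degraded,
    }


if __name__ == "__main__":
    print(summarize_mdstat(SAMPLE))
```

Under these assumptions the sketch reports md0 as the degraded array, with sdb1 as the single failed member.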

Event Timeline

jijiki triaged this task as High priority. Jul 2 2019, 11:21 AM
jijiki added subscribers: jijiki, fgiunchedi, Marostegui, jcrespo.

Mentioned in SAL (#wikimedia-operations) [2019-07-02T12:30:21Z] <jijiki> Power cycle ms-be1021 - T227076

I will update its firmware (like others in T141756) and see what happens.

jijiki renamed this task from Degraded RAID on ms-be1021 to Upgrade firmware on ms-be1021 (Was: Degraded RAID on ms-be1021). Jul 4 2019, 12:37 PM
jijiki added a project: serviceops.

Mentioned in SAL (#wikimedia-operations) [2019-07-05T11:38:26Z] <jijiki> Reboot ms-be1021 - T141756 - T227076

jijiki closed this task as Resolved. Jul 5 2019, 11:47 AM
jijiki claimed this task.

There are still messages like:

[  122.753602] perf: interrupt took too long (2953 > 2500), lowering kernel.perf_event_max_sample_rate to 67500

Resolving this and keeping an eye on it.