Degraded RAID on analytics1059
Closed, Resolved
Public

Assigned To

Authored By

	ops-monitoring-bot
	Mar 7 2021, 5:27 AM

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host analytics1059. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: 1 failed LD(s) (Offline)

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli
Failed to execute '['/usr/lib/nagios/plugins/check_nrpe', '-4', '-H', 'analytics1059', '-c', 'get_raid_status_megacli']': RETCODE: 2
STDOUT:
b'CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.\n'
STDERR:
None
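
Note that the attached snapshot contains no RAID details: the NRPE call timed out, so only the summary line above carries the actual check result. A minimal sketch of how the two pieces of output could be told apart programmatically follows. The function names `parse_raid_summary` and `snapshot_timed_out` are hypothetical, and the regular expression assumes the summary always follows the shape seen in this task (a standard Nagios state, a count of failed LDs, and a state in parentheses); the plugin's real output format may vary.

```python
import re

# Assumed shape of the summary line, based only on the single example in
# this task: "CRITICAL: 1 failed LD(s) (Offline)".
SUMMARY_RE = re.compile(
    r"^(OK|WARNING|CRITICAL|UNKNOWN): (\d+) failed LD\(s\) \((\w+)\)$"
)


def parse_raid_summary(line):
    """Return the parsed summary as a dict, or None if the line does not match."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    return {
        "state": match.group(1),
        "failed_lds": int(match.group(2)),
        "ld_state": match.group(3),
    }


def snapshot_timed_out(stdout):
    """True if the captured check_nrpe STDOUT (bytes) reports a socket timeout."""
    return b"Socket timeout" in stdout


summary = parse_raid_summary("CRITICAL: 1 failed LD(s) (Offline)")
timed_out = snapshot_timed_out(
    b"CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.\n"
)
```

Here `summary` describes one failed logical drive in the Offline state, while `timed_out` being true signals that the detailed status has to be collected manually on the host.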

Event Timeline

ops-monitoring-bot created this task. Mar 7 2021, 5:27 AM

Peachey88 added a project: Analytics. Mar 7 2021, 5:35 AM

Mentioned in SAL (#wikimedia-analytics) [2021-03-07T07:49:32Z] <elukey> umount /var/lib/hadoop/data/e on analytics1059 and restart hadoop daemons to exclude failed disk - T276696
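
After an unmount like the one logged above, it can be useful to confirm that the path is really gone from the mount table before restarting the daemons. The following is only a sketch under stated assumptions: `is_mounted` is a hypothetical helper, the input is text in the `/proc/mounts` layout (device, mount point, filesystem type, options, ...), and the device names in the sample are invented for illustration.

```python
def is_mounted(mounts_text, mount_point):
    """Return True if mount_point appears as the second field of any line."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 is the mount point.
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


# Hypothetical sample in /proc/mounts format; the devices are made up.
sample = (
    "/dev/sdd1 /var/lib/hadoop/data/d ext4 rw,noatime 0 0\n"
    "/dev/sdf1 /var/lib/hadoop/data/f ext4 rw,noatime 0 0\n"
)

still_mounted = is_mounted(sample, "/var/lib/hadoop/data/e")
```

On a live host the same check could be run as `is_mounted(open("/proc/mounts").read(), "/var/lib/hadoop/data/e")`.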