
HP RAID (Service Check Timed Out) on swift hosts
Closed, Duplicate (Public)

Description

ms-be1030 - HP RAID - UNKNOWN - (Service Check Timed Out)

The issue seems to be that the HP RAID check takes longer to run than the Icinga timeout:

ayounsi@ms-be1030:~$ time /usr/local/lib/nagios/plugins/check_hpssacli
OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK

real	1m9.053s
user	0m35.384s
sys	0m4.060s
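
To reproduce the timeout by hand, the check can be run under a time limit. The following is a minimal sketch, not how Icinga or NRPE enforce their timeout internally: `run_check_with_limit` is a hypothetical helper name, and it assumes bash and GNU coreutils `timeout`, which exits with status 124 when the limit is hit. The sketch maps that case to the Nagios UNKNOWN exit code (3).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the plugin or of NRPE): run a check
# command under a time limit and report UNKNOWN if the limit is exceeded.
run_check_with_limit() {
  local limit="$1" rc=0
  shift
  # GNU timeout exits with 124 when the command is killed for running too long.
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "UNKNOWN - Service Check Timed Out after ${limit}s"
    return 3   # Nagios plugin exit code for UNKNOWN
  fi
  # Otherwise pass the check's own exit status through unchanged.
  return "$rc"
}
```

For example, `run_check_with_limit 60 /usr/local/lib/nagios/plugins/check_hpssacli` would be expected to report UNKNOWN on a host where the check takes about 69 seconds, as in the run above.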

Event Timeline

ayounsi created this task. Aug 7 2017, 4:50 PM
Restricted Application added a subscriber: Aklapper. Aug 7 2017, 4:50 PM
Dzahn added a subscriber: Dzahn. Aug 7 2017, 4:52 PM

nrpe::monitor_service has a "timeout" parameter for this.

Example:

modules/role/manifests/mail/mx.pp

nrpe::monitor_service { 'check_exim_queue':
    ..
    nrpe_command   => '/usr/local/lib/nagios/plugins/check_exim_queue -w 1000 -c 3000',
    ..
    timeout        => 20,
}
RobH added a subscriber: RobH. Aug 7 2017, 4:58 PM

Since this has been showing unknown sporadically, I think adding an additional minute is a good idea.

Change 370505 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Bumping HP RAID Icinga check timeout from 60 to 90s

https://gerrit.wikimedia.org/r/370505

Change 370505 merged by Ayounsi:
[operations/puppet@production] Bumping HP RAID Icinga check timeout from 60 to 90s

https://gerrit.wikimedia.org/r/370505
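
As a rough sanity check on the new value, the single measured run from the description (1m9s, rounded here to 69 seconds) can be compared against both timeouts. This is only a sketch: `headroom` is a hypothetical helper, and the one data point from ms-be1030 says nothing about how long the check takes on other hosts or under load.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds of headroom between a timeout and a measured
# runtime. A negative result means the check would time out.
headroom() {
  local timeout_s="$1" runtime_s="$2"
  echo $(( timeout_s - runtime_s ))
}

headroom 60 69   # prints -9: the measured run exceeds the old 60s timeout
headroom 90 69   # prints 21: the same run fits within the new 90s timeout
```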

ayounsi closed this task as Resolved. Aug 17 2017, 10:43 PM
Dzahn reopened this task as Open. Mar 15 2019, 9:59 AM

This is apparently an issue again. See screenshot below from today:

Dzahn removed ayounsi as the assignee of this task. Mar 15 2019, 10:00 AM
Dzahn edited projects, added Operations; removed Patch-For-Review.

HP RAID checks are timing out on all eqiad swift hosts.

Dzahn renamed this task from "HP RAID (Service Check Timed Out)" to "HP RAID (Service Check Timed Out) on swift hosts". Mar 15 2019, 10:01 AM
Dzahn added a project: SRE-swift-storage.

Given that this is quite old, I'm closing it as a duplicate of T210723, which has a more recent discussion of possible solutions. (CC @colewhite)