We have some check_nrpe-based service checks defined (check_raid_hpssacli, for instance) with long 90s timeouts. However, it looks like the nrpe service is running with a default command_timeout value of 60s, possibly originating from the deb package.
This is causing some RAID checks to time out despite being called with a sufficient -t value. For example, here is check_raid_hpssacli being called from the icinga server with a timeout of 90s:
einsteinium:~# /usr/lib/nagios/plugins/check_nrpe -H ms-be1030.eqiad.wmnet -c check_raid_hpssacli -t 90
NRPE: Command timed out after 60 seconds
This check takes ~65s to complete when run locally:
ms-be1030:~# time /usr/local/lib/nagios/plugins/check_hpssacli
OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK

real 1m5.345s
user 0m36.320s
sys 0m3.796s
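
To put the numbers above side by side, here is a minimal shell sketch. It assumes the limit that actually applies to a check is the smaller of the client's -t value and the daemon's command_timeout, which is consistent with the error shown above. The function name nrpe_effective_timeout is hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of NRPE): assumes the effective limit is
# the smaller of the client -t value and the server command_timeout.
nrpe_effective_timeout() {
    client_t=$1   # value passed to check_nrpe -t
    server_t=$2   # command_timeout on the monitored host
    if [ "$client_t" -lt "$server_t" ]; then
        echo "$client_t"
    else
        echo "$server_t"
    fi
}

# Values from this report: -t 90 on the client, 60s on the daemon,
# and a check that takes roughly 65s to run locally.
limit=$(nrpe_effective_timeout 90 60)
runtime=65
if [ "$runtime" -gt "$limit" ]; then
    echo "check exceeds effective timeout of ${limit}s"
else
    echo "check fits within ${limit}s"
fi
```

With these values the sketch reports that the check exceeds the 60s limit. If the assumption holds, the command_timeout setting in nrpe.cfg on the monitored host would need to be raised to at least the 90s used on the icinga side for the -t value to have any effect.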