labvirt1008/labsdb1001: FreeIPMI returned an empty header map
Closed, Resolved (Public)

Description

The Icinga check check_ipmi_temp is failing on labvirt1008 and labsdb1001: the underlying plugin reports that FreeIPMI returned an empty header map.

/usr/local/lib/nagios/plugins/check_ipmi_sensor --noentityabsent -T Temperature -ST Temperature --nosel -vvv
------------- debug output for sensors (-vvv is set): ------------
  script was executed with the following parameters:
    /usr/local/lib/nagios/plugins/check_ipmi_sensor --noentityabsent -T Temperature -ST Temperature --nosel -vvv
  check_ipmi_sensor version:
    3.11
  FreeIPMI version:
    ipmi-sensors - 1.1.5
  FreeIPMI was executed with the following parameters:
    /usr/sbin/ipmi-sensors -g Temperature --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors
  FreeIPMI return code: 0
  output of FreeIPMI:

--------------------- end of debug output ---------------------
Sensor Type(s) Temperature Status: 
 FreeIPMI returned an empty header map (first line) FreeIPMI could not find any sensors for the given sensor type (option '-T').

Running the FreeIPMI command from the debug output directly produces no output from ipmi-sensors at all, which appears to be why the check fails.

root@labvirt1008:~# /usr/sbin/ipmi-sensors -g Temperature --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors 
root@labvirt1008:~#
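
The empty-output condition shown above can be detected separately from a real sensor problem. The following is only a minimal sketch, not part of check_ipmi_sensor: the function name classify_ipmi_output and the EMPTY/OK labels are hypothetical, and it assumes only that a healthy ipmi-sensors run prints at least one non-blank line.

```shell
#!/bin/sh
# Hypothetical helper (not part of check_ipmi_sensor): reads the text that
# ipmi-sensors printed from stdin and reports whether it was empty.
classify_ipmi_output() {
    # Capture everything on stdin.
    output=$(cat)
    # Remove all whitespace so that blank lines also count as "empty".
    if [ -z "$(printf '%s' "$output" | tr -d '[:space:]')" ]; then
        echo "EMPTY"
        return 3   # 3 is the Nagios plugin exit code for UNKNOWN
    fi
    echo "OK"
    return 0
}
```

On an affected host this could be used as `/usr/sbin/ipmi-sensors -g Temperature --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors | classify_ipmi_output`, which would print EMPTY when the BMC returns no sensor data, as on labvirt1008 here.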

Running the same command with --debug produces some very verbose output, the last line of which is interesting:

Sensor reading/event bitmask not available: sensor reading unavailable

Event Timeline

faidon claimed this task.
faidon added a subscriber: jcrespo.

labsdb1001 is one of two Ciscos remaining in our fleet (labsdb1003 being the other one). They're old and their BIOS/IPMI implementation has always been horrible. They are also unresponsive on their mgmt IP interface entirely (T169360) and @jcrespo has mentioned that he's not even sure if they'll come back up after we reboot them(!). Their replacement has been requested for a while (T142807) and it will hopefully happen at some point -- not much we can do about this task until then, I'd say.

I just fixed labvirt1008 by resetting its iLO interface (cd /map1 and then reset).