mw1247: IPMI Sensor Status UNKNOWN internal IPMI error
Closed, Resolved, Public

Event Timeline

ayounsi triaged this task as Medium priority. Nov 6 2019, 7:01 PM
ayounsi created this task.

Going through the troubleshooting commands at:

https://wikitech.wikimedia.org/wiki/Management_Interfaces#Troubleshooting_Commands

  • Does IPMI work locally?

No, it's "busy".

[mw1247:~] $ sudo ipmi-chassis --get-chassis-status
ipmi_cmd_get_chassis_status: BMC busy
  • Does IPMI work remotely?

No.

[cumin1001:~] $ sudo ipmitool -I lanplus -H "mw1247.mgmt.eqiad.wmnet" -U root -E chassis power status
..
Error: Unable to establish IPMI v2 / RMCP+ session

So the chassis status here is "BMC busy", which is an uncommon error. The common ones are "internal system error" or "driver timeout".
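To make the distinction between the uncommon and the common errors concrete, here is a minimal sketch that sorts a line of IPMI output into those groups. The substrings come only from the messages quoted in this task; the function name `classify_ipmi_error` and the "common"/"uncommon" labels are hypothetical and not part of any existing tooling.

```python
# Substrings are taken from the messages quoted in this task.
# The labels are illustrative assumptions, not an official taxonomy.
KNOWN_ERRORS = {
    "BMC busy": "uncommon",
    "internal system error": "common",
    "driver timeout": "common",
}


def classify_ipmi_error(output: str) -> str:
    """Return '<error> (<label>)' for a known error, else 'unrecognized'."""
    lowered = output.lower()
    for needle, label in KNOWN_ERRORS.items():
        # Case-insensitive substring match against the raw command output.
        if needle.lower() in lowered:
            return f"{needle} ({label})"
    return "unrecognized"


if __name__ == "__main__":
    print(classify_ipmi_error("ipmi_cmd_get_chassis_status: BMC busy"))
```

Anything that does not match, such as the RMCP+ session error from the remote check, falls through to "unrecognized" rather than being guessed at.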

Trying a racreset.

Can't SSH to mgmt to run racreset, so this needs help from someone on site. Adding dcops.
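The escalation path followed so far can be summarized as a small decision function. This is only a sketch of the reasoning in this task, not a documented procedure: the function name `next_step` and its return strings are hypothetical, and treating any single failed check as a reason to try racreset is an assumption.

```python
def next_step(local_ok: bool, remote_ok: bool, mgmt_ssh_ok: bool) -> str:
    """Suggest the next troubleshooting step from three yes/no checks."""
    if local_ok and remote_ok:
        # Both IPMI paths answer, so there is nothing to reset.
        return "no action: IPMI responds locally and remotely"
    if mgmt_ssh_ok:
        # The management interface is reachable, so the reset can be done remotely.
        return "run racreset over mgmt SSH"
    # Neither IPMI nor mgmt SSH is usable: someone on site has to reset the DRAC.
    return "ask on-site dcops to reset the DRAC"


if __name__ == "__main__":
    # The situation in this task: local and remote IPMI fail, mgmt SSH fails too.
    print(next_step(local_ok=False, remote_ok=False, mgmt_ssh_ok=False))
```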

Dzahn lowered the priority of this task from Medium to Low.
Dzahn edited projects, added ops-eqiad; removed observability.

@Jclark-ctr Please see if you can reset the DRAC. If the server needs to go down, please ping me or somebody else in serviceops to depool it.

  • ssh to mgmt works again
  • local IPMI works again
  • Icinga check changed to "ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-mw1247.localhost: internal IPMI error"

And now it's fixed: Icinga went green again on the next check. Thanks @Jclark-ctr for the quick response!