mw1247: IPMI Sensor Status UNKNOWN internal IPMI error
Closed, Resolved · Public

Event Timeline

ayounsi triaged this task as Medium priority. Nov 6 2019, 7:01 PM
ayounsi created this task.
Restricted Application added a subscriber: Aklapper. Nov 6 2019, 7:01 PM
Dzahn added a subscriber: Dzahn. Nov 6 2019, 7:19 PM

Going through the troubleshooting commands at:

https://wikitech.wikimedia.org/wiki/Management_Interfaces#Troubleshooting_Commands

  • Does IPMI work locally?

No, it's "busy".

[mw1247:~] $ sudo ipmi-chassis --get-chassis-status
ipmi_cmd_get_chassis_status: BMC busy

  • Does IPMI work remotely?

No.

[cumin1001:~] $ sudo ipmitool -I lanplus -H "mw1247.mgmt.eqiad.wmnet" -U root -E chassis power status
..
Error: Unable to establish IPMI v2 / RMCP+ session

So the local chassis status check returns "BMC busy", which is an uncommon error. The common ones are "internal system error" or "driver timeout".
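
For telling these cases apart quickly, here is a minimal sketch of a helper that maps an error line to a rough category. The function name `classify_ipmi_error` and the category labels are hypothetical, made up for illustration; the match patterns are only the messages quoted in this task, not a complete list of what freeipmi or ipmitool can print.

```shell
#!/bin/sh
# Hypothetical helper (not part of freeipmi or ipmitool): classify an
# IPMI error line by substring. Patterns come from the messages quoted
# in this task; category names are invented for this sketch.
classify_ipmi_error() {
    case "$1" in
        *"BMC busy"*)
            echo "bmc-busy" ;;
        *"Unable to establish IPMI v2 / RMCP+ session"*)
            echo "remote-session-failed" ;;
        *"internal IPMI error"*|*"internal system error"*)
            echo "internal-error" ;;
        *"driver timeout"*)
            echo "driver-timeout" ;;
        *)
            # Anything else, including normal successful output.
            echo "unclassified" ;;
    esac
}
```

It could be fed the combined output of either check above, for example `classify_ipmi_error "$(sudo ipmi-chassis --get-chassis-status 2>&1)"`. Output from a working BMC simply falls through to "unclassified".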

Trying racreset.

Dzahn added a comment. Nov 6 2019, 7:21 PM

Can't ssh to mgmt to do racreset. Needs help from onsite then. Adding dcops.

Dzahn assigned this task to Jclark-ctr. Nov 6 2019, 7:22 PM
Dzahn lowered the priority of this task from Medium to Low.
Dzahn edited projects, added ops-eqiad; removed observability.

@Jclark-ctr Please see if you can reset the DRAC. If the server needs to go down please ping me or somebody else in serviceops to depool it.

Dzahn added a comment. Nov 6 2019, 9:24 PM

server is depooled now.

performed flea power drain

Dzahn added a comment. Nov 6 2019, 9:59 PM

  • ssh to mgmt works again
  • local IPMI works again
  • Icinga check changed to "ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-mw1247.localhost: internal IPMI error"

Dzahn closed this task as Resolved. Nov 6 2019, 10:02 PM

and now it's fixed and Icinga is green again on the next check. Thanks @Jclark-ctr for the quick response!