Looks similar to T167121
Active for 29 days so far: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=mw1247&service=IPMI+Sensor+Status
Going through the troubleshooting commands:
https://wikitech.wikimedia.org/wiki/Management_Interfaces#Troubleshooting_Commands
Local IPMI on the host: no, the BMC reports "busy".
[mw1247:~] $ sudo ipmi-chassis --get-chassis-status
ipmi_cmd_get_chassis_status: BMC busy
Remote IPMI from cumin1001: no.
[cumin1001:~] $ sudo ipmitool -I lanplus -H "mw1247.mgmt.eqiad.wmnet" -U root -E chassis power status
..
Error: Unable to establish IPMI v2 / RMCP+ session
So chassis_status here is "BMC busy". That is an uncommon error. The common ones are internal system error or driver timeout.
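For anyone triaging similar alerts, here is a rough sketch of how the three error messages mentioned above could be told apart in a script. The function name `classify_ipmi_error` and the labels it prints are hypothetical, not part of any existing tooling; it only matches the substrings quoted in this task and does not call IPMI itself.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not existing tooling): classify one
# line of IPMI error output by the substrings quoted in this task.
classify_ipmi_error() {
    # Lowercase the message so matching is case-insensitive.
    msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$msg" in
        *"bmc busy"*)              echo "bmc-busy" ;;              # the uncommon case seen here
        *"internal system error"*) echo "internal-system-error" ;; # common
        *"driver timeout"*)        echo "driver-timeout" ;;        # common
        *)                         echo "other" ;;
    esac
}

# Example with the exact message from mw1247:
classify_ipmi_error "ipmi_cmd_get_chassis_status: BMC busy"
```

Feeding it the output captured from `sudo ipmi-chassis --get-chassis-status` would be one way to use it; anything it labels "other" still needs a human look.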
Trying rac reset
@Jclark-ctr Please see if you can reset the DRAC. If the server needs to go down please ping me or somebody else in serviceops to depool it.
And now it's fixed; Icinga was green again on the next check. Thanks @Jclark-ctr for the quick response!