hw troubleshooting: ipmi down for wdqs1005.eqiad.wmnet
Closed, Resolved | Public | Request

Description

  • FQDN: wdqs1005.eqiad.wmnet
  • Depooled / alerts suppressed
  • Have put system into a failed state in Netbox.
  • Medium-High => This host is due for decom, but we can't run the decom cookbook while IPMI is in this state, so fixing it will unblock the decom
  • Assigned correct project tag and appropriate owner
Error example
ryankemper@cumin1001:~$ sudo -E cookbook sre.hosts.decommission -t T344198 wdqs1005.eqiad.wmnet
Management Password:
Running IPMI command: ipmitool -I lanplus -H wdqs1005.mgmt.eqiad.wmnet -U root -E chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
==> WARNING: remote IPMI connection test failed for host wdqs1005. The host will not be shutdown. You can either continue (go) as is or try to fix the problem first (abort). See https://wikitech.wikimedia.org/wiki/Ipmi for troubleshooting.
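
The connection test in the output above can be reproduced outside the cookbook. The following is a minimal sketch, not the cookbook's actual implementation: the function names and result labels are hypothetical, and the success match on "Chassis Power is" assumes ipmitool's usual output for chassis power status. It builds the same ipmitool command shown in the log and passes the password through the IPMI_PASSWORD environment variable, which is what ipmitool's -E flag reads.

```python
import os
import subprocess

# Error string copied from the cookbook output in this task.
SESSION_ERROR = "Unable to establish IPMI v2 / RMCP+ session"


def build_ipmi_command(mgmt_host, user="root"):
    """Build the same chassis power status check shown in the log."""
    return [
        "ipmitool", "-I", "lanplus",
        "-H", mgmt_host,
        "-U", user,
        "-E",  # read the password from the IPMI_PASSWORD env variable
        "chassis", "power", "status",
    ]


def classify_ipmi_output(returncode, output):
    """Map ipmitool's result to a coarse label (labels are hypothetical)."""
    if SESSION_ERROR in output:
        return "session_failed"
    # Assumption: a healthy BMC answers with "Chassis Power is on/off".
    if returncode == 0 and "Chassis Power is" in output:
        return "ok"
    return "unknown"


def check_ipmi(mgmt_host, password, timeout=30):
    """Run the check and return a label; never raises if ipmitool is absent."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    try:
        proc = subprocess.run(
            build_ipmi_command(mgmt_host),
            capture_output=True,
            text=True,
            env=env,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return "not_run: {}".format(exc)
    return classify_ipmi_output(proc.returncode, proc.stdout + proc.stderr)
```

Calling check_ipmi("wdqs1005.mgmt.eqiad.wmnet", password) before the decom would return "session_failed" for the state reported here. It does not distinguish a bad mgmt cable from a hung iDRAC, so a ping of the mgmt IP is still a useful first step.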

Event Timeline

RKemper renamed this task from decommission wdqs1005.eqiad.wmnet to hw troubleshooting: ipmi down for wdqs1005.eqiad.wmnet. Aug 28 2023, 10:11 PM
RKemper assigned this task to Papaul.
RKemper updated the task description.
RKemper added a project: ops-eqiad.
RKemper subscribed.

@Jclark-ctr @VRiley-WMF can someone please check the mgmt cable for this server? I cannot ping the mgmt IP. If the cable is good, can someone please reset the iDRAC?

Thanks

Jclark-ctr claimed this task.
Jclark-ctr added a subscriber: Papaul.

Performed a flea power drain; the iDRAC connection came back.