
hw troubleshooting: wdqs1010 unreachable from SSH or DRAC
Closed, Resolved, Public

Description

Hello DC Ops!

Per an IRC conversation with @Papaul, I am unable to reach wdqs1010.

This started yesterday when I attempted a reimage. The reimage failed, but it left the server in a state where it responded to SSH but would not let me log in.

DRAC connectivity has been intermittent. When I can get in, I see a Linux server prompt on the console, but I can't log in with the root password from the pws repo. I've tried to reimage a few times, but it always fails with IPMI errors.

The firmware cookbook won't update the host because it reports that the DRAC's firmware is too old. I attempted to update the firmware manually via the web interface, but that fails as well, with the error "Unable to extract payloads from Update Package". I've tried 4 different firmware versions so far, using the EXE file labeled "An Application" on the Dell website, with no luck.

Please take a look at this host at your convenience; we have enough capacity that this is not a major issue.


Priority => Medium

  • Host is depooled
  • Host marked as failed in Netbox

Event Timeline

Change 952278 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: disable alerts on wdqs1010

https://gerrit.wikimedia.org/r/952278

Change 952278 merged by Ryan Kemper:

[operations/puppet@production] wdqs: disable alerts on wdqs1010

https://gerrit.wikimedia.org/r/952278

RKemper updated the task description.
RKemper renamed this task from wdqs1010 unreachable from SSH or DRAC to hw troubleshooting: wdqs1010 unreachable from SSH or DRAC. Aug 30 2023, 6:27 PM

@bking iDRAC and BIOS updated. All yours. As of 10/03/2023, the latest iDRAC version for the R430 is 2.84.84.84.

Change 954093 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: re-enable alerts on wdqs1010

https://gerrit.wikimedia.org/r/954093