Installation issues on PowerEdge R440 Restbase servers with buster / firmware update needed
Closed, ResolvedPublic

Description

Some restbase hosts appear to be suffering the same issues encountered in T297422/T296856 and cannot boot correctly into the installer.

As it stands, the nodes we've seen this with so far are:

  • restbase2017
  • restbase2019
  • restbase2020
  • restbase1019
  • restbase1020
  • restbase1021
  • restbase1022
  • restbase1023
  • restbase1024
  • restbase1025
  • restbase1026
  • restbase1027

Event Timeline

@hnowlan Can I just update the NIC firmware, or does this need scheduled downtime?

T296856 seems to indicate that both the BIOS and the iDRAC firmware need to be upgraded, so I assume this will at least require a reboot. I can downtime these hosts while they are being upgraded if you let me know when.

Ideally, hosts in a DC would not all be taken down at the same time but done sequentially. Would that be possible?

@hnowlan Yes, we can do them in any order you see fit. I would like to trial run one first to make sure the iDRAC update works. The latest update has been causing issues with accessing the mgmt interface through the web portal. I use that portal to update firmware and to get logs for Dell when there are issues.

Sounds good, take your pick of any of the listed hosts and I can verify once the upgrade is in place.

Let's go with restbase1019, @hnowlan.

Sounds good. Let me know whenever suits and I can handle the downtimes etc. The host itself should be pretty hardy against sudden disappearances.

@hnowlan The BIOS and network firmware have been updated on restbase1019. The current iDRAC firmware is too old to update directly: even the oldest version I have, 5.0, is still too new to install on it.

Mentioned in SAL (#wikimedia-operations) [2022-01-27T17:01:15Z] <cmjohnson1> updating firmware restbase1020 T299652

Mentioned in SAL (#wikimedia-operations) [2022-01-27T17:21:43Z] <cmjohnson1> updating firmware restbase1021 T299652

Cmjohnson edited projects, added DC-Ops; removed ops-eqiad.
Cmjohnson added a subscriber: Cmjohnson.

Assigned this to @Papaul for the codfw portion of the task and removed ops-eqiad.

Please power down the servers and let me know when this is done.

Papaul triaged this task as Medium priority. Jan 31 2022, 4:35 PM

Ideally I'd like to do this host by host. Restbase2017 is down now.

Papaul updated the task description. (Show Details)
Cmjohnson updated the task description. (Show Details)
Cmjohnson added a subscriber: Papaul.

Updated restbase2019 and restbase2020; resolving.