Upgrade NIC firmware on cloudvirt1023
Closed, ResolvedPublic

Description

While investigating T269313 I checked the firmware versions of all the cloudvirts. On Dell systems we seem to be standardized on two versions, /except/ for cloudvirt1023 which seems to have an ancient, different version.

Let's try to understand that and possibly upgrade.

Event Timeline

1023 was purchased at the same time as 1024. Here is 1024:

node_nic_firmware_version{cluster="wmcs", device="ens3f0np0", driver="bnxt_en", firmware_version="214.4.32.0/pkg 21.60.22.11", instance="cloudvirt1024:9100", job="node", site="eqiad"}

Here is 1023:

node_nic_firmware_version{cluster="wmcs", device="ens3f0np0", driver="bnxt_en", firmware_version="20.6.151.0", instance="cloudvirt1023:9100", job="node", site="eqiad"}
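To spot an outlier like this across many hosts, the metric lines can be grouped by their firmware_version label. The following is only a minimal sketch: the function names are hypothetical, and it assumes the samples are available as plain text lines in the format pasted above (how they are fetched is not covered here).

```python
import re
from collections import defaultdict

# Matches key="value" pairs inside the braces of a Prometheus-style sample line.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_labels(line):
    """Return the labels of one metric line as a dict."""
    return dict(LABEL_RE.findall(line))


def group_by_firmware(lines):
    """Map each firmware_version string to the sorted list of hosts reporting it."""
    groups = defaultdict(set)
    for line in lines:
        labels = parse_labels(line)
        if "firmware_version" not in labels or "instance" not in labels:
            continue  # not a firmware sample; skip it
        host = labels["instance"].split(":")[0]  # drop the exporter port
        groups[labels["firmware_version"]].add(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


# The two samples quoted in this task.
samples = [
    'node_nic_firmware_version{cluster="wmcs", device="ens3f0np0", driver="bnxt_en", firmware_version="214.4.32.0/pkg 21.60.22.11", instance="cloudvirt1024:9100", job="node", site="eqiad"}',
    'node_nic_firmware_version{cluster="wmcs", device="ens3f0np0", driver="bnxt_en", firmware_version="20.6.151.0", instance="cloudvirt1023:9100", job="node", site="eqiad"}',
]

for version, hosts in group_by_firmware(samples).items():
    print(f"{version}: {', '.join(hosts)}")
```

With the full set of cloudvirt samples as input, any version reported by only one host would stand out in the same way.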

Mentioned in SAL (#wikimedia-cloud) [2020-12-04T22:23:59Z] <andrewbogott> moving cloudvirt1023 out of the ceph aggregate and into maintenance for T269467

This is now out of service and can be upgraded whenever.

It turns out this has a weird firmware version because it's weird hardware

robh: this is a qlogic 41112 sfp adapter which is not quite the same as others

Given that it isn't broken, let's leave it as it is.

Mentioned in SAL (#wikimedia-cloud) [2020-12-04T22:33:54Z] <andrewbogott> moving cloudvirt1023 back into the ceph aggregate; it doesn't need upgrades after all T269467

faidon subscribed.

> It turns out this has a weird firmware version because it's weird hardware
>
> robh: this is a qlogic 41112 sfp adapter which is not quite the same as others
>
> Given that it isn't broken, let's leave it as it is.

It was indeed ordered originally with odd HW, the QLogic 41112, but we replaced those a long time ago with standard Broadcom 57412s (T203827). So these actually have the same NICs (57412) as the other cloudvirts - hence reopening!

I'm leaving this drained and out of service in the meantime.

Mentioned in SAL (#wikimedia-cloud) [2020-12-05T00:35:35Z] <andrewbogott> moving cloudvirt1023 back into maintenance because T269467 continues to puzzle

RobH subscribed.

Updated to the latest firmware package, 21.65.33.33, and booted back into the OS with the network online.

Mentioned in SAL (#wikimedia-cloud) [2020-12-07T18:33:50Z] <andrewbogott> putting cloudvirt1023 back into service T269467

Confirmed:

root@cloudvirt1023:~# ethtool -i ens3f1np1 | grep firmware
firmware-version: 214.4.91.1/pkg 21.65.33.33
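The same check could be scripted. Below is a rough sketch with a hypothetical helper name; it assumes the firmware-version line carries a "/pkg <version>" suffix, as in the output above. An older-style string without that suffix (like the 20.6.151.0 value seen earlier) yields None.

```python
def firmware_package(ethtool_output):
    """Extract the 'pkg' version from `ethtool -i` output, or None if absent."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "firmware-version":
            # Assumed format: "<fw version>/pkg <package version>"
            _, found, pkg = value.partition("pkg ")
            return pkg.strip() if found else None
    return None


# The line reported by cloudvirt1023 after the upgrade.
output = "firmware-version: 214.4.91.1/pkg 21.65.33.33\n"
print(firmware_package(output) == "21.65.33.33")
```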

I've put this host back in service.