
Broadcom NICs with recent firmware fail to reimage
Closed, ResolvedPublic

Description

Creating this task to try and track the current situation we find ourselves in with the 10/25G Broadcom NICs and decide how to move forward.

Much of the previous discussion happened on tasks related to specific hosts which were closed when we found a work-around. For reference some of these tasks include:

T286722: Broadcom BCM57412 10G NIC and Bullseye installer
T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting
T304483: PXE boot NIC firmware regression

Doing a quick audit of our estate I can see we have the following 10G and 25G (SFP+ / SFP28) based NICs:

BCM57412 rev 01:
    descr: Broadcom Inc. BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
    brand: NetXtreme-E
    speed: 10Gb
    hosts: 818

BCM57414 rev 01:
    descr: Broadcom Inc. BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
    brand: NetXtreme-E
    speed: 10Gb/25Gb
    hosts: 174

BCM57810 rev 10:
    descr: Broadcom Inc. NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
    brand: NetXtreme II
    speed: 10Gb
    hosts: 34

The current status of each, as I understand it, is:

BCM57412

This is our most common 10G card. As I understand it we've seen two issues with this:

1] When we reimage, the PXEboot process works fine: the system brings the NIC up, does DHCP, downloads the Debian image and boots into the installer. But with Bullseye, as described in T286722, once Debian/Linux has loaded the NIC does not bring the port back up, and thus the system cannot get an IP address to download packages and complete the install.

Given this happens within the Debian installer environment, but not at the PXEboot stage, the driver being used/kernel version is one of the factors we need to consider.

2] The PXEboot DHCP step works ok, but the system fails to load Linux from the tftp server, reporting Failed to load ldlinux.c32, and the system does not reach the Debian installer environment (T304483).

It's not 100% clear to me in this scenario if the failure is completely within the on-board PXEboot system, or if it's managed to load some stuff over tftp which may play a role.

In both of these cases the solution to this is to make sure the card is using firmware version 21.85.21.92.

BCM57414

This is the NIC we have for any systems connected at 25G. It has mostly worked ok for us, but we discovered in recent task T350179 that when a 10G / SFP+ module is used in the SFP28 port it will fail to send the DHCP request during PXEboot. The port does come up on the switch side, and (afaik) system says it is trying DHCP, but no DHCP packet is sent to the switch. This problem is further complicated by the fact it's not consistent. It mostly occurs, but experience has shown if multiple attempts are made it will generally work 1 out of every 4 or 5 tries.

Given this issue occurs at the PXEboot stage, all software/firmware etc. involved is on-board the system, and thus the problem lies squarely within Dell's remit. It seems to me we should be raising this with Dell and trying to get them to fix it (let them deal with the other vendors). Has any progress been made on that (I couldn't find a task)?

The fix for that problem was found to be downgrading the firmware to version 21.60.22.11.

BCM57810

This appears to be a different 10G card model, branded NetXtreme II rather than NetXtreme-E. We only have a small number of older hosts with this card.

From the previous tickets it's not clear to me if we've observed issues with this card, or have any specific known 'good' or 'bad' firmware revisions for it. Does anyone have any specific knowledge on this one?

Next Steps

Things are largely ok I guess, we have known-good firmware versions we can load which overcome the issues for all variants. I mostly wanted to open this task to list out the different cards we have and the issues we've seen, plus the known-good firmware versions.
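To make the list of known-good versions easier to check against, here is a minimal Python sketch. It only encodes the two versions quoted in this task; the dictionary, the function name `firmware_status` and the return strings are hypothetical and not part of any existing WMF tooling. BCM57810 is deliberately left out because no known-good or known-bad version has been recorded for it here.

```python
# Known-good firmware versions quoted in this task description.
# The data structure itself is an illustration, not existing tooling.
KNOWN_GOOD_FIRMWARE = {
    "BCM57412": "21.85.21.92",
    # Reported for the case of a 10G SFP+ module in an SFP28 port.
    "BCM57414": "21.60.22.11",
}


def firmware_status(model: str, installed: str) -> str:
    """Classify an installed firmware version for a given NIC model.

    Returns "known-good" if it matches the version listed in this task,
    "differs" if the model is listed but the version is different, and
    "unknown" if no known-good version has been recorded for the model.
    """
    expected = KNOWN_GOOD_FIRMWARE.get(model)
    if expected is None:
        return "unknown"
    return "known-good" if installed.strip() == expected else "differs"


if __name__ == "__main__":
    for model, version in [
        ("BCM57412", "21.85.21.92"),
        ("BCM57414", "21.85.21.92"),
        ("BCM57810", "n/a"),
    ]:
        print(model, version, firmware_status(model, version))
```

Note that "differs" does not mean the version is bad, only that it is not the one recorded as working in this task.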

If we are going to go back to Dell we can use this task to track that. Otherwise, if we are happy using the known-good firmware maybe we can just say that and close it.

Event Timeline

cmooney triaged this task as Medium priority.

@wiki_willy I did more tests on this pxe boot issue we are having with the 10G Dell NIC card by taking one of the decommissioned servers we have and putting a 10G NIC card inside. I connected the server to my lab and PXE booted it with the firmware at version 22.21.06.80:

Broadcom Adv. Dual 10Gb Ethernet - 00:0A:F7:F0:0C:10	22.21.06.80
Broadcom Adv. Dual 10Gb Ethernet - 00:0A:F7:F0:0C:11	22.21.06.80

The server was able to PXE boot without an issue. Below is the output from my install server:

Jul 11 22:52:34 install1001 dhcpd[3408477]: DHCPDISCOVER from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 22:52:34 install1001 dhcpd[3408477]: DHCPOFFER on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 22:52:38 install1001 dhcpd[3408477]: DHCPREQUEST for 10.192.64.21 (10.192.16.5) from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 22:52:38 install1001 dhcpd[3408477]: DHCPACK on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/pxelinux.0 to 10.192.64.21:2070
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/pxelinux.0 to 10.192.64.21:2071
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/ldlinux.c32 to 10.192.64.21:49152
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/pxelinux.cfg/ttys1-115200 to 10.192.64.21:49153
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/pxelinux.cfg/boot.txt to 10.192.64.21:49154
Jul 11 22:52:48 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/linux to 10.192.64.21:49155
Jul 11 22:52:49 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/initrd.gz to 10.192.64.21:49156
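When comparing a working boot like the one above with a failing one, it can help to reduce the install server log to the list of stages the client reached. The following is a rough Python sketch under the assumption that the log lines look like the dhcpd and atftpd lines pasted above; the function name `stages_reached` and the stage labels are made up for illustration.

```python
import re

# Map the file name served over TFTP to a human-readable stage label.
TFTP_STAGES = {
    "pxelinux.0": "pxelinux",
    "ldlinux.c32": "ldlinux",
    "linux": "kernel",
    "initrd.gz": "initrd",
}


def stages_reached(lines, mac):
    """Return the ordered list of PXE boot stages seen for a client MAC.

    The client IP is learned from the DHCPACK line, then used to match
    the atftpd "Serving <file> to <ip>:<port>" lines for that client.
    """
    mac = mac.lower()
    reached = []
    client_ip = None

    def mark(stage):
        if stage not in reached:
            reached.append(stage)

    ack_re = re.compile(r"DHCPACK on (\S+) to " + re.escape(mac))
    serving_re = re.compile(r"Serving (\S+) to (\S+):\d+")

    for line in lines:
        if "DHCPDISCOVER from " + mac in line:
            mark("dhcp_discover")
        ack = ack_re.search(line)
        if ack:
            client_ip = ack.group(1)
            mark("dhcp_ack")
        served = serving_re.search(line)
        if served and served.group(2) == client_ip:
            name = served.group(1).rsplit("/", 1)[-1]
            if name in TFTP_STAGES:
                mark(TFTP_STAGES[name])
    return reached
```

A client that stops after `pxelinux` without ever requesting `ldlinux.c32` would then show up as a shorter list than a client that reaches `initrd`.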

What I also saw on the server console while booting is that my lab environment uses "PXELINUX 6.04", whereas I believe the WMF environment uses "PXELINUX 6.03". I will have to double check this tomorrow when I am back online.

Screenshot from 2024-07-11 22-52-47.png (524×1 px, 140 KB)

What I did next was to upgrade the firmware to the latest version 22.92 and try to pxe boot on the latest version.

Broadcom Adv. Dual 10Gb Ethernet - 00:0A:F7:F0:0C:10	22.92.06.10
Broadcom Adv. Dual 10Gb Ethernet - 00:0A:F7:F0:0C:11	22.92.06.10

With the latest version, the server also booted without an issue:

Jul 11 23:38:01 install1001 dhcpd[3408477]: DHCPDISCOVER from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 23:38:01 install1001 dhcpd[3408477]: DHCPOFFER on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 23:38:05 install1001 dhcpd[3408477]: DHCPREQUEST for 10.192.64.21 (10.192.16.5) from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 23:38:05 install1001 dhcpd[3408477]: DHCPACK on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 23:38:05 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/pxelinux.0 to 10.192.64.21:2070
Jul 11 23:38:05 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/pxelinux.0 to 10.192.64.21:2071
Jul 11 23:38:05 install1001 atftpd[529]: Serving bullseye-installer/ldlinux.c32 to 10.192.64.21:49152
Jul 11 23:38:05 install1001 atftpd[529]: Serving bullseye-installer/pxelinux.cfg/ttys1-115200 to 10.192.64.21:49153
Jul 11 23:38:05 install1001 atftpd[529]: Serving bullseye-installer/pxelinux.cfg/boot.txt to 10.192.64.21:49154
Jul 11 23:38:15 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/linux to 10.192.64.21:49155
Jul 11 23:38:16 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/initrd.gz to 10.192.64.21:49156

Screenshot from 2024-07-11 23-38-15.png (524×1 px, 136 KB)

Conclusion: after all this testing, I think the problem is not the firmware; the problem is within the WMF environment. This also explains why the Supermicro servers using 10G NICs are not able to PXE boot. If we find out what is causing the issue in the WMF environment, both the Dell and Supermicro servers using 10G will be able to PXE boot without us downgrading the firmware. As I said, I will double check the PXELINUX version WMF is using tomorrow.
Thank you.

I checked on sretest2001 it's trying to boot with PXELINUX version 6.03

Thanks for testing this out @Papaul. Since it appears that upgrading the WMF environment to PXELINUX version 6.04 may fix this issue, who would be the best person to help us get that upgraded?

Thanks,
Willy

I did some more testing this weekend by downgrading my PXELINUX to 6.03 (see below). I was still able to PXE boot.

Screenshot from 2024-07-14 11-59-55.png (708×1 px, 181 KB)

Jul 14 11:54:10 install1001 dhcpd[3408477]: DHCPDISCOVER from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 14 11:54:10 install1001 dhcpd[3408477]: DHCPOFFER on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 14 11:54:10 install1001 dhcpd[3408477]: DHCPREQUEST for 10.192.64.21 (10.192.16.5) from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 14 11:54:10 install1001 dhcpd[3408477]: DHCPACK on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 14 11:59:29 install1001 dhcpd[3408477]: DHCPDISCOVER from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 14 11:59:29 install1001 dhcpd[3408477]: DHCPOFFER on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 14 11:59:33 install1001 dhcpd[3408477]: DHCPREQUEST for 10.192.64.21 (10.192.16.5) from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 14 11:59:33 install1001 dhcpd[3408477]: DHCPACK on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 14 11:59:33 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/pxelinux.0 to 10.192.64.21:2070
Jul 14 11:59:33 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/pxelinux.0 to 10.192.64.21:2071
Jul 14 11:59:33 install1001 atftpd[529]: Serving bullseye-installer/ldlinux.c32 to 10.192.64.21:49152
Jul 14 11:59:33 install1001 atftpd[529]: Serving bullseye-installer/pxelinux.cfg/ttys1-115200 to 10.192.64.21:49153
Jul 14 11:59:33 install1001 atftpd[529]: Serving bullseye-installer/pxelinux.cfg/boot.txt to 10.192.64.21:49154
Jul 14 11:59:44 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/linux to 10.192.64.21:49155
Jul 14 11:59:44 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/initrd.gz to 10.192.64.21:4

As FYI we already have T367970 to upgrade pxelinux to 6.04, but IIRC we already manually tested it and it didn't fix the issue (that seems to corroborate what Papaul reported above).

After a chat with Papaul, we would like to test if the Juniper DHCP injection to implement Option 82 could cause any of this (basically to rule out the variable). This is the test that we have in mind:

  1. We create a snippet like T365372#9892336 but for the primary interface of the target host.
  2. We deploy the snippet to installXXXX and reload the dhcp server.
  3. We disable the Juniper Option 82 injection in the switch of the target host.
  4. We try to PXE Boot.

If we pick sretest2001, this would be the config:

elukey@install2004:/etc/dhcp/automation/proxies$ cat ttyS1-115200.conf
# Automatically generated by dhcpincludes for /etc/dhcp/automation/ttyS1-115200

host sretest2001 {
    hardware ethernet 7c:c2:55:97:5a:0e;
    fixed-address 10.192.21.8;
    option pxelinux.pathprefix "http://apt.wikimedia.org/tftpboot/bookworm-installer/";
}

That's a great idea ! Happy to help if needed.
fixed-address sretest2001.codfw.wmnet; this needs to be fixed-address $some-ip-address;

I did more testing today again

  • I downloaded the lpxelinux file we have on apt.wikimedia and copied it to my tftp node
  • modified the dhcpd.conf file to use lpxelinux

Screenshot from 2024-07-18 20-47-50.png (407×1 px, 62 KB)

Screenshot from 2024-07-18 20-49-04.png (405×1 px, 45 KB)

  • rebooted the server to PXE boot; I got the same error we had this morning

Screenshot from 2024-07-18 20-08-27.png (1×1 px, 206 KB)

This is awesome Papaul! I tried various configs for sretest2001 (this is a Supermicro node, not Dell):

host sretest2001 {
    hardware ethernet 7c:c2:55:97:5a:0e;
    fixed-address 10.192.21.8;
    filename "bookworm-installer/debian-installer/amd64/pxelinux.0";
    option pxelinux.pathprefix "http://apt.wikimedia.org/tftpboot/bookworm-installer/";
}

host sretest2001 {
    hardware ethernet 7c:c2:55:97:5a:0e;
    fixed-address 10.192.21.8;
    filename "bookworm-installer/pxelinux.0";
    option pxelinux.pathprefix "http://apt.wikimedia.org/tftpboot/bookworm-installer/";
}

....

All the above were tests keeping option pxelinux.pathprefix with http://, and they all ended up with the same problem. Then I tried the following:

elukey@install2004:~$ cat /etc/dhcp/automation/ttyS1-115200/sretest2001.conf

host sretest2001 {
    host-identifier option agent.circuit-id "lsw1-b7-codfw:xe-0/0/35.0:private1-b7-codfw";
    fixed-address 10.192.21.8;
    filename "/srv/tftpboot/bookworm-installer/pxelinux.0";
    option pxelinux.pathprefix "/srv/tftpboot/bookworm-installer/";
}

The above, IIUC, would use install2004 for everything (with TFTP as the protocol), rather than getting configs via HTTP from the apt nodes. I was able to see the Debian Installer with this config.

I also tried to remove from the "working" config the filename setting (reverting back to the default lpxelinux.0), and afaics it gets back to failing with the usual error.
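For anyone wanting to reproduce these variants without hand-editing files, the host stanzas above can be generated from a few parameters. This is a minimal Python sketch, not the Spicerack implementation: the function `render_host_stanza` and its arguments are hypothetical, and the only thing taken from this task is the shape of the stanzas and the values in the usage example.

```python
def render_host_stanza(hostname, ip, *, mac=None, circuit_id=None,
                       filename=None, pathprefix=None):
    """Render a dhcpd "host" stanza like the ones tested in this task.

    Exactly one of mac (match on hardware address) or circuit_id
    (match on the Option 82 agent.circuit-id) must be given.
    filename and pathprefix are optional overrides; when omitted,
    the server-wide defaults apply.
    """
    if (mac is None) == (circuit_id is None):
        raise ValueError("pass exactly one of mac or circuit_id")

    lines = ["host %s {" % hostname]
    if mac is not None:
        lines.append("    hardware ethernet %s;" % mac)
    else:
        lines.append(
            '    host-identifier option agent.circuit-id "%s";' % circuit_id
        )
    lines.append("    fixed-address %s;" % ip)
    if filename is not None:
        lines.append('    filename "%s";' % filename)
    if pathprefix is not None:
        lines.append('    option pxelinux.pathprefix "%s";' % pathprefix)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Reproduces the TFTP-only config that reached the Debian Installer.
    print(render_host_stanza(
        "sretest2001",
        "10.192.21.8",
        circuit_id="lsw1-b7-codfw:xe-0/0/35.0:private1-b7-codfw",
        filename="/srv/tftpboot/bookworm-installer/pxelinux.0",
        pathprefix="/srv/tftpboot/bookworm-installer/",
    ))
```

Dropping the `filename` argument gives the variant that falls back to the default lpxelinux.0, which is the one reported as failing above.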

Some questions raised on IRC's dcops chan:

  • Is it a problem with lpxelinux.0, the NIC firmwares interacting with it (say using HTTP etc..) or both?
  • Should we have an interim way to force TFTP-only PXE boot configurations via Spicerack/reimage-cookbook? Like adding an option to force the current "working" dhcp config in case of troubles.
  • Alternatively, should we consider to switch our dhcpd config to force pxelinux.0 and tftp-only?

Amazing progress !

Is it a problem with lpxelinux.0, the NIC firmwares interacting with it (say using HTTP etc..) or both?

Good question, I guess we could A/B test it by loading lpxelinux.0 over TFTP. I'm not familiar with the differences between lpxelinux.0 and pxelinux.0

Should we have an interim way to force TFTP-only PXE boot configurations via Spicerack/reimage-cookbook? Like adding an option to force the current "working" dhcp config in case of troubles.

Sounds like a good idea to fix the immediate issue, as long as we make sure to find a long term fix as well

Alternatively, should we consider to switch our dhcpd config to force pxelinux.0 and tftp-only?

iirc, DCops decided to move away from TFTP towards HTTP a while ago because of performance issues (re-imaging was taking ages over TFTP). So moving back to TFTP doesn't seem like a good long term option. Unless that was from a time before we had an install server in each site ?

@wiki_willy I did more tests on this pxe boot issue we are having with the 10G Dell NIC

@Papaul can you confirm what model Broadcom card we are testing with here for clarity? Or let me know the host it was done on and I'll check. A large part of the reason I created this task was to make sure we track those as we've some different 10G models in use.

iirc, DCops decided to move away from TFTP towards HTTP a while ago because of performance issues (re-imaging was taking ages over TFTP). So moving back to TFTP doesn't seem like a good long term option. Unless that was from a time before we had an install server in each site ?

TFTP is usually quite slow in my experience, but tbh I can't think of a protocol-level reason that would be the case, probably just how it's written (ancient code, small buffers, not doing zcopy or something, who knows).

I also think it's important we keep track of the Broadcom NIC models we are using in the SuperMicro's for clarity and finding any patterns in future. I believe we are using BCM57414 in them? Do we know the firmware version installed? That's the one we needed 21.60.22.11 for the Dells previously - with a 10G SFP+ module - although 21.85 I think worked when an SFP28 was used.

Amazing progress !

Is it a problem with lpxelinux.0, the NIC firmwares interacting with it (say using HTTP etc..) or both?

Good question, I guess we could A/B test it by loading lpxelinux.0 over TFTP. I'm not familiar with the differences between lpxelinux.0 and pxelinux.0

Already tested, it fails as well. IIUC from a chat with Tobias the code is very different, even when using TFTP (one expects it would be the same but nope..).

Should we have an interim way to force TFTP-only PXE boot configurations via Spicerack/reimage-cookbook? Like adding an option to force the current "working" dhcp config in case of troubles.

Sounds like a good idea to fix the immediate issue, as long as we make sure to find a long term fix as well

Yep exactly, maybe migrating to EFI could be explored as long term option?

Alternatively, should we consider to switch our dhcpd config to force pxelinux.0 and tftp-only?

iirc, DCops decided to move away from TFTP towards HTTP a while ago because of performance issues (re-imaging was taking ages over TFTP). So moving back to TFTP doesn't seem like a good long term option. Unless that was from a time before we had an install server in each site ?

TFTP is usually quite slow in my experience, but tbh I can't think of a protocol-level reason that would be the case, probably just how it's written (ancient code, small buffers, not doing zcopy or something, who knows).

It would definitely be a little slower, but now we have local installXXXX nodes in every pop, so in theory the price to pay shouldn't be that high.

Next steps:

  • Immediate: I/F is going to add code to Spicerack and the reimage cookbook to force a tftp-only boot, so we'll be able to have a workaround for any server showing up the Failed to load ldlinux.c32 symptoms.
  • Medium/Long: I/F is going to work on EFI probably in September when the whole team is back (including Moritz), so we'll be able to give a more detailed planning about it (if we'll decide to pursue this road).

Change #1056176 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/software/spicerack@master] dhcp: add dhcp_filename and dhcp_options for DHCPConfMac and DHCPConfOpt82

https://gerrit.wikimedia.org/r/1056176

Change #1056176 merged by jenkins-bot:

[operations/software/spicerack@master] dhcp: add dhcp_filename and dhcp_options for DHCPConfMac and DHCPConfOpt82

https://gerrit.wikimedia.org/r/1056176

Change #1056534 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/cookbooks@master] sre.hosts.reimage: add workaround for PXE boot issue on some NICs

https://gerrit.wikimedia.org/r/1056534

Change #1056534 merged by jenkins-bot:

[operations/cookbooks@master] sre.hosts.reimage: add workaround for PXE boot issue on some NICs

https://gerrit.wikimedia.org/r/1056534

Next steps:

  • Immediate: I/F is going to add code to Spicerack and the reimage cookbook to force a tftp-only boot, so we'll be able to have a workaround for any server showing up the Failed to load ldlinux.c32 symptoms.

The reimage cookbook now offers the flag --force-dhcp-tftp to force pxelinux.0 and tftp only (no http). Tested it today with srestest1001, everything worked fine.

Papaul claimed this task.

Since we now know what the issue is and we have a fix, I am closing this task, but feel free to reopen it if you have any questions.

Thanks

Change #1092802 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/cookbooks@master] sre.hosts.{dhcp,reimage}: force tftp as default option

https://gerrit.wikimedia.org/r/1092802

Change #1092802 merged by Elukey:

[operations/cookbooks@master] sre.hosts.{dhcp,reimage}: force tftp as default option

https://gerrit.wikimedia.org/r/1092802