
Test 1G NIC compatibility, default to TFTP in sre.hosts.reimage cookbook
Closed, Resolved (Public)

Description

The IF team did some great troubleshooting for the constant Broadcom PXE problems in T363576, culminating in @elukey's patch that adds TFTP support to the reimage cookbook, which is required to reimage any 10G host with firmware > 21.60.22.11. (This particular firmware version is ~5 years old as I write this.)

Unfortunately, as it stands right now, the user has to know that the flag exists to complete a reimage successfully. Since most of our hosts are 10G, I think it makes sense to make TFTP the default option. @Papaul raised a legitimate concern about whether TFTP would cause problems with 1G NICs, so I'm creating this ticket with the following AC:

  • Test TFTP PXE boot with 1G NICs (I'm pretty sure this will work, just with reduced transfer speeds)
  • If it works, set TFTP as defaultº

Thanks for looking and please let me know if I can do anything else to help!

ºIf it doesn't work, I would still recommend defaulting to TFTP as it has the most benefits for the most hosts, but maybe add a warning or automatically reset to the HTTP-based option for 1G NICs.

Event Timeline

Volans triaged this task as Medium priority. Nov 4 2024, 4:16 PM
Volans subscribed.

@bking we had a brief chat in the I/F meeting today about this. We think that this would mostly be a step backward instead of forward.

  • TFTP reimages are much slower; we moved to HTTP for performance reasons.
  • UEFI support is almost ready, and that will either solve the problem or at least put us in a position to push for vendor support.

I would personally add that the cookbook could potentially detect a 10G NIC automatically and apply the flag, so that nobody has to set it manually.
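A minimal sketch of what that auto-detection could look like, assuming the cookbook can obtain the NIC link speed and, optionally, the firmware version from somewhere. The function names, the Mbps unit, and the "tftp"/"http" return values are hypothetical and are not taken from the sre.hosts.reimage code. The only value from this task is the 21.60.22.11 firmware threshold. Treating NICs faster than 10G the same as 10G is also an assumption.

```python
from typing import Optional, Tuple

# Threshold from the task description: 10G hosts with firmware newer
# than this version need TFTP to reimage.
FIRMWARE_THRESHOLD = (21, 60, 22, 11)
TEN_GIG_MBPS = 10_000


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '21.60.22.11' into a tuple of ints.

    Raises ValueError if any component is not numeric.
    """
    return tuple(int(part) for part in version.strip().split("."))


def choose_pxe_transport(link_speed_mbps: int, firmware: Optional[str] = None) -> str:
    """Pick 'tftp' or 'http' for PXE (hypothetical helper).

    - NICs slower than 10G keep the faster HTTP path.
    - 10G (or faster, by assumption) NICs use TFTP when the firmware is
      unknown or newer than the threshold.
    """
    if link_speed_mbps < TEN_GIG_MBPS:
        return "http"
    if firmware is None:
        # Without firmware information, err on the side that works on 10G hosts.
        return "tftp"
    if parse_version(firmware) > FIRMWARE_THRESHOLD:
        return "tftp"
    return "http"
```

With this shape, HTTP stays the default for 1G hosts and nobody has to remember the flag for 10G hosts; an explicit command-line flag could still override the automatic choice.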

For additional context, these are a few of the past cases that were hit by this issue (they might help prioritize the work): T312298, T308106, T374924, T350179, T286722

bking claimed this task.

As @elukey and @Volans are addressing the issue in T363576, I'm going to go ahead and close this one out. Thanks to you both for your help!