Creating this task to track the current situation with our 10/25G Broadcom NICs and decide how to move forward.
Much of the previous discussion happened on tasks related to specific hosts, which were closed once we found a work-around. For reference, some of these tasks are:
{{T286722}}
{{T350179}}
{{T304483}}
Doing a quick audit of our estate I can see we have the following 10G and 25G (SFP+ / SFP28) based NICs:
```
BCM57412 rev 01:
descr: Broadcom Inc. BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
brand: NetXtreme-E
speed: 10Gb
hosts: 818
BCM57414 rev 01:
descr: Broadcom Inc. BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
brand: NetXtreme-E
speed: 10Gb/25Gb
hosts: 174
BCM57810 rev 10:
descr: Broadcom Inc. NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
brand: NetXtreme II
speed: 10Gb
hosts: 34
```
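For anyone wanting to repeat the audit, here is a rough sketch of how the grouping above could be reproduced from per-host PCI description strings. The input dict and the hostnames in it are hypothetical placeholders; how the descriptions are actually collected from the fleet is not covered here.

```python
import re
from collections import Counter

# Matches the Broadcom chip model, e.g. "BCM57412".
MODEL_RE = re.compile(r"\b(BCM\d+)\b")
# Matches the trailing PCI revision, e.g. "(rev 01)".
REV_RE = re.compile(r"\(rev ([0-9a-fA-F]+)\)")


def classify(descr):
    """Return a 'MODEL rev NN' key for a description string, or None if no BCM model is found."""
    model = MODEL_RE.search(descr)
    if model is None:
        return None
    rev = REV_RE.search(descr)
    if rev is None:
        return model.group(1)
    return f"{model.group(1)} rev {rev.group(1)}"


def count_models(descr_by_host):
    """Count hosts per NIC model/revision from a {hostname: description} mapping."""
    counts = Counter()
    for descr in descr_by_host.values():
        key = classify(descr)
        if key is not None:
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical hostnames; the descriptions are the ones listed above.
    sample = {
        "host-a": "Broadcom Inc. BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)",
        "host-b": "Broadcom Inc. BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)",
        "host-c": "Broadcom Inc. NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)",
    }
    for model, hosts in sorted(count_models(sample).items()):
        print(f"{model}: {hosts}")
```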
The current status of each, as I understand it, is:
**[[ https://www.broadcom.com/products/ethernet-connectivity/network-adapters/p210p | BCM57412 ]]**
This is our most common 10G card. As I understand it we've seen two issues with this:
1] When we reimage, the PXEboot process works fine: the system brings the NIC up, does DHCP, downloads the Debian image and boots into the installer. But with Bullseye, as described in T286722, once Debian/Linux has loaded the NIC does not bring the port back up, so the system cannot get an IP address to download packages and complete the install.
Given this happens within the Debian installer environment, but not at the PXEboot stage, the driver and kernel version in use are among the factors we need to consider.
2] The PXEboot DHCP step works ok, but the system fails to load Linux from the TFTP server, reporting `Failed to load ldlinux.c32`, and the system does not reach the Debian installer environment (T304483).
It's not 100% clear to me in this scenario whether the failure is completely within the on-board PXEboot system, or whether it has managed to load something over TFTP which may play a role.
In both of these cases the solution is to make sure the card is using firmware version //21.85.21.92//.
**[[ https://www.broadcom.com/products/ethernet-connectivity/network-adapters/p225p | BCM57414 ]]**
This is the NIC we have for any systems connected at 25G. It has mostly worked ok for us, but we discovered in recent task T350179 that when a 10G / SFP+ module is used in the SFP28 port it will fail to send the DHCP request during PXEboot. The port does come up on the switch side, and (afaik) the system says it is trying DHCP, but no DHCP packet is sent to the switch. This problem is further complicated by the fact that it's not consistent. It //mostly// occurs, but experience has shown that if multiple attempts are made it will generally work 1 out of every 4 or 5 tries.
Given this issue occurs at the PXEboot stage, all software/firmware etc. involved is on-board the system, and thus the problem lies squarely within Dell's remit. It seems to me we should be raising this with Dell and trying to get them to provide a fix (let them deal with the other vendors). Has any progress been made on that (I couldn't find a task)?
The fix for that problem was found to be downgrading the firmware to version //21.60.22.11//.
**BCM57810**
This appears to be a different 10G card model, branded //NetXtreme II// rather than //NetXtreme-E//. We only have a [[ https://phabricator.wikimedia.org/P61252 | small number of older hosts ]] with this card.
From the previous tickets it's not clear to me whether we've observed issues with this card, or have any known 'good' or 'bad' firmware revisions for it. Does anyone have specific knowledge on this one?
######Next Steps
Things are largely ok I guess: we have known-good firmware versions we can load which overcome the issues we've seen so far. I mostly wanted to open this task to list out the different cards we have and the issues we've seen, plus the known-good firmware versions.
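To make the known-good versions easier to act on, here is a minimal sketch of a lookup that compares an installed firmware version against the versions recorded in this task. The function names and the return labels are made up for illustration, and it assumes the dotted versions order numerically field by field. Note the BCM57414 entry was only found necessary for hosts using a 10G / SFP+ module in the SFP28 port, and there is no entry for the BCM57810 because we have no known-good version for it.

```python
# Known-good firmware versions as recorded in this task.
# BCM57414: only shown to matter with 10G SFP+ modules in the SFP28 port.
KNOWN_GOOD = {
    "BCM57412": "21.85.21.92",
    "BCM57414": "21.60.22.11",
}


def parse_version(version):
    """Turn '21.85.21.92' into (21, 85, 21, 92) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def firmware_action(model, installed):
    """Return 'ok', 'upgrade', 'downgrade', or 'unknown' for a NIC model and installed version."""
    target = KNOWN_GOOD.get(model)
    if target is None:
        # No known-good version recorded for this model (e.g. BCM57810).
        return "unknown"
    have, want = parse_version(installed), parse_version(target)
    if have == want:
        return "ok"
    return "upgrade" if have < want else "downgrade"


if __name__ == "__main__":
    print(firmware_action("BCM57412", "21.85.21.92"))
```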
If we are going to go back to Dell we can use this task to track that. Otherwise, if we are happy using the known-good firmware maybe we can just say that and close it.