Follow up from the long investigation in T303776#7781198 and T303776#7797564.
We finally found the root cause and @Papaul identified the faulty firmware version:
Downgrading the NIC firmware on cloudvirt1025 and cloudvirt1026 from 22.00.07.60 to 21.60.22.11 fixed the "Failed to load ldlinux.c32" issue.
I'm not a PXE expert, so I don't know whether the issue can be solved by updating the lpxelinux.0 binary, but at this point we should:
- Follow up with the vendor so they can provide a fixed firmware (or guidance on how to work around the issue)
- Check if there is any server running the faulty version and downgrade the firmware (or at least warn the service owner)
- https://puppetboard.wikimedia.org/fact/net_driver shows no sign of 22.00.07.60
- List all the servers using the same NIC so we make sure to not upgrade them (and see the scale of the potential issue)
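
The check for hosts running the faulty version could be scripted along these lines. This is only a minimal sketch: it assumes the firmware versions have already been exported into a plain mapping of hostname to a list of per-NIC firmware strings, which is not necessarily the shape of the net_driver fact. The function name and the sample hosts are hypothetical.

```python
# Faulty NIC firmware version identified in this task.
FAULTY_VERSION = "22.00.07.60"


def hosts_with_firmware(firmware_by_host, version=FAULTY_VERSION):
    """Return the sorted hostnames that report `version` on any NIC.

    `firmware_by_host` maps a hostname to a list of firmware version
    strings, one per NIC (assumed input shape, not the real fact format).
    Substring matching is used in case the reported string carries extra
    text around the version number.
    """
    return sorted(
        host
        for host, versions in firmware_by_host.items()
        if any(version in v for v in versions)
    )


# Hypothetical sample data, for illustration only.
sample = {
    "host-a": ["21.60.22.11"],
    "host-b": ["22.00.07.60"],
    "host-c": ["21.60.22.11", "22.00.07.60"],
}

print(hosts_with_firmware(sample))
```

The same function can list hosts on the known-good version by passing `version="21.60.22.11"`, which may help gauge how many servers use this NIC.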