While debugging an unrelated disk utilization issue, we noticed the following in the `dmesg` output:
[Tue Apr 29 14:05:38 2025] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Down
[Tue Apr 29 14:05:39 2025] Process accounting resumed
[Tue Apr 29 14:05:39 2025] ipip: IPv4 and MPLS over IPv4 tunneling driver
[Tue Apr 29 14:05:41 2025] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Up, 25000 Mbps (NRZ) full duplex, Flow control: none
[Tue Apr 29 14:05:41 2025] bnxt_en 0000:4b:00.0 eno12399np0: FEC autoneg off encoding: Clause 74 BaseR
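To measure how long a link stayed down across a longer `dmesg -T` capture, a small parser like the one below could pair each "NIC Link is Down" message with the next "NIC Link is Up" for the same interface. This is a minimal sketch: it assumes the human-readable timestamp format shown above and the `driver pci-address interface:` prefix that `bnxt_en` uses. The function name `link_flaps` is hypothetical.

```python
import re
from datetime import datetime

# Assumption: lines look like the bnxt_en messages above, e.g.
# "[Tue Apr 29 14:05:38 2025] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Down"
LINK_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\S+\s+\S+\s+(?P<iface>\S+):\s+NIC Link is (?P<state>Up|Down)"
)

def link_flaps(lines):
    """Return (interface, down_time, up_time, seconds_down) for each Down->Up pair."""
    down_at = {}
    flaps = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if not m:
            continue  # skip unrelated kernel messages (ipip, process accounting, FEC, ...)
        ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
        iface = m["iface"]
        if m["state"] == "Down":
            down_at[iface] = ts
        elif iface in down_at:
            start = down_at.pop(iface)
            flaps.append((iface, start, ts, (ts - start).total_seconds()))
    return flaps
```

Applied to the five lines above, this reports a single flap on `eno12399np0` lasting 3 seconds (14:05:38 to 14:05:41).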
The link flap is further confirmed by the drop in traffic (https://grafana.wikimedia.org/goto/OWs50UbNR?orgId=1) and by the `getsel` output for lvs3009:
Record: 33
Date/Time: 04/29/2025 14:01:43
Source: system
Severity: Critical
Description: A fatal error was detected on a component at bus 4 device 0 function 0.
-------------------------------------------------------------------------------
Record: 34
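Bus 4, device 0, function 0 corresponds to PCI address 04:00.0, which `lspci` identifies as a Broadcom BCM5720:

sukhe@lvs3009:~$ sudo lspci -s 04:00.0
04:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe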
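The SEL entry names the failing component by bus, device, and function, while `lspci -s` expects a `bus:device.function` address printed in hexadecimal. The hypothetical helper below sketches that translation. It assumes the SEL description reports the three numbers in decimal; that matters only for bus numbers of 10 and above (for example, bus 75 would become `4b`), so check it against the SEL format on the host before relying on it.

```python
import re

# Assumption: the SEL text reports bus/device/function as decimal numbers,
# whereas lspci prints them in hexadecimal.
SEL_BDF_RE = re.compile(r"bus (\d+) device (\d+) function (\d+)", re.IGNORECASE)

def sel_to_lspci_address(description):
    """Turn '... bus 4 device 0 function 0.' into the lspci-style '04:00.0'."""
    m = SEL_BDF_RE.search(description)
    if m is None:
        return None  # the SEL record does not name a PCI component
    bus, dev, fn = (int(g) for g in m.groups())
    # PCI limits: 256 buses, 32 devices per bus, 8 functions per device.
    if not (bus < 256 and dev < 32 and fn < 8):
        raise ValueError(f"not a valid PCI address: {m.group(0)!r}")
    return f"{bus:02x}:{dev:02x}.{fn:x}"
```

The resulting string can then be passed to `lspci -s`, as in the command above, to identify the device.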