During maintenance reboots of cp* hosts in esams, we noticed that the haproxy unit failed to start correctly because the network was not yet available when it ran its preliminary checks.
After some investigation, we discovered that link auto-negotiation was taking about 4s at boot.
Relevant logs from one of the new hosts (cp3066):
Aug 21 09:21:03 cp3066 kernel: [ 22.218068] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
Aug 21 09:21:03 cp3066 kernel: [ 22.252181] ipmi_si IPI0001:00: IPMI kcs interface initialized
Aug 21 09:21:03 cp3066 kernel: [ 22.258348] ipmi_ssif: IPMI SSIF Interface driver
Aug 21 09:21:03 cp3066 kernel: [ 22.861461] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Up, 25000 Mbps full duplex, Flow control: ON - receive
Aug 21 09:21:03 cp3066 kernel: [ 22.861465] bnxt_en 0000:4b:00.0 eno12399np0: FEC autoneg off encoding: Clause 74 BaseR
Aug 21 09:21:03 cp3066 kernel: [ 22.937982] Process accounting resumed
Aug 21 09:21:03 cp3066 kernel: [ 22.967030] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Down
Aug 21 09:21:06 cp3066 kernel: [ 26.478237] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Up, 25000 Mbps full duplex, Flow control: none
Aug 21 09:21:06 cp3066 kernel: [ 26.478240] bnxt_en 0000:4b:00.0 eno12399np0: FEC autoneg off encoding: Clause 74 BaseR
Aug 21 09:21:11 cp3066 kernel: [ 31.735821] TCP: request_sock_TCP: Possible SYN flooding on port 5666. Sending cookies. Check SNMP counters.
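For anyone who wants to repeat the measurement on other hosts, here is a rough sketch of how the Down -> Up gap can be computed from these kernel lines. The function name link_recovery_times and the regular expression are my own and purely illustrative; the sketch assumes the monotonic "[ seconds.micros]" timestamp format shown above (not the human-readable timestamps used in the cp6001 excerpt) and only looks at bnxt_en link messages.

```python
import re

# Hypothetical helper, not part of any existing tooling.
# Matches kernel lines such as:
#   "... kernel: [ 22.967030] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Down"
LINK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+bnxt_en\s+\S+\s+(\S+): NIC Link is (Up|Down)"
)

def link_recovery_times(log_lines):
    """Return a list of (interface, seconds) for each Down -> Up transition."""
    down_at = {}  # interface -> timestamp of the last "Link is Down"
    gaps = []
    for line in log_lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        ts, iface, state = float(m.group(1)), m.group(2), m.group(3)
        if state == "Down":
            down_at[iface] = ts
        elif iface in down_at:
            # First "Up" after a "Down": record how long the link was down.
            gaps.append((iface, round(ts - down_at.pop(iface), 3)))
    return gaps

# Three lines copied from the cp3066 excerpt above.
sample = [
    "Aug 21 09:21:03 cp3066 kernel: [ 22.861461] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Up, 25000 Mbps full duplex, Flow control: ON - receive",
    "Aug 21 09:21:03 cp3066 kernel: [ 22.967030] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Down",
    "Aug 21 09:21:06 cp3066 kernel: [ 26.478237] bnxt_en 0000:4b:00.0 eno12399np0: NIC Link is Up, 25000 Mbps full duplex, Flow control: none",
]

print(link_recovery_times(sample))  # [('eno12399np0', 3.511)]
```

On the cp3066 lines this gives roughly 3.5s between "NIC Link is Down" and the following "NIC Link is Up", which is the "about 4s" mentioned above.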
lldpcli show neigh reports the following for the connected switch:
SysDescr: Juniper Networks, Inc. qfx5120-48y-8c Ethernet Switch, kernel JUNOS 22.2R3.15, Build date: 2023-03-22 15:40:54 UTC Copyright (c) 1996-2023 Juniper Networks, Inc.
The output of ethtool -i <iface>:
driver: bnxt_en
version: 5.10.0-25-amd64
firmware-version: 218.0.219.13/pkg 21.85.21.92
Compare this with the log from another host (cp6001):
[Mon Jun 12 14:47:30 2023] bnxt_en 0000:3b:00.0 enp59s0f0np0: renamed from eth0
[Mon Jun 12 14:47:40 2023] bnxt_en 0000:3b:00.0 enp59s0f0np0: NIC Link is Up, 25000 Mbps full duplex, Flow control: ON - receive & transmit
[Mon Jun 12 14:47:40 2023] bnxt_en 0000:3b:00.0 enp59s0f0np0: FEC autoneg off encoding: Clause 74 BaseR
[Mon Jun 12 14:47:40 2023] bnxt_en 0000:3b:00.0 enp59s0f0np0: NIC Link is Up, 25000 Mbps full duplex, Flow control: none
[Mon Jun 12 14:47:40 2023] bnxt_en 0000:3b:00.0 enp59s0f0np0: FEC autoneg off encoding: Clause 74 BaseR
[Tue Jun 13 09:41:03 2023] bnxt_en 0000:3b:00.0 enp59s0f0np0: NIC Link is Down
[Tue Jun 13 09:41:04 2023] bnxt_en 0000:3b:00.0 enp59s0f0np0: NIC Link is Up, 25000 Mbps full duplex, Flow control: none
[Tue Jun 13 09:41:04 2023] bnxt_en 0000:3b:00.0 enp59s0f0np0: FEC autoneg off encoding: Clause 74 BaseR
[Wed Aug 2 13:42:45 2023] device enp59s0f0np0 entered promiscuous mode
In this case the same Down -> Up transition takes ~1s. The switch and driver details for this host are:
SysDescr: Juniper Networks, Inc. qfx5120-48y-8c Ethernet Switch, kernel JUNOS 20.4R3.8, Build date: 2021-09-07 17:29:30 UTC Copyright (c) 1996-2021 Juniper Networks, Inc.
(Note that the JUNOS version is different: 20.4R3.8 here versus 22.2R3.15 on the switch connected to cp3066.)
driver: bnxt_en
version: 5.10.0-23-amd64
firmware-version: 218.0.169.2/pkg 21.80.16.95
Note that the NIC firmware version is different too (218.0.169.2 here versus 218.0.219.13 on cp3066), but the NIC model is the same on both servers: BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
Could the slower link negotiation be related to the different network devices and/or their configuration?
Thanks