
new fundraising Buster servers - bonded ethernet network error/warning
Closed, Declined (Public)

Description

We're seeing this after configuring active-backup ethernet bonding on all the new Dell/Buster hosts at codfw:

"bond0: invalid new link 3 on slave eno2"

It does not seem to interfere with connectivity or failover.

Event Timeline

ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
tg3.c:v3.137 (May 11, 2014)
tg3 0000:04:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 4c:d9:8f:a9:dd:dc
tg3 0000:04:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
tg3 0000:04:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
tg3 0000:04:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
tg3 0000:04:00.1 eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 4c:d9:8f:a9:dd:dd
tg3 0000:04:00.1 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
tg3 0000:04:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
tg3 0000:04:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
tg3 0000:04:00.1 eno2: renamed from eth1
tg3 0000:04:00.0 eno1: renamed from eth0
bonding: bond0 is being created...
bond0: Enslaving eno1 as a backup interface with a down link
bond0: Enslaving eno2 as a backup interface with a down link
tg3 0000:04:00.0 eno1: Link is up at 1000 Mbps, full duplex
tg3 0000:04:00.0 eno1: Flow control is on for TX and on for RX
tg3 0000:04:00.0 eno1: EEE is disabled
bond0: link status up for interface eno1, enabling it in 0 ms
bond0: link status definitely up for interface eno1, 1000 Mbps full duplex
bond0: making interface eno1 the new active one
bond0: first active interface up!
tg3 0000:04:00.1 eno2: Link is up at 1000 Mbps, full duplex
tg3 0000:04:00.1 eno2: Flow control is on for TX and on for RX
tg3 0000:04:00.1 eno2: EEE is disabled
bond0: link status up for interface eno2, enabling it in 200 ms
bond0: invalid new link 3 on slave eno2
bond0: link status definitely up for interface eno2, 1000 Mbps full duplex
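
To make the warning easier to spot across hosts, here is a minimal Python sketch that scans kernel log text for the "invalid new link" message and labels the numeric code. The function name `find_invalid_link_events` is hypothetical, and the code-to-name mapping is an assumption based on the link-state constants in the kernel bonding header (`include/net/bonding.h`), where 3 appears to be `BOND_LINK_BACK`, the state a slave sits in while waiting out the up delay (200 ms in the log above). Verify the mapping against the running kernel's source before relying on it.

```python
import re

# Assumed mapping of bonding link-state codes to names, taken from the
# kernel bonding header; verify against the kernel version in use.
BOND_LINK_STATES = {
    0: "BOND_LINK_UP",
    1: "BOND_LINK_FAIL",
    2: "BOND_LINK_DOWN",
    3: "BOND_LINK_BACK",
}

# Matches lines such as: "bond0: invalid new link 3 on slave eno2"
INVALID_LINK_RE = re.compile(
    r"(?P<bond>\S+): invalid new link (?P<code>\d+) on slave (?P<slave>\S+)"
)


def find_invalid_link_events(log_text):
    """Return (bond, slave, code, state_name) for each matching log line."""
    events = []
    for line in log_text.splitlines():
        match = INVALID_LINK_RE.search(line)
        if match:
            code = int(match.group("code"))
            name = BOND_LINK_STATES.get(code, "unknown")
            events.append((match.group("bond"), match.group("slave"), code, name))
    return events


# Three lines copied from the log in this task.
sample = """\
bond0: link status up for interface eno2, enabling it in 200 ms
bond0: invalid new link 3 on slave eno2
bond0: link status definitely up for interface eno2, 1000 Mbps full duplex
"""

print(find_invalid_link_events(sample))
```

In practice the input could be the saved output of `dmesg` or `journalctl -k` from each host, which would allow a quick comparison between hosts that log the message and hosts that do not.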

Compared to payments2002, the notable difference is that payments2002 does not seem to attempt to bring up eno2 at all, stopping at "bond0: first active interface up!" Why does payments2003 behave differently?

Just finished imaging payments2001 and it is exhibiting the same behavior. We tested bond0 failover by unplugging eno1 on payments2003 and it fails over properly. Strange.
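
The failover check described here can also be confirmed without pulling a cable by reading the bonding driver's status file. Below is a rough Python sketch that parses text in the `/proc/net/bonding/bond0` format and reports the active slave and each slave's MII status. The function names are hypothetical, the field labels ("Currently Active Slave", "Slave Interface", "MII Status") are the ones I understand the driver to print, and the sample text is illustrative only, not captured from these hosts.

```python
def parse_bonding_status(text):
    """Extract the active slave and per-slave MII status from bonding status text."""
    status = {"active": None, "slaves": {}}
    current = None  # slave section currently being read, if any
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or line without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            status["active"] = value
        elif key == "Slave Interface":
            current = value
            status["slaves"][current] = None
        elif key == "MII Status" and current is not None:
            # Only record MII status inside a slave section; the bond-level
            # "MII Status" line appears before any "Slave Interface" line.
            status["slaves"][current] = value
    return status


def all_slaves_up(status):
    """True if at least one slave exists and every slave reports 'up'."""
    slaves = status["slaves"]
    return bool(slaves) and all(state == "up" for state in slaves.values())


# Illustrative sample in the assumed format (not real output from payments2003).
sample_status = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eno1
MII Status: up

Slave Interface: eno1
MII Status: up

Slave Interface: eno2
MII Status: up
"""

parsed = parse_bonding_status(sample_status)
print(parsed["active"], parsed["slaves"], all_slaves_up(parsed))
```

On a live host the text would come from reading `/proc/net/bonding/bond0`. Running the check before and after unplugging eno1 should show the active slave switching to eno2 if failover works as reported.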

Jgreen renamed this task from "new payments2003 bonded ethernet network error/warning" to "new payments2001 and payments2003 bonded ethernet network error/warning". Mar 2 2020, 6:52 PM
Jgreen removed Jgreen as the assignee of this task. Mar 3 2020, 3:46 PM
Jgreen triaged this task as Medium priority.
Jgreen moved this task from Backlog to Up Next on the fundraising-tech-ops board.
Jgreen renamed this task from "new payments2001 and payments2003 bonded ethernet network error/warning" to "new fundraising Buster servers - bonded ethernet network error/warning". Mar 5 2020, 5:34 PM
Jgreen updated the task description.

Chalking this up to weird log messaging that is otherwise a non-issue. Both interfaces are clearly up, and failover between them is successful.