cp3003 network interface issues
Closed, Declined · Public

Description

I've upgraded cp3003's kernel to 4.9 as part of T162029, but the host did not come back online properly because of network interface issues. The problem looks similar to T148891, although in that case we got a different error message: bnx2x 0000:01:00.0 eth0: NIC Link is Down.

Apr  4 11:20:54 cp3003 kernel: [ 1265.188181] bnx2x 0000:01:00.0 eth0: using MSI-X  IRQs: sp 52  fp[0] 54 ... fp[11] 65
Apr  4 11:20:55 cp3003 kernel: [ 1265.357676] bnx2x 0000:01:00.0 eth0: Warning: Unqualified SFP+ module detected, Port 0 from LEONI            part number L45593-C100-D20
Apr  4 11:20:55 cp3003 kernel: [ 1265.416246] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
root@cp3003:~# ip link show dev eth0
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq portid d4ae528c8b7e state DOWN 0
    link/ether d4:ae:52:8c:8b:7e brd ff:ff:ff:ff:ff:ff
root@cp3003:~# ethtool -t eth0 offline
[ 1007.699834] bnx2x 0000:01:00.0 eth0: Warning: Unqualified SFP+ module detected, Port 0 from 
The test result is FAIL
The test extra info:
register_test (offline)    	 0
memory_test (offline)      	 0
int_loopback_test (offline)	 0
ext_loopback_test (offline)	 0
nvram_test (online)        	 0
interrupt_test (online)    	 0
link_test (online)         	 1

Rebooting the machine into the older 4.4.2-3+wmf8 kernel didn't fix the issue, suggesting that this might well be a hardware problem showing up when the card gets re-initialised.
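
For reference, the per-test lines of the ethtool self-test output can be filtered mechanically. The following is a minimal Python sketch, not part of the original investigation: the helper name failed_self_tests is hypothetical, and it assumes that a non-zero value marks a failed test, which is consistent with the overall FAIL result and the link_test line above.

```python
import re

# Sample output copied from the `ethtool -t eth0 offline` run in this task.
SAMPLE = """\
The test result is FAIL
The test extra info:
register_test (offline)    \t 0
memory_test (offline)      \t 0
int_loopback_test (offline)\t 0
ext_loopback_test (offline)\t 0
nvram_test (online)        \t 0
interrupt_test (online)    \t 0
link_test (online)         \t 1
"""

# One result line: test name, mode in parentheses, then an integer status.
RESULT_LINE = re.compile(r"^(\w+)\s+\((offline|online)\)\s+(\d+)\s*$")


def failed_self_tests(output):
    """Return the names of tests whose status is non-zero.

    Assumption: a non-zero status means the test failed.
    """
    failed = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if match and int(match.group(3)) != 0:
            failed.append(match.group(1))
    return failed


print(failed_self_tests(SAMPLE))
```

On the output pasted in this task, only link_test is reported, which fits the NO-CARRIER state shown by ip link.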

Event Timeline

ema created this task. Apr 4 2017, 11:50 AM
Restricted Application added a subscriber: Aklapper. Apr 4 2017, 11:50 AM
ema triaged this task as Medium priority. Apr 4 2017, 11:50 AM
ema added a comment. Apr 4 2017, 12:16 PM

I've tried a "cold reboot" with racadm serveraction powerdown ; racadm serveraction powerup to no avail.

ema moved this task from Triage to Caching on the Traffic board. Apr 6 2017, 10:13 AM
faidon added a subscriber: faidon. May 29 2017, 11:44 AM

Note that due to various other infrastructure changes since this host went offline, we'll need to reinstall the system when (if?) it comes back online.

BBlack closed this task as Declined. Jun 8 2017, 3:58 AM
BBlack added a subscriber: BBlack.

cp3003 is being decommissioned for good in T167376.