hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet
Closed, Resolved | Public | Request

Description

  • Provide FQDN of system.
  • If other than a hard drive issue, please depool the machine (and confirm that it’s been depooled) for us to work on it. If not, please provide time frame for us to take the machine down.
  • Put system into a failed state in Netbox.
  • Provide urgency of request, along with justification (redundancy, dependencies, etc.)
  • Describe issue and/or attach hardware failure log. (Refer to https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook if you need help)
  • Assign correct project tag and appropriate owner (based on above). Also, please ensure the service owners of the host(s) are added as subscribers to provide any additional input.

FQDN: wikikube-worker2140.codfw.wmnet
Issue: Link down on primary interface
Urgency: Low, not yet imaged for production

root@wikikube-worker2140:~# ethtool eno12409np1 
Settings for eno12409np1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: Unknown!
	Duplex: Unknown! (255)
	Auto-negotiation: on
	Port: None
	PHYAD: 1
	Transceiver: internal
	Supports Wake-on: g
	Wake-on: d
        Current message level: 0x00002081 (8321)
                               drv tx_err hw
	Link detected: no

Event Timeline

@Papaul you might need to check the switch. I looked in the iDRAC and the link shows as up. It is physically up as well.

@Jhancock.wm @Clement_Goubert the interface on the switch side is up

xe-0/0/26       up    up   wikikube-worker2140

I just managed to mount the IP addresses on the other interface, eno12399np0, and the link is up. Looks like the wrong one got provisioned?

@Clement_Goubert in your output below you were looking at the second interface (eno12409np1):

root@wikikube-worker2140:~# ethtool eno12409np1 
Settings for eno12409np1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: Unknown!
	Duplex: Unknown! (255)
	Auto-negotiation: on
	Port: None
	PHYAD: 1
	Transceiver: internal
	Supports Wake-on: g
	Wake-on: d
        Current message level: 0x00002081 (8321)
                               drv tx_err hw
	Link detected: no

Yes, eno12409np1 was the one where the IPs were originally mounted when I encountered the issue. In order to troubleshoot, I changed the config in /etc/network/interfaces to mount the IPs on eno12399np0, and that interface has the link up.

In Netbox, the provisioned interface is eno12409np1, so maybe just the wrong interface got plugged in?
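
For anyone hitting the same symptom, here is a minimal sketch of the check described above: parse the `Link detected:` line from `ethtool` output for each interface and report when the interface provisioned in Netbox has no link while another one does. The function names and the sample `eno12399np0` output are hypothetical (this task only contains real output for eno12409np1); the outputs are passed in as strings, so nothing here queries Netbox or the host.

```python
import re
from typing import Dict, Optional


def link_detected(ethtool_output: str) -> Optional[bool]:
    """Return True/False from the 'Link detected:' line, or None if it is absent."""
    match = re.search(
        r"^\s*Link detected:\s*(yes|no)\s*$", ethtool_output, re.MULTILINE
    )
    if match is None:
        return None
    return match.group(1) == "yes"


def find_linked_alternative(
    expected_iface: str, outputs: Dict[str, str]
) -> Optional[str]:
    """If the expected interface has no link, return another interface that does.

    Returns None when the expected interface is up, or when no other
    interface reports a link.
    """
    if link_detected(outputs.get(expected_iface, "")):
        return None
    for name, text in outputs.items():
        if name != expected_iface and link_detected(text):
            return name
    return None


# Trimmed from the ethtool output pasted in this task.
ENO12409NP1 = "Settings for eno12409np1:\n\tSpeed: Unknown!\n\tLink detected: no\n"
# Hypothetical output, assumed for illustration only.
ENO12399NP0 = "Settings for eno12399np0:\n\tLink detected: yes\n"

alternative = find_linked_alternative(
    "eno12409np1",
    {"eno12409np1": ENO12409NP1, "eno12399np0": ENO12399NP0},
)
print(alternative)
```

A non-None result means the cabling and the Netbox record disagree; whether to move the cable or update Netbox is still a decision for DC-ops.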

Glad all is working. I am resolving this task. Thank you.

@Papaul sorry for the misunderstanding, but it's not resolved. The interface that is supposed to have the link according to Netbox doesn't. I don't know if the best course of action is to change the connection in Netbox to be to eno12399np0 and reprovision the server?

@Clement_Goubert got you now, I will fix it in Netbox. Sorry I misunderstood you.

I have the same issue on wikikube-worker2157.codfw.wmnet: the interface in Netbox is eno12409np1 but it has no link, whereas eno12399np0 does.

I see you've changed the interface type for wikikube-worker2140 eno12399np0, how should I proceed from there, or do you still have something to finish up on your side?

@Clement_Goubert for 2140 you can re-image; nothing else needs to be done on our end. I also updated 2157. Let me know if you have any questions.