
scb2005 eth0 interface gets renamed to eth2
Closed, Resolved · Public

Description

Host scb2005 fails to join the network because eth0 is missing. The only relevant thing I can see is the following:

root@scb2005:/var/log# grep eth0 syslog
Jun 12 08:23:51 scb2005 kernel: [    4.144232] tg3 0000:02:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 18:66:da:7d:ac:b4
Jun 12 08:23:51 scb2005 kernel: [    4.144234] tg3 0000:02:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Jun 12 08:23:51 scb2005 kernel: [    4.144236] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Jun 12 08:23:51 scb2005 kernel: [    4.144238] tg3 0000:02:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
Jun 12 08:23:51 scb2005 kernel: [   16.098322] tg3 0000:02:00.0 eth2: renamed from eth0
Jun 12 08:40:15 scb2005 kernel: [    4.125373] tg3 0000:02:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 18:66:da:7d:ac:b4
Jun 12 08:40:15 scb2005 kernel: [    4.125375] tg3 0000:02:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Jun 12 08:40:15 scb2005 kernel: [    4.125377] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Jun 12 08:40:15 scb2005 kernel: [    4.125378] tg3 0000:02:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
Jun 12 08:40:15 scb2005 kernel: [   19.384780] tg3 0000:02:00.0 eth2: renamed from eth0
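
To pull only the rename events out of a longer syslog, a small helper along these lines may be useful. This is a sketch that assumes the kernel message format shown in the excerpt above ("<new>: renamed from <old>") and a 15-character syslog timestamp; the function name list_renames is made up for illustration.

```shell
# Print "<timestamp> <old> -> <new>" for every interface rename logged by the kernel.
# Reads the files given as arguments, or stdin when none are given.
# Assumption: lines look like "... eth2: renamed from eth0", as in the excerpt above.
list_renames() {
    grep 'renamed from ' "$@" |
        sed -E 's/^(.{15}) .* ([^ :]+): renamed from ([^ ]+).*$/\1 \3 -> \2/'
}

# Example: list_renames /var/log/syslog
```

For the first rename in the excerpt above this would print "Jun 12 08:23:51 eth0 -> eth2".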

Various info:

root@scb2005:/var/log# ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:96 errors:0 dropped:0 overruns:0 frame:0
          TX packets:96 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:21934 (21.4 KiB)  TX bytes:21934 (21.4 KiB)

lo:LVS    Link encap:Local Loopback
          inet addr:10.2.1.10  Mask:255.255.255.255
          UP LOOPBACK RUNNING  MTU:65536  Metric:1

root@scb2005:/var/log# ifconfig eth0
eth0: error fetching interface information: Device not found
root@scb2005:/var/log# ifconfig eth1
eth1: error fetching interface information: Device not found
root@scb2005:/var/log# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 18:66:da:7d:ac:b4
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:19

root@scb2005:/var/log# ifconfig eth3
eth3      Link encap:Ethernet  HWaddr 18:66:da:7d:ac:b5
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:16

As expected, ifup eth0 does not work.

Event Timeline

elukey created this task. Jun 12 2017, 8:57 AM
root@scb2005:~# lspci -v | egrep 'Device\ Serial\ Number|Broadcom'
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
        Capabilities: [13c] Device Serial Number 00-00-18-66-da-7d-ac-b4
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
        Capabilities: [13c] Device Serial Number 00-00-18-66-da-7d-ac-b5
(1) scb2005.codfw.wmnet
----- OUTPUT of 'uname -a' -----
Linux scb2005 4.4.0-3-amd64 #1 SMP Debian 4.4.2-3+wmf8 (2016-12-22) x86_64 GNU/Linux (got manually from the console)
===== NODE GROUP =====
(1) scb2002.codfw.wmnet
----- OUTPUT of 'uname -a' -----
Linux scb2002 4.4.0-2-amd64 #1 SMP Debian 4.4.2-3+wmf6 (2016-10-18) x86_64 GNU/Linux
===== NODE GROUP =====
(1) scb2004.codfw.wmnet
----- OUTPUT of 'uname -a' -----
Linux scb2004 4.4.0-2-amd64 #1 SMP Debian 4.4.2-3+wmf6 (2016-10-18) x86_64 GNU/Linux
===== NODE GROUP =====
(1) scb2006.codfw.wmnet
----- OUTPUT of 'uname -a' -----
Linux scb2006 4.4.0-3-amd64 #1 SMP Debian 4.4.2-3+wmf8 (2016-12-22) x86_64 GNU/Linux
===== NODE GROUP =====
(1) scb2001.codfw.wmnet
----- OUTPUT of 'uname -a' -----
Linux scb2001 4.4.0-2-amd64 #1 SMP Debian 4.4.2-3+wmf6 (2016-10-18) x86_64 GNU/Linux
===== NODE GROUP =====
(1) scb2003.codfw.wmnet
----- OUTPUT of 'uname -a' -----
Linux scb2003 4.4.0-2-amd64 #1 SMP Debian 4.4.2-3+wmf6 (2016-10-18) x86_64 GNU/Linux

I can see 4 NICs on scb200[46]. Could it be a HW problem?

The persistent net rules seem to have been playing a role here.
Removing /etc/udev/rules.d/70-persistent-net.rules and rebooting brought eth0 back (I made a backup of the file and placed it under /home/marostegui/).
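
Before deleting the rules file, it can help to see which MAC address it pins to which interface name. The file's contents are not shown in this task, so the sketch below assumes entries in the usual Debian generator form (ATTR{address}=="<mac>" ... NAME="<name>"); the function name rules_map is hypothetical.

```shell
# Print "<mac> <name>" for each pinned interface in a persistent-net rules file.
# Reads the files given as arguments, or stdin when none are given.
# Assumption: each rule carries ATTR{address}=="..." followed later by NAME="...".
rules_map() {
    sed -nE 's/.*ATTR\{address\}=="([0-9a-fA-F:]+)".*NAME="([^"]+)".*/\1 \2/p' "$@"
}

# Example: rules_map /etc/udev/rules.d/70-persistent-net.rules
```

The output can then be compared by hand against the addresses in /sys/class/net/*/address to spot names that are pinned to NICs the system no longer enumerates.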

However, between the original state and the current one, there are no longer four interfaces but only two, as can be seen:

root@scb2005:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 18:66:da:7d:ac:b4
          inet addr:10.192.0.34  Bcast:10.192.3.255  Mask:255.255.252.0
          inet6 addr: 2620:0:860:101:10:192:0:34/64 Scope:Global
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:19

eth1      Link encap:Ethernet  HWaddr 18:66:da:7d:ac:b5
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:16

However, neither of them has link:

root@scb2005:~# mii-tool eth0
eth0: no link
root@scb2005:~# mii-tool eth1
SIOCGMIIPHY on 'eth1' failed: Resource temporarily unavailable
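
As a cross-check that does not depend on mii-tool, the kernel also exposes link state in sysfs under /sys/class/net/<iface>/carrier. The sketch below reads that file; reading it typically fails while the interface is administratively down, so that case is reported separately. The names link_state and SYSFS_NET are illustrative only.

```shell
# Report link state for one interface based on the sysfs "carrier" attribute.
# SYSFS_NET can point at another directory; it defaults to /sys/class/net.
link_state() {
    base=${SYSFS_NET:-/sys/class/net}
    if carrier=$(cat "$base/$1/carrier" 2>/dev/null); then
        case $carrier in
            1) echo "$1: link up" ;;
            *) echo "$1: no link" ;;
        esac
    else
        # The read fails if the interface does not exist or is not up.
        echo "$1: carrier unreadable (interface missing or administratively down)"
    fi
}

# Example: link_state eth0
```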

@Papaul can you check if there is link on that interface?

@Marostegui no link on eth0 or eth1. I replaced the network cable and got the same problem. When I plug the cable into NIC 3 or NIC 4, I have link.
In the ILO, under the HW section, I can see only NIC 3 and NIC 4; NIC 1 and NIC 2 are not listed.

Which MAC addresses do you see for NIC3 and NIC4?
For me:
eth0: 18:66:da:7d:ac:b4
eth1: 18:66:da:7d:ac:b5

Do any of those match the MAC addresses you see?
I can try to rename the interfaces to match what you see, so that eth0 points to the one that has link on your side. But it looks like a HW issue anyway.

NIC.Embedded.3-1-1 Ethernet = 18:66:DA:7D:AC:B4
NIC.Embedded.4-1-1 Ethernet = 18:66:DA:7D:AC:B5
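
The OS prints MAC addresses in lowercase while the controller inventory prints them in uppercase, so matching the two lists needs a case-insensitive comparison. A minimal sketch follows; the helper name same_mac is made up for illustration.

```shell
# Succeed (exit status 0) when two MAC addresses are equal, ignoring letter case.
same_mac() {
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$a" = "$b" ]
}

# eth0 as seen by the OS versus NIC.Embedded.3-1-1 as seen by the controller.
if same_mac 18:66:da:7d:ac:b4 18:66:DA:7D:AC:B4; then
    echo "eth0 is NIC.Embedded.3-1-1"
fi
```

With the addresses quoted in this task, eth0 corresponds to NIC.Embedded.3-1-1 and eth1 to NIC.Embedded.4-1-1.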

That does sound like a motherboard issue. A quick look at the RAC's logs does not show anything, though.

Mentioned in SAL (#wikimedia-operations) [2017-06-13T07:12:30Z] <marostegui> Reboot scb2005 - T167638

Marostegui closed this task as Resolved. Jun 13 2017, 7:26 AM
Marostegui claimed this task.

I did a second reboot; the MAC address cache remained untouched and the server is back up normally.
However, only two interfaces are shown:

root@scb2005:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 18:66:da:7d:ac:b4
          inet addr:10.192.0.34  Bcast:10.192.3.255  Mask:255.255.252.0
          inet6 addr: 2620:0:860:101:10:192:0:34/64 Scope:Global
          inet6 addr: fe80::1a66:daff:fe7d:acb4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5493061 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1794311 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5099917459 (4.7 GiB)  TX bytes:317997176 (303.2 MiB)
          Interrupt:19

eth1      Link encap:Ethernet  HWaddr 18:66:da:7d:ac:b5
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:16

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:13299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:30502186 (29.0 MiB)  TX bytes:30502186 (29.0 MiB)

lo:LVS    Link encap:Local Loopback
          inet addr:10.2.1.10  Mask:255.255.255.255
          UP LOOPBACK RUNNING  MTU:65536  Metric:1

I am closing this because the server is up, but there might be a HW issue underneath with one of the physical NICs.