ifup@eno1.service failed on some buster hosts
Closed, Resolved (Public)

Description

After being upgraded to Buster, some hosts including mw1265, mc1033, mc2033 (but I think not all Buster hosts) have started firing systemd service alerts due to the ifup@eno1.service failing. Journalctl says:

rzl@mw1265:~$ sudo journalctl -u ifup@eno1.service 
-- Logs begin at Tue 2020-12-15 17:26:31 UTC, end at Tue 2020-12-15 23:13:11 UTC
Dec 15 17:26:34 mw1265 systemd[1]: Started ifup for eno1.
Dec 15 17:26:34 mw1265 sh[882]: Cannot find device "eno1/64"
Dec 15 17:26:34 mw1265 sh[882]: ifup: failed to bring up eno1
Dec 15 17:26:34 mw1265 systemd[1]: ifup@eno1.service: Main process exited, code=exited, status=1/FAILURE
Dec 15 17:26:34 mw1265 systemd[1]: ifup@eno1.service: Failed with result 'exit-code'.

Not sure if that's an artifact of the reimage, a config incompatibility with buster, or something else. It doesn't seem to be harming anything, apart from causing the systemd icinga alert to fire, so instead of trying to just restart it and clear the alert, I've left it as-is for investigation.

Event Timeline

There is a very interesting diff between mc1032 (not showing the issue) and mc1033 (showing the issue):

elukey@mc1032:~$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug eno1
iface eno1 inet static
	address 10.64.32.212/22
	gateway 10.64.32.1
	# dns-* options are implemented by the resolvconf package, if installed
	dns-nameservers 10.3.0.1
	dns-search eqiad.wmnet
   pre-up /sbin/ip token set ::10:64:32:212 dev eno1
   up ip addr add 2620:0:861:103:10:64:32:212/64 dev eno1
   up /usr/local/sbin/interface-rps eno1

elukey@mc1033:~$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug eno1
iface eno1 inet static
	address 10.64.48.155/22
	gateway 10.64.48.1
	# dns-* options are implemented by the resolvconf package, if installed
	dns-nameservers 10.3.0.1
	dns-search eqiad.wmnet
	pre-up /sbin/ip token set ::10:64:48:155 dev eno1
	up ip addr add 2620:0:861:107:10:64:48:155 dev eno1/64
   up ip addr add 2620:0:861:107:10:64:48:155/64 dev eno1
   up /usr/local/sbin/interface-rps eno

The ifup configuration differs between the two hosts, but I checked with lspci and both have the same NIC (BCM5719).

Edit: I suspect that something happened in late_command.sh at the end of d-i. There is some code there that could explain the eno1/64, which I don't find on other nodes.
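For anyone wanting to check other hosts for the same thing, here is a minimal sketch of a scan for the pattern. It is an illustration only: the function name `find_misplaced_prefix` and the regex are assumptions of mine, not anything taken from late_command.sh or Puppet, and it only flags lines where a `/<prefix>` is attached to the argument of `dev`.

```python
import re

# Matches "dev <name>/<digits>" at the end of a line, i.e. a prefix length
# attached to the device name instead of the address (as on mc1033 above).
BAD_DEV = re.compile(r"\bdev\s+(\S+)/(\d+)\s*$")


def find_misplaced_prefix(text):
    """Return (line_number, stripped_line) pairs whose 'dev' argument has a /prefix."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore commented-out lines
        if BAD_DEV.search(stripped):
            hits.append((number, stripped))
    return hits


if __name__ == "__main__":
    # Lines copied from the mc1033 output above.
    sample = "\n".join([
        "iface eno1 inet static",
        "\tpre-up /sbin/ip token set ::10:64:48:155 dev eno1",
        "\tup ip addr add 2620:0:861:107:10:64:48:155 dev eno1/64",
        "   up ip addr add 2620:0:861:107:10:64:48:155/64 dev eno1",
    ])
    for number, line in find_misplaced_prefix(sample):
        print(f"line {number}: {line}")
```

To run it against a real host, the contents of /etc/network/interfaces could be read and passed to `find_misplaced_prefix` in place of the sample string.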

Joe added a project: SRE.
Joe added a subscriber: Joe.

Per @elukey's comment, assigning to John.

Change 649831 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] late_command: Fix cidr prefix

https://gerrit.wikimedia.org/r/649831

Change 649831 merged by Jbond:
[operations/puppet@production] late_command: Fix cidr prefix

https://gerrit.wikimedia.org/r/649831

Change 649835 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] late_command: use correct slash

https://gerrit.wikimedia.org/r/649835

Change 649835 merged by Jbond:
[operations/puppet@production] late_command: use correct slash

https://gerrit.wikimedia.org/r/649835

I noticed the following servers with this issue, which I have manually fixed:

[x] mw1265.eqiad.wmnet
[x] kafka-test1010.eqiad.wmnet
[x] kafka-test1008.eqiad.wmnet
[x] kafka-test1009.eqiad.wmnet
[x] sretest1002.eqiad.wmnet
[x] mc2031.codfw.wmnet
[x] mc2033.codfw.wmnet
[x] mc1031.eqiad.wmnet
[x] mc1033.eqiad.wmnet

My first fix to late_command had a typo; I'm just testing another PS on sretest1001.

For the record, the following line was introduced in error:

up ip addr add 2620:0:861:107:10:64:48:155 dev eno1/64

It should be the same as what Puppet adds, i.e.:

up ip addr add 2620:0:861:107:10:64:48:155/64 dev eno1
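To illustrate the difference between the two lines, here is a rough sketch of how the misplaced prefix could be moved back onto the address. It is not the actual late_command change or the manual fix applied to the hosts above. `repair_interfaces` is a hypothetical helper, and it assumes the command is written exactly as `up ip addr add <addr> dev <iface>/<prefix>`. If the corrected line is already present (as on mc1033, where Puppet had added it), the malformed line is dropped rather than duplicated.

```python
import re

# "up ip addr add <addr> dev <iface>/<prefix>": the prefix length ended up on
# the device name. The address must not already carry a prefix of its own.
MISPLACED = re.compile(
    r"^(?P<lead>\s*up\s+ip\s+addr\s+add\s+)"
    r"(?P<addr>[^\s/]+)\s+dev\s+(?P<dev>[^\s/]+)/(?P<prefix>\d+)\s*$"
)


def normalise(line):
    """Collapse whitespace so tab- and space-indented lines compare equal."""
    return " ".join(line.split())


def repair_interfaces(text):
    """Return text with misplaced prefixes moved from the device to the address."""
    lines = text.splitlines()
    present = {normalise(line) for line in lines}
    out = []
    for line in lines:
        match = MISPLACED.match(line)
        if match is None:
            out.append(line)
            continue
        fixed = f"{match['lead']}{match['addr']}/{match['prefix']} dev {match['dev']}"
        if normalise(fixed) in present:
            continue  # the correct line already exists; drop the broken one
        out.append(fixed)
        present.add(normalise(fixed))
    trailing = "\n" if text.endswith("\n") else ""
    return "\n".join(out) + trailing


if __name__ == "__main__":
    broken = "\tup ip addr add 2620:0:861:107:10:64:48:155 dev eno1/64\n"
    print(repair_interfaces(broken), end="")
```

Dropping the line when the correct one already exists is a design choice in this sketch, made so that the address is not added twice on hosts where Puppet has already written the right line.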

I have applied a fix and tested it on sretest1001, and all looks good, but I will leave this task open for further confirmation.

Oh wow, I filed this and went to bed, love to wake up and see it fully handled. :) Thanks all!