tagged_interface sometimes exceeds IFNAMSIZ
Closed, Resolved, Public

Description

On some systems that use predictable network interface names, such as lvs2010, tagged_interface creates invalid network interface names like enp59s0f1d1.2017: at 16 characters, the name exceeds the 15 usable characters that IFNAMSIZ (16, including the trailing \0) allows.

root@lvs2010:~# ifup -a
Error: argument "enp59s0f1d1.2017" is wrong: "name" too long

ifup: ignoring unknown interface enp59s0f1d1.2017=enp59s0f1d1.2017
Error: argument "enp175s0f1d1.2019" is wrong: "name" too long

ifup: ignoring unknown interface enp175s0f1d1.2019=enp175s0f1d1.2019
Error: argument "enp59s0f1d1.2001" is wrong: "name" too long

ifup: ignoring unknown interface enp59s0f1d1.2001=enp59s0f1d1.2001
Error: argument "enp175s0f1d1.2003" is wrong: "name" too long

ifup: ignoring unknown interface enp175s0f1d1.2003=enp175s0f1d1.2003
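
The length limit behind these errors can be checked with a few lines of shell. This is only a sketch: ifname_fits and check_ifnames are hypothetical helper names, not part of ifupdown or the puppet code.

```shell
# IFNAMSIZ is 16 and includes the trailing NUL byte,
# so a usable interface name has at most 15 characters.
IFNAMSIZ=16

# Hypothetical helper: succeeds if the name fits, fails otherwise.
ifname_fits() {
    [ "${#1}" -lt "$IFNAMSIZ" ]
}

# Hypothetical helper: report each name with its length and a verdict.
check_ifnames() {
    for name in "$@"; do
        if ifname_fits "$name"; then
            echo "ok ${#name} $name"
        else
            echo "too long ${#name} $name"
        fi
    done
}
```

For example, check_ifnames enp59s0f1d1.2017 enp59s0f0.2017 flags the first name (16 characters) and accepts the second (14 characters).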

Event Timeline

Change 474272 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] lvs: Avoid tagged network interfaces to hit IFNAMSIZ (15+\0) limit

https://gerrit.wikimedia.org/r/474272

ema triaged this task as Medium priority.Nov 19 2018, 12:44 PM
ema moved this task from Backlog to LoadBalancer on the Traffic board.

I think this is addressed by systemd's 9009d3b5c3b6d191be69215736be77583e0f23f9, included in v239 (stretch has v232, buster has v241).

On e.g. lvs2010, the interfaces' sysfs paths are:

  • /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/net/enp59s0f0
  • /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/net/enp59s0f1d1
  • /sys/devices/pci0000:ae/0000:ae:00.0/0000:af:00.0/net/enp175s0f0
  • /sys/devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/net/enp175s0f1d1

(ls -1d /sys/devices/pci0000:*/*/*/net/enp*)

Looking at /sys/bus/pci/slots we can see that 0000:3b:00 and 0000:af:00 are not there, but their respective parents, 0000:3a:00 and 0000:ae:00 are, as slots 2 and 3 respectively.
(for slot in /sys/bus/pci/slots/*; do if [ "$(cat "$slot/address")" = "0000:3a:00" ]; then echo "$slot"; fi; done).
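
For repeated lookups, the one-liner above could be wrapped in a small function. This is a sketch; find_pci_slot is a hypothetical name, and the slots directory is passed as an argument only so it can be pointed somewhere other than /sys/bus/pci/slots.

```shell
# Hypothetical helper: print the slot directories under $1 whose
# "address" file matches the PCI address $2 (domain:bus:device).
find_pci_slot() {
    slots_dir=$1
    wanted=$2
    for slot in "$slots_dir"/*; do
        # Skip entries without a readable address file.
        [ -r "$slot/address" ] || continue
        if [ "$(cat "$slot/address")" = "$wanted" ]; then
            echo "$slot"
        fi
    done
}
```

Usage would be find_pci_slot /sys/bus/pci/slots 0000:3a:00, which prints nothing when no slot matches.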

So with newer systemd I think there are good chances enp59s0f0 will be named ens2f0 and enp175s0f0 ens3f1, which is both shorter (working around the IFNAMSIZ limitation) and more representative of how the machine is configured. A reimage to buster would do it; stretch-backports also has v241, but it has been poorly maintained as a backport in the past (e.g. lagging behind security fixes). Worth a try regardless.

that's actually pretty easy to test in lvs2010 (currently a spare system):

vgutierrez@lvs2010:~$ apt-cache policy systemd
systemd:
  Installed: 232-25+deb9u9
  Candidate: 232-25+deb9u11
  Version table:
     241-1~bpo9+1 100
        100 http://mirrors.wikimedia.org/debian stretch-backports/main amd64 Packages
     232-25+deb9u11 500
        500 http://security.debian.org/debian-security stretch/updates/main amd64 Packages
 *** 232-25+deb9u9 100
        100 /var/lib/dpkg/status
     232-25+deb9u8 500
        500 http://mirrors.wikimedia.org/debian stretch/main amd64 Packages

so systemd 241 shows the same behaviour as 232 in lvs2010:

vgutierrez@lvs2010:~$ apt-cache policy systemd
systemd:
  Installed: 241-1~bpo9+1
  Candidate: 241-1~bpo9+1
  Version table:
 *** 241-1~bpo9+1 100
        100 http://mirrors.wikimedia.org/debian stretch-backports/main amd64 Packages
        100 /var/lib/dpkg/status
     232-25+deb9u11 500
        500 http://security.debian.org/debian-security stretch/updates/main amd64 Packages
     232-25+deb9u8 500
        500 http://mirrors.wikimedia.org/debian stretch/main amd64 Packages
vgutierrez@lvs2010:~$ sudo dmesg |grep rename
[    5.107385] bnxt_en 0000:3b:00.0 enp59s0f0: renamed from eth0
[    5.297922] bnxt_en 0000:af:00.0 enp175s0f0: renamed from eth2
[    5.320634] bnxt_en 0000:3b:00.1 enp59s0f1d1: renamed from eth1
[    5.404452] bnxt_en 0000:af:00.1 enp175s0f1d1: renamed from eth3

Change 474272 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] lvs: Avoid tagged network interfaces to hit IFNAMSIZ (15+\0) limit

https://gerrit.wikimedia.org/r/474272

So this is currently a blocker on cloudvirt1024.eqiad.wmnet for @Andrew. The approach suggested by @faidon, using systemd >= 239, doesn't seem to work. I've rebased https://gerrit.wikimedia.org/r/474272 and run pcc against our whole lvs fleet and cloudvirt1024; it shows the expected changes:

  • lvs: NOOP
  • cloudvirt1024: trimmed tagged network interface names: enp175s0f1d1.1105 -> p175s0f1d1.1105

pcc output can be checked in https://puppet-compiler.wmflabs.org/compiler1002/15648/

The shortened interface name in Exec[/sbin/ifup p175s0f1d1.1105] looks even more confusing; it reads like a random string. Perhaps it makes sense to selectively disable predictable names in cases like these.

So we are effectively stripping the common part of every ethernet interface name: en. We don't lose a bit of information. I don't see the problem to be honest.
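
The trimming being discussed can be sketched in shell as follows. This is not the Puppet code from change 474272; tagged_ifname is a hypothetical helper, and trimming only when the name is too long is an assumption that merely matches the pcc result (lvs: NOOP, cloudvirt1024: trimmed).

```shell
# Maximum usable interface name length (IFNAMSIZ minus the NUL byte).
MAX_IFNAME_LEN=15

# Hypothetical helper: build "<base>.<vlan_id>" and, only if the result
# is too long, drop the leading "en" from it.
tagged_ifname() {
    full="$1.$2"
    if [ "${#full}" -gt "$MAX_IFNAME_LEN" ]; then
        full=${full#en}
    fi
    echo "$full"
}
```

With this sketch, tagged_ifname enp175s0f1d1 1105 gives p175s0f1d1.1105, while a short name such as eno1.1003 is left alone.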

It's not ideal, but the part that was stripped was the most-predictable part of the name (the en prefix), so it's not all that confusing.

I think the best way out of this rabbithole in general would be to stop prefixing base interface names onto vlan-tagged interface names and just call it "vlan1105" or whatever. The ip tool outputs will still print it as vlan1105@enp175s0f1d1 if you need to understand the physical mapping. The key problem with making that change in puppet is that it will rename all the LVS vlan interfaces (and all the existing ones for WMCS as well), so it will be complex to get over the deployment hurdle.
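
As a rough illustration of that naming, the interface would be created under a name derived from the VLAN ID alone. vlan_link_cmd is a hypothetical helper that only prints the ip command rather than running it.

```shell
# Hypothetical helper: print the ip(8) command that would create a VLAN
# interface named after the VLAN ID alone, e.g. "vlan1105".
vlan_link_cmd() {
    base=$1
    vlan_id=$2
    echo "ip link add link $base name vlan$vlan_id type vlan id $vlan_id"
}
```

Since VLAN IDs go up to 4094, such a name is at most 8 characters long, regardless of how long the base interface name is.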

FWIW the ganeti cluster uses exactly the approach outlined by @BBlack, for this (among other, even more important) reasons:

ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master private state UP mode DEFAULT group default qlen 1000
    link/ether f0:1f:af:e8:c5:a3 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f0:1f:af:e8:c5:a4 brd ff:ff:ff:ff:ff:ff
4: private: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f0:1f:af:e8:c5:a3 brd ff:ff:ff:ff:ff:ff
5: eno1.1003@eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master public state UP mode DEFAULT group default qlen 1000
    link/ether f0:1f:af:e8:c5:a3 brd ff:ff:ff:ff:ff:ff
6: public: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f0:1f:af:e8:c5:a3 brd ff:ff:ff:ff:ff:ff
7: eno1.1022@eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master analytics state UP mode DEFAULT group default qlen 1000
    link/ether f0:1f:af:e8:c5:a3 brd ff:ff:ff:ff:ff:ff
8: analytics: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f0:1f:af:e8:c5:a3 brd ff:ff:ff:ff:ff:ff

private/analytics/public are the interface names used in the software; all of them are bridges, with eno1.1003, eno1.1022 and eno1 being the underlying VLAN-tagged/untagged interfaces.

Unfortunately it does need some more puppetization work.

Mentioned in SAL (#wikimedia-operations) [2019-04-09T11:00:00Z] <ema> rebooting lvs2010 with systemd 241-1~bpo9+1 T209707

So with newer systemd I think there are good chances enp59s0f0 will be named ens2f0 and enp175s0f0 ens3f1

You're right. All systemd-related packages upgraded:

ii  systemd        241-1~bpo9+1 amd64        system and service manager
ii  systemd-sysv   241-1~bpo9+1 amd64        system and service manager - SysV links
ii  libsystemd0:amd64 241-1~bpo9+1 amd64        systemd utility library
ii  udev           241-1~bpo9+1 amd64        /dev/ and hotplug management daemon
ii  libudev1:amd64 241-1~bpo9+1 amd64        libudev shared library

Host rebooted, I see the following interfaces:

root@lvs2010:~# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens2f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:f0:02:40 brd ff:ff:ff:ff:ff:ff
3: ens2f1d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:f0:02:41 brd ff:ff:ff:ff:ff:ff
4: ens3f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:f0:0c:10 brd ff:ff:ff:ff:ff:ff
5: ens3f1d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:f0:0c:11 brd ff:ff:ff:ff:ff:ff

Mentioned in SAL (#wikimedia-operations) [2019-04-09T14:10:50Z] <ema> reboot lvs2010 with systemd 232 T209707

We could also look into a backport of https://github.com/systemd/systemd/commit/9009d3b5c3b6d191be69215736be77583e0f23f9 to Stretch. It seems totally doable and, once confirmed to work fine in our environment, we could submit it as a merge request to the Debian systemd maintainers for a Stretch point release (every point release ships backports of important bugfixes, e.g. https://tracker.debian.org/news/1037358/accepted-systemd-232-25deb9u10-source-into-proposed-updates-stable-new-proposed-updates/).

Installing cloudvirt1024 with Buster isn't really an option -- we'd have to port all OpenStack packages for versions M and N to Buster just to keep this one server alive -- we won't otherwise be running M or N on Buster at all, we'll have otherwise upgraded to O or P before we update the rest of our cluster to Buster. That's a lot of ported packages to manage (dozens, possibly as many as 100) just to work around this one issue.

@aborrero can chime in with what would actually be involved but I expect it's a lot.

Openstack Mitaka is only supported in jessie (in jessie-backports). The jessie-backports repo no longer exists, we had to rescue/reallocate all the .deb packages to our internal repo. Moreover, jessie-backports is a rebuild of packages from stretch, but openstack Mitaka is no longer in stretch (it was upgraded to newton).
Here you can find some info on the openstack version/debian version matrix: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Openstack_source
We don't support the mitaka/buster combo. Adding support for that would mean months of FTE work. We would rather spend our time upgrading our cloud fleet to mitaka/stretch, so that we can later move "easily" to newton/stretch, then to ocata/stretch, and finally to ocata/buster.

The easiest short-term options I see are:

  • merge that patch proposal by @Vgutierrez even if the resulting NIC name is not optimal
  • disable predictable interface names in concrete servers, and go back to eth1
  • backport the systemd patch

I volunteer to add puppet code to disable predictable names in case that's the approach we choose.

*bump*

I still need something like https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/474272/ in order to get cloudvirt1024 online (and to pave the way towards upgrading similar hardware to Stretch). Are there competing solutions at this point, or should I just be bold and merge 474272?

Change 474272 merged by Andrew Bogott:
[operations/puppet@production] lvs: Avoid tagged network interfaces to hit IFNAMSIZ (15+\0) limit

https://gerrit.wikimedia.org/r/474272

After trimming the interface name, we had to write an /etc/network/interfaces stanza like this by hand for the config to survive a reboot:

auto p175s0f1d1.1105
iface p175s0f1d1.1105 inet manual
   pre-up ip link add link enp175s0f1d1 name $IFACE type vlan id 1105 || true
   pre-up ip link set enp175s0f1d1 up
   up ip link set $IFACE up
   down ip link set $IFACE down

it seems ifupdown is very clumsy if the tagged interface name doesn't match the base interface name.

This should be added to puppet.

As discussed on IRC, using vlan-raw-device enp175s0f1d1 should be enough, as recommended in https://wiki.debian.org/NetworkConfiguration#Manual_config

That is not enough :-/ As I mentioned, ifupdown gets clumsy.

It will try to create a vlan tagged interface like this:

5: enp175s0f1d1.11@enp175s0f1d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000

which trims the wrong part of the string.
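
That name is what you get when the full enp175s0f1d1.1105 is cut to the 15 usable characters, so the VLAN ID loses its last two digits. The sketch below only reproduces the string arithmetic; truncate_ifname is a hypothetical helper, and it says nothing about where in ifupdown or the vlan scripts the cut actually happens.

```shell
# Hypothetical helper: cut a name down to the 15 usable characters
# that IFNAMSIZ (16, including the NUL byte) allows.
truncate_ifname() {
    printf '%.15s\n' "$1"
}
```

Running truncate_ifname enp175s0f1d1.1105 prints enp175s0f1d1.11, matching the interface shown above.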

so taking a deeper look into https://manpages.debian.org/jessie/vlan/vlan-interfaces.5.en.html:

vlan-raw-device devicename
Indicates the device to create the vlan on. This is ignored when the devicename is part of the vlan interface name.

So if you choose p175s0f1d1.1105 as the interface name, I'm guessing that it's ignoring the vlan-raw-device enp175s0f1d1. Following the documentation, we should use vlan1105 as the interface name. Otherwise we're getting into non-standard vlan interface naming.
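
A stanza following that naming could be generated along these lines. This is a sketch only: vlan_stanza is a hypothetical helper, "inet manual" is carried over from the handwritten stanza earlier in the thread, and the result has not been tested against ifupdown here.

```shell
# Hypothetical helper: print an /etc/network/interfaces stanza for a
# VLAN interface named "vlan<ID>" on top of the given raw device.
vlan_stanza() {
    raw_device=$1
    vlan_id=$2
    cat <<EOF
auto vlan$vlan_id
iface vlan$vlan_id inet manual
    vlan-raw-device $raw_device
EOF
}
```

For instance, vlan_stanza enp175s0f1d1 1105 prints a stanza for vlan1105 whose raw device is enp175s0f1d1; because the device name is not part of the interface name, vlan-raw-device should no longer be ignored.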

Change 508770 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] lvs: Toggle VLAN legacy naming

https://gerrit.wikimedia.org/r/508770

Change 508796 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] openstack: Disable legacy vlan naming for cloudvirt1024

https://gerrit.wikimedia.org/r/508796

Mentioned in SAL (#wikimedia-operations) [2019-05-13T10:17:38Z] <vgutierrez> rebooting cloudvirt1024 - T209707

Change 508770 merged by Vgutierrez:
[operations/puppet@production] lvs: Toggle VLAN legacy naming

https://gerrit.wikimedia.org/r/508770

Change 508796 merged by Andrew Bogott:
[operations/puppet@production] openstack: Disable legacy vlan naming for cloudvirt1024

https://gerrit.wikimedia.org/r/508796

Vgutierrez claimed this task.
Vgutierrez removed a project: Patch-For-Review.