
interface renaming via systemd .link file can race with sysctl parameters
Closed, Resolved (Public)

Description

This ticket describes a boot-time issue I'm observing: sysctl parameters for a renamed interface fail to apply.

The context is the following:

  • reinstalling a server from Debian 10 Buster to Debian 11 Bullseye (T304598: cloudgw: upgrade servers to Debian 11 Bullseye)
  • interface gets renamed from ens2f1np1 to enp101s0f0np0 with the reinstall
  • once the VLAN tag is appended, the new name becomes enp101s0f0np0.1107, which is invalid: at 18 characters it exceeds the 15 usable characters allowed by IFNAMSIZ (16, including the terminating NUL)
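The length limit above can be checked with a short sketch. The helper names `vlan_ifname` and `fits_ifnamsiz` are made up for illustration; the only facts assumed are that IFNAMSIZ is 16 and that one of those bytes is reserved for the terminating NUL.

```python
# Minimal sketch: check whether "<parent>.<vlan>" fits in a Linux interface name.
# IFNAMSIZ is 16 and includes the trailing NUL, so 15 bytes are usable.
IFNAMSIZ = 16


def vlan_ifname(parent: str, vlan_id: int) -> str:
    """Build the {interface}.{vlan} name used by the Puppet manifests."""
    return f"{parent}.{vlan_id}"


def fits_ifnamsiz(name: str) -> bool:
    """Return True if the name leaves room for the terminating NUL."""
    return len(name.encode()) < IFNAMSIZ


# Old name, new name, and the shorter name chosen in this ticket.
for parent in ("ens2f1np1", "enp101s0f0np0", "dataplane"):
    name = vlan_ifname(parent, 1107)
    print(f"{name}: {len(name)} chars, fits={fits_ifnamsiz(name)}")
```

Only the middle name, enp101s0f0np0.1107, fails the check.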

The associated Puppet manifests hardcode {interface}.{vlan} in many places, so I ruled out switching to a vlanXXXX naming scheme, to avoid a rewrite in the middle of a host reinstall.
One solution explored is to add a systemd .link file that renames the interface to something shorter. A file like this is introduced:

/etc/systemd/network/10-persistent-net.link
# managed by puppet
[Match]
MACAddress=bc:97:e1:e2:52:51
[Link]
Name=dataplane

This interface is expected to forward TCP/IP traffic (cloudgw is a layer 3 router), so we need to enable IP forwarding on it via sysctl, something like:

    sysctl::parameters { 'cloudgw':
        values   => {
            # Enable IP forwarding, only on dataplane
            "net.ipv4.conf.${nic_dataplane}.forwarding"              => 1,
            "net.ipv4.conf.${nic_dataplane}/${virt_vlan}.forwarding" => 1,
            "net.ipv4.conf.${nic_dataplane}/${wan_vlan}.forwarding"  => 1,
[...]

It turns out that these sysctl settings can't be applied at boot, because systemd-udevd.service doesn't run early enough. When systemd-sysctl tries to load them, the corresponding /proc/sys directories don't exist yet:

aborrero@cloudgw1001:~ $ sudo systemctl status systemd-sysctl
● systemd-sysctl.service - Apply Kernel Variables
     Loaded: loaded (/lib/systemd/system/systemd-sysctl.service; static)
     Active: active (exited) since Tue 2022-04-05 12:16:43 UTC; 10min ago
       Docs: man:systemd-sysctl.service(8)
             man:sysctl.d(5)
    Process: 434 ExecStart=/lib/systemd/systemd-sysctl (code=exited, status=0/SUCCESS)
   Main PID: 434 (code=exited, status=0/SUCCESS)
        CPU: 15ms

Apr 05 12:16:43 cloudgw1001 systemd[1]: Starting Apply Kernel Variables...
Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '1' to 'net/ipv4/conf/dataplane.1107/forwarding', ignoring: No such file or directory
Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '0' to 'net/ipv4/conf/dataplane.1107/rp_filter', ignoring: No such file or directory
Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '1' to 'net/ipv4/conf/dataplane.1120/forwarding', ignoring: No such file or directory
Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '0' to 'net/ipv4/conf/dataplane.1120/rp_filter', ignoring: No such file or directory
Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '0' to 'net/ipv6/conf/dataplane.1107/accept_ra', ignoring: No such file or directory
Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '1' to 'net/ipv6/conf/dataplane.1107/forwarding', ignoring: No such file or directory
Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '0' to 'net/ipv6/conf/dataplane.1120/accept_ra', ignoring: No such file or directory
Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '1' to 'net/ipv6/conf/dataplane.1120/forwarding', ignoring: No such file or directory
Apr 05 12:16:43 cloudgw1001 systemd[1]: Finished Apply Kernel Variables.
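The paths in the log use "/" where the Puppet keys use "." and vice versa. As far as I understand sysctl.d(5), when the first separator in a key is a dot, dots and slashes are swapped, which is why an interface name containing a dot is written with a slash in the key. A minimal sketch of that mapping follows; the function name `sysctl_key_to_path` is made up for illustration and this is not systemd's actual code.

```python
# Sketch of the sysctl.d(5) key-to-path rule as I understand it:
# - first separator is "/": the key is used as-is
# - first separator is ".": dots and slashes are interchanged
def sysctl_key_to_path(key: str) -> str:
    first_sep = next((c for c in key if c in "./"), None)
    if first_sep == ".":
        key = key.translate(str.maketrans("./", "/."))
    return "/proc/sys/" + key


# Key style used in the Puppet snippet (slash inside the interface name).
print(sysctl_key_to_path("net.ipv4.conf.dataplane/1107.forwarding"))
# Path style shown in the systemd-sysctl log.
print(sysctl_key_to_path("net/ipv4/conf/dataplane.1107/forwarding"))
```

Both calls resolve to the same file, which only exists once the dataplane.1107 interface has been created.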

Similar issues have already been reported upstream, for example https://github.com/systemd/systemd/issues/7293, which is apparently solved. I may file a separate ticket upstream.

I tried ordering systemd-sysctl.service after networking.service (for example with After=networking.service), but that creates dependency loops that systemd doesn't like, and as a result the sysctl params are not loaded at all.

As a somewhat stable solution, I will migrate all the NIC-related sysctl params to a post-up script for ifupdown.
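As a rough illustration of the idea (not the content of the actual patch), a post-up helper could write the per-interface values directly under /proc/sys once ifupdown has brought the interface up. The sketch assumes ifupdown exports the interface name in the IFACE environment variable. The function name and the parameter table are hypothetical, with values copied from the systemd-sysctl log above.

```python
# Hypothetical post-up helper: apply per-interface sysctls after the
# interface exists. Values mirror the ones in the systemd-sysctl log.
import os
from pathlib import Path

PARAMS = {
    "ipv4": {"forwarding": "1", "rp_filter": "0"},
    "ipv6": {"accept_ra": "0", "forwarding": "1"},
}


def apply_iface_sysctls(iface: str, root: Path = Path("/proc/sys")) -> list:
    """Write each parameter to <root>/net/<family>/conf/<iface>/<name>."""
    written = []
    for family, params in PARAMS.items():
        conf_dir = root / "net" / family / "conf" / iface
        for name, value in params.items():
            target = conf_dir / name
            target.write_text(value + "\n")  # raises if the interface is missing
            written.append(target)
    return written


if __name__ == "__main__":
    # ifupdown is assumed to set IFACE for post-up commands; do nothing otherwise.
    iface = os.environ.get("IFACE")
    if iface:
        apply_iface_sysctls(iface)
```

Such a script would be referenced from a post-up line in the relevant stanza of /etc/network/interfaces. Because it runs only after the interface is up, the missing-directory race seen with systemd-sysctl does not apply.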

Event Timeline

Change 777418 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudgw: relocate dataplane-specific sysctl params to ifupdown

https://gerrit.wikimedia.org/r/777418

Change 777418 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudgw: relocate dataplane-specific sysctl params to ifupdown

https://gerrit.wikimedia.org/r/777418

Change 777763 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudgw1002: rename interface names

https://gerrit.wikimedia.org/r/777763

Change 777763 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudgw1002: rename interface names

https://gerrit.wikimedia.org/r/777763