This ticket describes an issue I'm observing.
The context is the following:
- reinstalling a server from Debian 10 Buster to Debian 11 Bullseye (T304598: cloudgw: upgrade servers to Debian 11 Bullseye)
- interface gets renamed from ens2f1np1 to enp101s0f0np0 with the reinstall
- the new interface name, once the VLAN tag is appended, can become an invalid interface name such as enp101s0f0np0.1107, because it is longer than IFNAMSIZ (16) allows
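The length problem can be checked ahead of time. The following is a minimal sketch, assuming the Linux value IFNAMSIZ = 16 counts the trailing NUL byte, so 15 bytes are usable for the name itself. The helper name `vlan_ifname_fits` is hypothetical and not part of any existing tooling.

```python
IFNAMSIZ = 16  # Linux limit on interface names, including the trailing NUL


def vlan_ifname_fits(parent: str, vlan_id: int) -> bool:
    """Return True if '<parent>.<vlan_id>' has a valid length for an interface name."""
    name = f"{parent}.{vlan_id}"
    # One byte is reserved for the NUL terminator, so 15 bytes are usable.
    return len(name.encode()) <= IFNAMSIZ - 1


# Old and new NIC names from this ticket, with VLAN 1107.
for parent in ("ens2f1np1", "enp101s0f0np0"):
    print(parent, vlan_ifname_fits(parent, 1107))
```

With the tag appended, ens2f1np1.1107 is 14 characters and fits, while enp101s0f0np0.1107 is 18 characters and does not.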
The associated puppet manifests hardcode a lot of {interface}.{vlan} names, so switching to a vlanXXXX naming scheme for the interface was discarded, to avoid a rewrite in the middle of a host reinstall.
A solution explored is to simply add a systemd .link file to rename the interface to something shorter. A file like this is introduced:
    # managed by puppet
    [Match]
    MACAddress=bc:97:e1:e2:52:51

    [Link]
    Name=dataplane
This interface is expected to forward TCP/IP traffic (cloudgw is a layer3 router), so we need to enable forwarding through sysctl, with something like:
    sysctl::parameters { 'cloudgw':
        values => {
            # Enable IP forwarding, only on dataplane
            "net.ipv4.conf.${nic_dataplane}.forwarding" => 1,
            "net.ipv4.conf.${nic_dataplane}/${virt_vlan}.forwarding" => 1,
            "net.ipv4.conf.${nic_dataplane}/${wan_vlan}.forwarding" => 1,
            [...]
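A note on the `/` between the NIC name and the VLAN in those keys: per sysctl.d(5), when the first separator in a key is a dot, dots and slashes are interchanged, which lets a dotted interface name such as dataplane.1107 survive as a single path component. A minimal sketch of that mapping follows. The function name `sysctl_key_to_path` is hypothetical and this is not how systemd implements it.

```python
def sysctl_key_to_path(key: str) -> str:
    """Map a sysctl.d key to the file it refers to below /proc/sys."""
    first_sep = next((ch for ch in key if ch in "./"), None)
    if first_sep == ".":
        # Dotted form: dots become path separators and slashes become
        # literal dots (this is what protects 'dataplane.1107').
        key = key.translate(str.maketrans("./", "/."))
    # Slash form (or no separator at all): the key is already a relative path.
    return "/proc/sys/" + key


print(sysctl_key_to_path("net.ipv4.conf.dataplane/1107.forwarding"))
```

For the key above this yields /proc/sys/net/ipv4/conf/dataplane.1107/forwarding. That directory only exists once the VLAN interface itself has been created.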
It turns out that we can't do this sysctl operation because systemd-udevd.service doesn't run early enough. When systemd-sysctl tries to load these settings at boot time, the per-interface directories under /proc/sys don't exist yet:
    aborrero@cloudgw1001:~ $ sudo systemctl status systemd-sysctl
    ● systemd-sysctl.service - Apply Kernel Variables
         Loaded: loaded (/lib/systemd/system/systemd-sysctl.service; static)
         Active: active (exited) since Tue 2022-04-05 12:16:43 UTC; 10min ago
           Docs: man:systemd-sysctl.service(8)
                 man:sysctl.d(5)
        Process: 434 ExecStart=/lib/systemd/systemd-sysctl (code=exited, status=0/SUCCESS)
       Main PID: 434 (code=exited, status=0/SUCCESS)
            CPU: 15ms
    Apr 05 12:16:43 cloudgw1001 systemd[1]: Starting Apply Kernel Variables...
    Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '1' to 'net/ipv4/conf/dataplane.1107/forwarding', ignoring: No such file or directory
    Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '0' to 'net/ipv4/conf/dataplane.1107/rp_filter', ignoring: No such file or directory
    Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '1' to 'net/ipv4/conf/dataplane.1120/forwarding', ignoring: No such file or directory
    Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '0' to 'net/ipv4/conf/dataplane.1120/rp_filter', ignoring: No such file or directory
    Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '0' to 'net/ipv6/conf/dataplane.1107/accept_ra', ignoring: No such file or directory
    Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '1' to 'net/ipv6/conf/dataplane.1107/forwarding', ignoring: No such file or directory
    Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '0' to 'net/ipv6/conf/dataplane.1120/accept_ra', ignoring: No such file or directory
    Apr 05 12:16:43 cloudgw1001 systemd-sysctl[434]: Couldn't write '1' to 'net/ipv6/conf/dataplane.1120/forwarding', ignoring: No such file or directory
    Apr 05 12:16:43 cloudgw1001 systemd[1]: Finished Apply Kernel Variables.
Similar issues have already been reported upstream, for example https://github.com/systemd/systemd/issues/7293, which is apparently solved. I may file a separate ticket upstream.
I tried ordering systemd-sysctl.service after networking.service with things like After=networking.service, but that creates dependency loops that systemd doesn't like, and the result is that the sysctl params are not loaded at all.
As a somewhat stable solution, I will migrate all the NIC-related sysctl params to a post-up script for ifupdown.
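To illustrate the idea, here is a rough sketch of what such a hook could look like. It is not the script that will actually be deployed. It assumes ifupdown exports the interface name in the `IFACE` environment variable (as documented in interfaces(5)), that only interfaces whose name starts with `dataplane` should be touched, and that the settings are the ones visible in the log above. The `SETTINGS` table and the `apply_sysctls` helper are hypothetical names.

```python
#!/usr/bin/env python3
import os
import sys
from pathlib import Path

# Hypothetical per-interface settings, mirroring the values in the log above.
SETTINGS = {
    "net/ipv4/conf/{iface}/forwarding": "1",
    "net/ipv4/conf/{iface}/rp_filter": "0",
    "net/ipv6/conf/{iface}/accept_ra": "0",
    "net/ipv6/conf/{iface}/forwarding": "1",
}


def apply_sysctls(iface: str, root: Path = Path("/proc/sys")) -> list:
    """Write every setting for iface; return the paths that could not be written."""
    failed = []
    for template, value in SETTINGS.items():
        path = root / template.format(iface=iface)
        try:
            path.write_text(value)
        except OSError:
            # Typically: the interface (and its directory) does not exist.
            failed.append(str(path))
    return failed


if __name__ == "__main__":
    # ifupdown sets IFACE for post-up commands and if-up.d scripts.
    iface = os.environ.get("IFACE", "")
    if iface.startswith("dataplane"):
        sys.exit(1 if apply_sysctls(iface) else 0)
```

Because the hook runs after ifupdown has brought the interface up, the per-interface directories exist by the time the values are written, which is the ordering systemd-sysctl could not guarantee.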