
Bug in bridge-utils breaks IPv6 on an interface if it's not part of a bridge but a VLAN sub-interface of it is
Open, Low, Public

Description

While working on T319184, @aborrero and I hit an issue that prevented networking from starting properly on WMCS hosts (after moving them to a single NIC handoff).

The problem arises because in this setup the physical interface is not a member of any bridge, but VLAN sub-interfaces of it are. The bug is that a script from bridge-utils runs when the child sub-interface is processed and disables IPv6 on the parent:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989162#35

In our case this stopped the primary interface from having any IPs set, as the commands to add the v6 token and address on it failed, causing the whole ifup script to crash.
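
A quick way to check whether a host is in this state is to look at the IPv6 state of the parent interface. The sketch below assumes the bridge-utils script disables IPv6 through the per-interface disable_ipv6 sysctl (the Debian bug has the details); the function name ipv6_disabled, the CONF_ROOT override and the interface name eno1 are illustrative only, not taken from our puppet code.

```shell
#!/bin/sh
# Report whether IPv6 is disabled on an interface by reading its
# disable_ipv6 sysctl file. CONF_ROOT is a hypothetical override so the
# function can be exercised against a scratch directory instead of /proc.
ipv6_disabled() {
    nic="$1"
    conf_root="${CONF_ROOT:-/proc/sys/net/ipv6/conf}"
    file="$conf_root/$nic/disable_ipv6"

    # No readable sysctl file: the interface does not exist or has no IPv6 state.
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 2
    fi

    if [ "$(cat "$file")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}
```

For example, running ipv6_disabled eno1 after a failed ifup would show whether the parent interface was left with IPv6 switched off.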

Arturo has patched this via puppet to overcome the issue for affected WMCS hosts:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/837078

I don't believe we currently have any other hosts with this exact combination of device types, so we should not encounter it elsewhere for now.

@aborrero is in discussion with the maintainers of the bridge-utils package in Debian about upstreaming these changes. If that doesn't happen, we could alternatively package our own version with the fix. Or this might just be another reason to look beyond ifupdown to systemd-networkd or another network init framework.

Creating this task for visibility and to track the issue.

Event Timeline

cmooney created this task.

The Debian developer wanted to disable autogenerated IPv6 link local addresses on bridged interfaces.

Instead of disabling the whole IPv6 stack, I suggested using ip link set $NIC addrgenmode none on the affected interfaces, which is how IPv6 link-local address generation can be disabled. This is a far less aggressive approach, since it would still allow us to add things like tokens and unicast addresses without failures. I didn't invent this solution; I think that is how other software like NetworkManager handles this kind of situation.
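
To make the difference concrete, here is a rough sketch of the sequence this approach is meant to keep working: turn off link-local generation first, then set the token and the address. It is not the actual patch. The helper names run_cmd and configure_v6, the RUN switch, and the example values (eno1, ::10, 2001:db8::10/64) are all made up for illustration. By default it only prints the commands; it executes them only when RUN=1, which needs root.

```shell
#!/bin/sh
# Print a command, or execute it when RUN=1 (hypothetical dry-run switch).
run_cmd() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Disable link-local address generation without disabling IPv6, then
# set the token and a unicast address on the same interface.
configure_v6() {
    nic="$1"
    token="$2"
    addr="$3"

    # Only stops automatic link-local generation; the IPv6 stack stays up.
    run_cmd ip link set dev "$nic" addrgenmode none || return 1
    # Per the description above, these are the steps that failed once
    # IPv6 had been disabled on the parent interface.
    run_cmd ip token set "$token" dev "$nic" || return 1
    run_cmd ip -6 addr add "$addr" dev "$nic" || return 1
}
```

Calling configure_v6 eno1 ::10 2001:db8::10/64 without RUN set just prints the three ip commands, which is handy for reviewing what would be run on a given host.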

Anyway, for me this whole topic is an indication that the ifupdown stack is perhaps not aging well, and that we may need to consider more seriously a move to a different config stack, i.e. T234207: Investigate improvements to how puppet manages network interfaces

Just noting that ganeti also seems to hit this issue, which is also reported in T233906.