NIC renaming via puppet
Closed, ResolvedPublic

Description

NIC renaming via puppet can be challenging.

Take the following example:

  • the server cloudnet1005.eqiad.wmnet has a predictable interface name that is too long, such as enp175s0f0np0
  • the interface::rename mechanism was introduced in puppet to easily rename an interface, via systemd.link(5).
  • puppet runs once the installer has already generated /etc/network/interfaces
  • moreover, plenty of other configuration (via puppet or perhaps in the installer itself) has been generated referencing the old interface name. Example: sysctl params
  • once puppet runs and the interface is renamed, the server won't boot again with a valid network config
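For reference, a rename via systemd.link(5) boils down to a small unit file with a [Match] and a [Link] section. The sketch below only renders such a file as text. How interface::rename actually builds it is not shown in this task, so matching on the MAC address, the example MAC, the new name nic0 and the helper name render_link_unit are all assumptions made for illustration.

```python
def render_link_unit(mac_address: str, new_name: str) -> str:
    """Return the text of a systemd.link(5) unit that renames one NIC.

    Assumption: the NIC is matched by MAC address. The real
    interface::rename resource may match on something else.
    """
    return (
        "[Match]\n"
        f"MACAddress={mac_address}\n"
        "\n"
        "[Link]\n"
        f"Name={new_name}\n"
    )


if __name__ == "__main__":
    # Placeholder MAC address and hypothetical short name.
    print(render_link_unit("00:11:22:33:44:55", "nic0"), end="")
```

Such a file would typically live under /etc/systemd/network/ with a .link suffix. Note that it only changes the kernel interface name; nothing in it touches /etc/network/interfaces, which is exactly the gap described above.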

As of this writing we lack a mechanism for either:

  • renaming an interface at debian installer time, so that all config, even the config generated before the initial puppet run, refers to the new interface name
  • completely managing /etc/network/interfaces via puppet

Event Timeline

out of curiosity why did you decide to create the new interface::rename resource instead of using the current profile::openstack::eqiad1::nova::network_flat_* parameters

But that doesn't rename anything, no? Also, my plan was to reuse the interface::rename resource in different profiles as needed.

I think the base problem is that newer servers get longer interface names that become invalid once you attach the vlan tag to them (because the result exceeds IFNAMSIZ).
So far, since we have been using a 2-NIC approach for most of our WMCS servers, we just needed to rename the secondary interface (which was not generated at debian installer time).
Now, with the 1-NIC approach (T319184: Move WMCS servers to 1 single NIC), we need to introduce vlan tags to the primary interface, and that's why the rename is needed.
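To make the length problem concrete: on Linux, IFNAMSIZ is 16 bytes including the trailing NUL, so an interface name can be at most 15 characters. The sketch below checks a few names against that limit. It assumes the tagged interface is named with the common dotted form <base>.<vlan id>, and the helper names are made up for this example.

```python
IFNAMSIZ = 16                  # Linux <linux/if.h>, includes the trailing NUL
MAX_IFNAME_LEN = IFNAMSIZ - 1  # usable characters in an interface name


def tagged_name(base: str, vlan_id: int) -> str:
    """Dotted vlan naming, e.g. eno50.1105 (assumed convention)."""
    return f"{base}.{vlan_id}"


def fits_ifnamsiz(name: str) -> bool:
    """True if the kernel would accept a name of this length."""
    return len(name.encode("ascii")) <= MAX_IFNAME_LEN


if __name__ == "__main__":
    for name in (
        "enp175s0f0np0",                     # 13 chars, fine on its own
        tagged_name("enp175s0f0np0", 1105),  # 18 chars, too long
        tagged_name("eno50", 1105),          # 10 chars, fine
    ):
        print(f"{name}: {len(name)} chars, fits={fits_ifnamsiz(name)}")
```

So the base name itself is valid, and it only breaks once the vlan suffix is appended.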

aborrero triaged this task as Medium priority.Oct 5 2022, 9:21 AM

@jbond and I just had a meeting on this.

The last time we had a similar problem was in T209707: tagged_interface sometimes exceeds IFNAMSIZ and in particular for WMCS servers we patched them by switching to a different vlan tagging model: https://gerrit.wikimedia.org/r/c/operations/puppet/+/508796
However, we only did that for a secondary interface, and the solution is basically an interface rename, so we're back to the origin of the problem: how to rename a primary interface once a server has been reimaged.

We see 4 potential solutions to this problem:

  • extend interface::rename to also update /etc/network/interfaces. Imagine something like sed s/oldname/newname/g or a similar hack.
  • explore options to configure the debian installer so that it doesn't generate the long NIC names in the first place.
  • investigate why names such as enp175s0f0np0 are generated in the first place. We have plenty of servers with names like eno50 and @jbond suspects there may be a PCI database involved somewhere.
  • explore options to handle server network configuration using systemd-networkd via puppet.

The options are sorted by increasing complexity, so we will try the first one first.
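As a rough idea of what the first option could look like, here is a sketch that rewrites an interface name inside the text of /etc/network/interfaces. It is not what was implemented anywhere. The function name, the sample stanza and the new name nic0 are invented for illustration. It matches whole names only, so that a longer name sharing the same prefix is left alone, which is one of the corner cases a plain sed s/oldname/newname/g would get wrong.

```python
import re


def rename_interface(text: str, old: str, new: str) -> str:
    """Replace whole-name occurrences of `old` with `new`.

    The lookarounds stop a match when `old` is only part of a longer
    name, while still matching forms such as old.1105 or old:0.
    """
    pattern = re.compile(rf"(?<![\w.-]){re.escape(old)}(?![\w-])")
    return pattern.sub(new, text)


# Made-up sample; the last stanza uses a different, longer name.
SAMPLE = """\
auto enp175s0f0np0
iface enp175s0f0np0 inet dhcp
    pre-up ip link set enp175s0f0np0 up
iface enp175s0f0np01 inet manual
"""

if __name__ == "__main__":
    print(rename_interface(SAMPLE, "enp175s0f0np0", "nic0"), end="")
```

Even with the boundary handling this only covers one file. Anything else that embeds the old name (sysctl params, for example) would need the same treatment.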

We also agreed on not making this task a blocker for T319184: Move WMCS servers to 1 single NIC.

Change 838761 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] P:openstack::base::neutron: dont use the legacy naming

https://gerrit.wikimedia.org/r/838761

We see 4 potential solutions to this problem:

  • extend interface::rename to also update /etc/network/interfaces. Imagine something like sed s/oldname/newname/g or a similar hack.

I took a look at this and it seemed way too delicate to go down this route without covering all corner cases.

  • explore options to configure the debian installer so that it doesn't generate the long NIC names in the first place.

We could definitely update systemd udev discovery to use the old naming method. However, this would come with the same issues that the old method had, i.e. potential renaming of interfaces when hardware changes.

  • investigate why names such as enp175s0f0np0 are generated in the first place. We have plenty of servers with names like eno50 and @jbond suspects there may be a PCI database involved somewhere.

Looking into this, I think the name is correct based on ID_NET_NAME_PATH, where:
en: Ethernet prefix
p175: PCI bus 175
s0: slot 0
f0: function/index number
np0: n + phys_port_name (here p0)
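The breakdown above can be checked mechanically. The following sketch splits a PCI path-based name into its fields with a regular expression. It only covers a simplified subset of the systemd naming scheme (no USB or other bus suffixes), and the pattern and function name are written for this example rather than taken from systemd.

```python
import re
from typing import Optional

# Simplified subset of the systemd PCI path naming scheme:
#   <type>[P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>]
PCI_PATH_NAME = re.compile(
    r"^(?P<type>en|wl|ww)"
    r"(?:P(?P<domain>\d+))?"
    r"p(?P<bus>\d+)"
    r"s(?P<slot>\d+)"
    r"(?:f(?P<function>\d+))?"
    r"(?:n(?P<phys_port_name>\w+)|d(?P<dev_port>\d+))?$"
)


def parse_pci_path_name(name: str) -> Optional[dict]:
    """Return the fields present in a path-based name, or None."""
    match = PCI_PATH_NAME.match(name)
    if match is None:
        return None
    return {k: v for k, v in match.groupdict().items() if v is not None}


if __name__ == "__main__":
    print(parse_pci_path_name("enp175s0f0np0"))
    print(parse_pci_path_name("eno50"))  # onboard-index name, not path-based
```

A name like eno50 does not match because it comes from the onboard index rather than the PCI path, which is consistent with both styles showing up across the fleet.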

  • explore options to handle server network configuration using systemd-networkd via puppet.

We should go this route eventually, but it will take some time to migrate; see T234207.

The last time we had a similar problem was in T209707: tagged_interface sometimes exceeds IFNAMSIZ and in particular for WMCS servers we patched them by switching to a different vlan tagging model: https://gerrit.wikimedia.org/r/c/operations/puppet/+/508796

This task also introduced a new way of naming vlans: setting legacy_vlan_naming: false allows us to name vlans like vlan1105. This means the new vlan name is still a valid name and as such should work.

Change 838761 merged by Jbond:

[operations/puppet@production] P:openstack::base::neutron: dont use the legacy naming

https://gerrit.wikimedia.org/r/838761

Change 838771 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] openstack: neutron: l3_agent: more support for new vlan naming

https://gerrit.wikimedia.org/r/838771

Change 838771 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] openstack: neutron: l3_agent: more support for new vlan naming

https://gerrit.wikimedia.org/r/838771

aborrero claimed this task.

Thanks @jbond for the priceless assistance.

Change 838786 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudnet1006: don't use legacy naming for vlan NICs

https://gerrit.wikimedia.org/r/838786

Change 838786 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudnet1006: don't use legacy naming for vlan NICs

https://gerrit.wikimedia.org/r/838786