Background
Our current host network configuration and provisioning process has some rough edges. A quick recap of how things currently work:
- Servers are assigned an IP in Netbox (with a dummy interface name) when their primary switch link is added to Netbox
- We assign this IP to the host using DHCP when it is being reimaged
- The debian-installer takes the IP details assigned from DHCP and statically configures them in /etc/network/interfaces
- Various roles with more complex network setups use puppet to modify this file
  - For example: cloud hosts, lvs, ganeti
  - They typically only augment the d-i generated file, rather than replace it, which is brittle
- The information from puppetdb is imported back into Netbox
  - This replaces dummy interface names with the host-derived real ones
  - It also adds any additional interfaces and IPs that were created by puppet to Netbox
This process, combined with the use of ifupdown, leads to multiple issues, as documented in T234207: Investigate improvements to how puppet manages network interfaces, and in other tasks. Apologies for opening yet another task, but this didn't seem to fit neatly into any of the existing ones.
Future
Moving forward, I think it makes sense to use Netbox to drive the host configuration, really making it our "source of truth". Netbox now models the more elaborate host-side network elements correctly, so there should be no need for role-specific puppet classes that modify network config, or for hiera structures to define these elements. Working to that end also provides a good opportunity to move away from ifupdown, towards systemd-networkd, netplan.io, connman or similar.
At a high-level we'd need some changes to our process:
- Improve the netbox provision script, so we can define more complex setups at that stage
  - i.e. have options for cloud hosts, ganeti, lvs (see T346428)
- Keep the current scenario whereby we assign IPs from DHCP during install, and d-i creates a conf file based on those details
- When the host is up, Puppet overwrites the d-i generated files, rewriting the whole config based on data synced from netbox
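As a rough illustration of the last step, here is a minimal sketch of rendering a whole interface config from Netbox-derived data instead of patching the d-i file. It targets systemd-networkd purely as an example. The `Interface` fields and the `render_networkd` helper are hypothetical names made up for this sketch, not an existing Puppet or Netbox API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interface:
    # Hypothetical shape of the per-interface data synced from Netbox;
    # the field names are assumptions for illustration only.
    name: str                                            # real Linux netdev name
    addresses: List[str] = field(default_factory=list)   # CIDR notation
    gateway: Optional[str] = None
    mtu: Optional[int] = None


def render_networkd(iface: Interface) -> str:
    """Render a complete systemd-networkd .network file for one interface."""
    lines = ["[Match]", f"Name={iface.name}", ""]
    if iface.mtu is not None:
        lines += ["[Link]", f"MTUBytes={iface.mtu}", ""]
    lines.append("[Network]")
    lines += [f"Address={addr}" for addr in iface.addresses]
    if iface.gateway is not None:
        lines.append(f"Gateway={iface.gateway}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-range addresses, not real host data.
    print(render_networkd(Interface("eno1", ["192.0.2.10/24"], "192.0.2.1")))
```

Because the file is always regenerated in full from the same data, the result does not depend on whatever d-i or an earlier run left behind, which is the property the "augment the existing file" approach lacks.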
Interface naming problem
There are some challenges with this last point. They boil down to Netbox having the interface config and IP details, but not knowing the Linux device names. The host itself knows the interface names, but does not have access to Netbox. Puppet sort of sits in between.
There are probably various ways to approach this:
- Can we accurately predict the Linux netdev names?
  - I don't believe we track the various host-level NIC and BIOS parameters accurately enough to predict the PCIe numbering
- Could we have some local service on the box that builds the network conf files based on data puppet pushes to it?
  - This data would be synced from netbox
  - The local service would be able to replace any 'dummy' interface names with the real ones
    - Possibly based on lldp info?
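To make the LLDP idea a bit more concrete, here is a minimal sketch of how such a local service could map dummy names to real ones. It assumes Netbox knows which switch and port each host interface is cabled to, and that the host can see the same (switch, port) pair per local interface via LLDP. The data shapes, the `resolve_names` helper and the placeholder name are all hypothetical.

```python
from typing import Dict, List, Tuple

# (switch name, switch port) as advertised by the LLDP neighbour.
Neighbor = Tuple[str, str]


def resolve_names(
    netbox_ifaces: List[Dict[str, str]],
    lldp_neighbors: Dict[str, Neighbor],
) -> Dict[str, str]:
    """Map each Netbox interface name (possibly a dummy) to the real netdev.

    netbox_ifaces: assumed records with "name", "switch" and "switch_port".
    lldp_neighbors: assumed mapping of local netdev name -> LLDP neighbour.
    """
    # Invert the LLDP view so we can look up the local netdev by switch port.
    by_neighbor = {nbr: netdev for netdev, nbr in lldp_neighbors.items()}
    resolved: Dict[str, str] = {}
    for iface in netbox_ifaces:
        key = (iface["switch"], iface["switch_port"])
        if key not in by_neighbor:
            # Fail loudly rather than guess: writing config for the wrong
            # interface could take the host off the network.
            raise LookupError(f"no LLDP neighbour matches {key}")
        resolved[iface["name"]] = by_neighbor[key]
    return resolved


if __name__ == "__main__":
    netbox = [{"name": "PLACEHOLDER-PRIMARY", "switch": "asw1", "switch_port": "xe-0/0/1"}]
    lldp = {"eno1": ("asw1", "xe-0/0/1"), "eno2": ("asw2", "xe-0/0/5")}
    print(resolve_names(netbox, lldp))
```

This only works for interfaces with a physical link to an LLDP-speaking device. Virtual interfaces (bridges, VLAN sub-interfaces) would need to be resolved relative to their parent, which this sketch does not attempt.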
We'd probably still need the puppetdb netbox import step, to rename dummy interfaces in Netbox. But we could reduce its function to just renaming dummy interface names, and no longer have any other data being pulled from live hosts into our source of truth.
This is not a trivial thing, but I thought I'd create the task to explore what our options are.