Thinking once again about how we set up networking for hosts, I came up with a workflow that might improve on the current situation while still using the PuppetDB import script to tackle the naming problem.
- We complete the work to update our Provision script when adding server<->switch connections in Netbox (T346428)
  - Mostly doing what it does now
  - But also adding elements on the host for our more complex network setups (ganeti, cloud)
  - Also updating the switch port trunk vlans where needed
- We make a new, simpler PuppetDB import script
  - Similar to the existing one, but it only updates physical int names in Netbox
  - i.e. we don't import IP addresses, additional vlan interfaces, bridge devices or anything else
  - And we don't update the switch config (trunk vlans etc.) in Netbox
- We create a new cookbook, called sre.network.update-host
  - This should first trigger the new PuppetDB import script, to make sure names in Netbox are correct
  - It should then generate a full /etc/network/interfaces file for the host
  - It should push this to the host, comparing it to the existing /etc/network/interfaces (e/n/i) file
  - If the new file has changes it should prompt the user to reboot the host (so it reboots with the additional network setup)
- The reimage cookbook mostly does what it does now
  - Sets up the DHCP config to allow the system to get an IP from DHCP at the PXE and d-i stages
  - The Debian installer does its thing and creates a static /etc/network/interfaces file with the config it got from DHCP
  - The installer completes, system reboots
  - We wait for the first Puppet run (populating PuppetDB with netdev names)
  - We trigger the new sre.network.update-host cookbook at this point
    - Mostly this has no effect, other than triggering the PuppetDB import to update int names in Netbox
    - For the complex hosts it'll change their e/n/i file and prompt the user to do another reboot
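
To make the sre.network.update-host steps a bit more concrete, here is a rough Python sketch of the "generate, compare, prompt for reboot" part. It is only an illustration under my own assumptions: the `Interface` class, `render_interfaces` and `diff_interfaces` are hypothetical names, not existing Spicerack or Netbox APIs, the addresses and the `vlan2105` interface are made up, and fetching data from Netbox or pushing the file to the host is left out entirely.

```python
import difflib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Interface:
    """Hypothetical, simplified view of one host interface as modelled in Netbox."""
    name: str
    address: Optional[str] = None   # CIDR notation, e.g. "10.0.0.5/24"
    gateway: Optional[str] = None
    options: dict = field(default_factory=dict)  # extra stanza lines, e.g. vlan-raw-device


def render_interfaces(interfaces):
    """Render a full /etc/network/interfaces file from the (assumed) Netbox data."""
    lines = ["auto lo", "iface lo inet loopback", ""]
    for iface in interfaces:
        lines.append(f"auto {iface.name}")
        if iface.address:
            lines.append(f"iface {iface.name} inet static")
            lines.append(f"    address {iface.address}")
            if iface.gateway:
                lines.append(f"    gateway {iface.gateway}")
        else:
            lines.append(f"iface {iface.name} inet manual")
        for key, value in iface.options.items():
            lines.append(f"    {key} {value}")
        lines.append("")
    return "\n".join(lines)


def diff_interfaces(current, proposed):
    """Return a unified diff; an empty string means nothing would change."""
    return "".join(
        difflib.unified_diff(
            current.splitlines(keepends=True),
            proposed.splitlines(keepends=True),
            fromfile="e/n/i (on host)",
            tofile="e/n/i (from Netbox)",
        )
    )


if __name__ == "__main__":
    # What the installer left behind: a single static interface.
    on_host = render_interfaces([Interface("eno1", "10.0.0.5/24", "10.0.0.1")])
    # What Netbox says a more complex host should have (made-up example).
    from_netbox = render_interfaces([
        Interface("eno1", "10.0.0.5/24", "10.0.0.1"),
        Interface("vlan2105", options={"vlan-raw-device": "eno1"}),
    ])
    diff = diff_interfaces(on_host, from_netbox)
    if diff:
        print(diff)
        print("File changed: prompt the user to reboot the host.")
    else:
        print("No changes, nothing to do.")
```

For a simple host the two renderings are identical, so the diff is empty and the cookbook would be a no-op apart from the name sync. Only hosts with extra elements in Netbox would get a new file and the reboot prompt.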
Benefits
What appeals to me about sre.network.update-host as a separate cookbook is that it has a simple function - update the host network conf file based on Netbox - and can be run anytime. But we can also trigger it at reimage to ensure we add any additional bits we don't currently get from the DHCP/d-i step. This will relieve other SRE teams of all their complex meddling with network config using Puppet, and mean we are driving all the host network config from Netbox.
Anyway, just an idea. I guess the basic suggestion is to put the "interface name" conundrum to one side for now, but still make improvements. Then if we can solve the interface name bit we can remove the PuppetDB import renaming but it's more straightforward.