This task started life ({T335759}) in reference only to cloud hosts, but we have additional server configurations which currently require similar manual work, so I've widened the scope to include those. The cloud use-case is the most pressing: the lack of automation in the Netbox server network provisioning is one of the leftovers that bites us from time to time, so it will be addressed first.
**Issue**
Currently we have a variety of hosts which require manual changes in Netbox after the [[ https://netbox.wikimedia.org/extras/scripts/interface_automation.ProvisionServerNetwork/ | Provision a server's network attributes ]] Netbox script has been run to create the Netbox link to the top-of-rack switch. Typically these involve additional vlans being trunked on the switch side, and in some cases additional IP allocations and links being created.
**Plan**
This task will track progress on updating the provision script to allow these types of hosts to be selected by DC-Ops at provision stage. The script should then ensure that switch ports are configured as type "trunk" from day one, and that the required vlans are configured. Additionally the script should make any extra IP allocations, and create any secondary host-side interfaces as needed. This helps the effort in {T347411} by reducing our current reliance on the PuppetDB import script to record these elements in Netbox.
The updated script should remove any need for manual changes in Netbox for these hosts, and ensure the process can be handled by DC-Ops alone. Ideally it should contain some kind of checkbox to indicate that the host must be given a `private.{eqiad,codfw}` address from the affected rack.
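As a rough illustration of the kind of branching the updated provision script would need, the sketch below maps a host type to the desired switch-port settings. The type names, vlan names and return shape are illustrative assumptions, not the real Netbox script API or data model:

```python
# Sketch: derive the switch-side interface settings the provision script
# would apply per host type. Host-type and vlan names are hypothetical.

def switch_port_config(host_type: str, rack_vlan: str, site: str) -> dict:
    """Return the desired switch-port config for a newly provisioned host."""
    if host_type == "default":
        # Standard host: plain access port on the rack's private vlan.
        return {"mode": "access", "untagged": rack_vlan, "tagged": []}
    if host_type == "ganeti":
        # Private vlan untagged (native), public/analytics vlans tagged.
        return {
            "mode": "trunk",
            "untagged": rack_vlan,
            "tagged": [f"public1-{site}", f"analytics1-{site}"],
        }
    if host_type == "cloud":
        # Cloud hosts additionally need the rack's cloud-private vlan.
        return {
            "mode": "trunk",
            "untagged": rack_vlan,
            "tagged": [f"cloud-private-{site}"],
        }
    raise ValueError(f"unknown host type: {host_type}")
```

In the real script this choice would presumably be driven by a dropdown or checkbox presented to DC-Ops, with the vlan names looked up from the rack rather than derived from strings as here.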
**Cloud Hosts**
Cloud hosts need to have their network configured as described [[ https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Network#Cloud_Hosts | here ]]. In general they will need the cloud-private vlan of their rack trunked on the switch port.
It should be fairly straightforward to modify the provision script to allow for these different types.
**Ganeti**
We are currently investigating how to make Ganeti VMs function in routed mode (see T300152), which may mean we no longer need any custom switch configuration for those servers.
In the existing setup, however, Ganeti hosts need the local 'public' and 'analytics' vlans trunked to them, with the local 'private' network as the untagged (native) vlan on the link. On the host side, the server's IP in the private vlan needs to be provisioned on a bridge called 'private', of which the physical uplink is made a member. Vlan sub-interfaces for the other vlans need to be created on the host, and made members of separate bridges named 'public' and 'analytics' respectively.
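The host-side layout described above can be sketched as a small helper that plans the bridge membership for a Ganeti node. The NIC name and vlan IDs are assumptions for illustration only:

```python
def ganeti_host_interfaces(uplink: str, tagged_vlans: dict) -> list:
    """Plan bridges and vlan sub-interfaces for a Ganeti host.

    uplink: physical NIC, e.g. "eno1" (hypothetical name).
    tagged_vlans: bridge name -> 802.1q vlan id, e.g. {"public": 503}.
    The private vlan is untagged (native), so the physical uplink itself
    joins the 'private' bridge; each tagged vlan gets a sub-interface
    which joins its own bridge.
    """
    plan = [{"bridge": "private", "member": uplink}]  # native vlan
    for bridge, vlan_id in tagged_vlans.items():
        sub = f"{uplink}.{vlan_id}"  # vlan sub-interface, e.g. eno1.503
        plan.append({"bridge": bridge, "member": sub, "vlan_id": vlan_id})
    return plan
```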
**LVS**
We are currently trialling a new L4LB (see T332027) that will only require connection to the local private vlan, so similar to Ganeti a non-standard switch config may not be needed for these long term. In the meantime LVS servers retain quite a few non-standard elements.
The LVS primary port should have the local private vlan delivered untagged (native), with all other local vlans (public/analytics) trunked tagged on the same port. For EVPN switches this should include all per-rack vlans configured on participating switches.
The LVS also needs layer-2 adjacencies to remote rows / vlans, so IPVS can send requests to real servers on those networks. This connectivity is provided using a separate physical link, which terminates on the spine-layer devices for a given rack (be they in a virtual-chassis or EVPN setup). This link has no native vlan configured and trunks all vlans for the particular row/location to the LVS.
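Putting the two LVS links together, the intended switch-side configuration could be expressed as below. This is a sketch: the vlan names are illustrative, and on EVPN switches `local_vlans` would include every per-rack vlan configured on participating switches:

```python
def lvs_switch_config(private_vlan: str, local_vlans: list, row_vlans: list) -> dict:
    """Desired switch config for the two LVS links (illustrative sketch)."""
    primary = {
        "mode": "trunk",
        "untagged": private_vlan,  # local private vlan is untagged/native
        "tagged": [v for v in local_vlans if v != private_vlan],
    }
    secondary = {
        "mode": "trunk",
        "untagged": None,  # no native vlan on the spine-facing link
        "tagged": list(row_vlans),  # all vlans for the row/location
    }
    return {"primary": primary, "secondary": secondary}
```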
On the LVS side the servers need vlan sub-interfaces created for all tagged vlans, with an additional IP allocated on each (currently v4 only, but perhaps that should be reviewed - see T336505).
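The host-side allocation could be derived per tagged vlan along these lines. The NIC name is hypothetical, and picking the first usable address in each subnet is purely for illustration (the real script would allocate from Netbox):

```python
import ipaddress

def lvs_host_subinterfaces(nic: str, vlans: dict) -> list:
    """Plan vlan sub-interfaces and one v4 address per tagged vlan.

    vlans: mapping of 802.1q vlan id -> v4 subnet string.
    """
    out = []
    for vlan_id, subnet in vlans.items():
        net = ipaddress.ip_network(subnet)
        # Illustrative only: take the first usable host address.
        addr = next(net.hosts())
        out.append({
            "iface": f"{nic}.{vlan_id}",
            "address": f"{addr}/{net.prefixlen}",
        })
    return out
```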