Problem
Per T364725 (Migrate Cloud VPS instances to VXLAN-based networks), we need to migrate virtual machines from the old VLAN-based subnet to the new VXLAN-based subnet, which includes IPv6.
There are, however, several ways to do this, and the right choice depends on factors such as:
- how much effort we want to put into it
- how fast we want the migration to happen
- what level of disruption is acceptable for our users
- how confident we are that everything will just "work" (e.g., Toolforge migrating to IPv6: here be dragons)
Constraints and risks
Migrating a virtual machine to the new network requires downtime, because either:
- the VM is rebooted with a new Neutron port, or
- the VM is replaced with a completely new one
Also, because the VM gets a new IP address, the migration involves DNS changes.
Options
Option 1
Based on VM migration. Triggered by the WMCS team, with no intervention from project admins.
Write a script that takes a VM and 'moves' it to the new network setup.
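To make the scope of such a script more concrete, here is a minimal dry-run sketch of the per-VM steps it would need to perform, derived only from the constraints listed above (new Neutron port, possible changes inside the VM, reboot, DNS update). The `VM` type, `plan_migration` function and the default target network (borrowed from the network names proposed in option 2) are hypothetical; the sketch does not call any OpenStack API and only returns the ordered steps.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical minimal view of a Cloud VPS instance.
    name: str
    project: str
    network: str  # e.g. "VLAN/legacy"
    fqdn: str


def plan_migration(vm: VM, target_network: str = "VXLAN/IPv4-only") -> list[str]:
    """Return the ordered steps a hypothetical migration script would
    perform for one VM. Nothing is executed: this is a dry run."""
    if vm.network == target_network:
        return []  # already on the target network, nothing to do
    return [
        f"create a new Neutron port for {vm.name} on {target_network}",
        f"attach the new port to {vm.name} and detach the old {vm.network} port",
        f"adjust the network configuration inside {vm.name} if needed",
        f"reboot {vm.name} (user-visible downtime)",
        f"update DNS records for {vm.fqdn} to the new IP address",
    ]
```

Defaulting to IPv4-only in this sketch is an assumption meant to avoid the uncontrolled-IPv6 risk listed in the cons below; the filesystem step corresponds to the "modifying the VM filesystem" concern.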
Pros:
- can complete the migration relatively "fast"
Cons:
- writing the script can be costly in terms of engineering time
- it may involve introducing artificial downtime for user VMs
- it may involve modifying the VM filesystem, which sounds scary
- it is less "clean" than option 2
- risk of introducing IPv6, outside the project admin's control, to systems that may break if they are not ready for it
Option 2
Based on VM rebuilds. Triggered by project admins in a self-service fashion.
If a VM needs to move to the new network setup, it must be rebuilt. Users do this themselves through their normal workflows (e.g., horizon, tofu).
We could start with 3 network definitions in Neutron, available via horizon:
- VLAN/legacy
- VXLAN/IPv4-only
- VXLAN/IPv6-dualstack
Then we could have a migration timeline similar to this:
- 2024-12-01: the transition is announced. All 3 network options are available in horizon.
- 2025-02-01: (2 months later) the option to create VMs in VLAN/legacy is disabled in horizon; only VXLAN/IPv4-only and VXLAN/IPv6-dualstack remain available.
- [ .. from this point on, the migration progresses organically .. ]
- 2025-12-01: (1 year later) we evaluate how the migration is progressing and, if we need to accelerate it, maybe automate some of it with a script.
- 2026-12-01: (2 years later) we expect no VMs to remain in the legacy VLAN. If some remain, we will evaluate what to do.
- 20XX-XX-XX: (at some point TBD) we may want to disable the VXLAN/IPv4-only VM creation option, or keep it only for special cases on request.
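As a sketch of what the proposed timeline means for users creating VMs, the function below returns which network options horizon would offer on a given date. The constant and function names are hypothetical, the TBD step is left out because it has no date, and treating dates before the announcement as legacy-only is an assumption not stated in the proposal.

```python
from datetime import date

# Network definitions proposed above.
LEGACY = "VLAN/legacy"
IPV4_ONLY = "VXLAN/IPv4-only"
DUALSTACK = "VXLAN/IPv6-dualstack"

# Dated milestones from the proposed timeline (the TBD step is omitted).
ANNOUNCEMENT = date(2024, 12, 1)
LEGACY_CREATION_DISABLED = date(2025, 2, 1)


def horizon_network_options(today: date) -> list[str]:
    """Networks offered for new VMs in horizon on a given date."""
    if today < ANNOUNCEMENT:
        # Assumption: before the announcement only the legacy network exists.
        return [LEGACY]
    if today < LEGACY_CREATION_DISABLED:
        # Transition window: all 3 options are available.
        return [LEGACY, IPV4_ONLY, DUALSTACK]
    # Legacy VM creation is disabled from this date on.
    return [IPV4_ONLY, DUALSTACK]
```

The later milestones (2025-12-01 and 2026-12-01) are evaluation checkpoints rather than changes to the offered options, so they do not appear in the function.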
Pros:
- no additional WMCS engineering time invested in migration scripts and similar tooling
- no artificial downtime: a project admin explicitly creates the new virtual machine via horizon. Clean.
- the introduction of IPv6 is fully under the project admin's control
- shiny new IPv6 may be a good incentive for users to migrate soon
Cons:
- not automated: it requires project admin intervention, i.e., action from the community
- will delay completion of the network migration
Option 3
Mixed approach: focus on the self-service VM rebuild approach (option 2), but create a script (as in option 1) to handle some of the more complex cases.