Neutron is (finally) coming

The History

When Wikimedia Labs (the umbrella project now known as 'Cloud VPS') first opened to the public in 2012, it was built around OpenStack Nova version 'Diablo'.[1] Nova included a simple network component ("nova-network") which works pretty well: it assigns addresses to new VMs, creates network bridges so that they can talk to the outside internet, and manages dynamic firewalls that control which VMs can talk to each other and how.

Just as we were settling into nova-network (along with other early OpenStack adopters), the core developers were already moving on. A new project (originally named 'Quantum' but eventually renamed 'Neutron') would provide stand-alone APIs, independent of the Nova APIs, for constructing all manner of software-defined networks. With every release Neutron became more elaborate and more reliable, and it became the standard for networking in new OpenStack clouds.

For early adopters like us, there was a problem. The long-promised migration path for existing nova-network users never materialized, and nova-network got stuck in a kind of support limbo: version after version, it was announced that it would be deprecated in the next release, but nova-network users always pushed back to delay the deprecation until an upgrade path was ready. In late 2016 nova-network was finally dropped from support, still with no agreed-upon upgrade path.

So, after years of foot-dragging, we need to migrate (T167293) our network layer to Neutron. It's going to be painful!

The Plan

Since there is no in-place upgrade path, Chase and Arturo have built a new, parallel Nova region, named 'eqiad1-r', that uses Neutron. It shares the same identity, image, and DNS services as the existing region, but instances in the eqiad1-r region live on different hosts, in a different VLAN, with different IPs. I (Andrew) will be pulling projects, one at a time, out of the existing 'eqiad' region and copying everything into 'eqiad1-r'. Each instance will be shut down in eqiad, copied to eqiad1-r, and started up again. The main disruption is that, once moved, the new VMs will have new IP addresses and will probably be unable to communicate with VMs still in the old region; for this reason, migrating a project will mean substantial, multi-hour downtime for that entire VPS project.

Here are a few things that will be disrupted by IP reassignment:

  • Internal instance DNS (e.g. <instance>.<project>.eqiad.wmflabs)
  • External floating-IP DNS (e.g. <website>.<project>)
  • Dynamic web proxies (e.g. http://<website>)
  • Nova security group rules
  • Anything at all internal to a project that refers to another instance by IP address

I'm in the process of writing scripted transformations for almost all of the above. Ideally, when a VM moves, its DNS and proxy entries will be updated automatically so that all user-facing services resume as they were before the migration. The one thing I cannot fix is literal IP references within a project; if you have any of those, now would be a good time to replace them with DNS lookups or, at the very least, brace yourself for a lot of hurried clean-up.
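A minimal Python sketch of the "DNS lookup instead of a literal IP" idea follows. The instance and project names used with it are hypothetical, and the `eqiad.wmflabs` suffix simply follows the internal-DNS pattern listed above; it is kept as a parameter because it is an assumption that the name stays the same after a move. The point is that the address is looked up when it is needed rather than written into config or code.

```python
import socket

# Assumed internal-DNS suffix, following the <instance>.<project>.eqiad.wmflabs
# pattern; it is a parameter because it may not hold after migration.
DEFAULT_SUFFIX = "eqiad.wmflabs"


def instance_fqdn(instance: str, project: str, suffix: str = DEFAULT_SUFFIX) -> str:
    """Build an instance's internal DNS name instead of hard-coding its IP."""
    return f"{instance}.{project}.{suffix}"


def resolve_ipv4(hostname: str) -> list[str]:
    """Look up the current IPv4 addresses for hostname at call time."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    addrs: list[str] = []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address. Keep the first occurrence of each.
    for *_, sockaddr in infos:
        if sockaddr[0] not in addrs:
            addrs.append(sockaddr[0])
    return addrs
```

For example, `resolve_ipv4(instance_fqdn("db-01", "myproject"))` would return whatever addresses DNS currently holds for that (hypothetical) instance. Where a client library accepts a hostname directly, passing the name itself is usually simpler still, since the lookup then happens at connection time.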

Once we've run through a few trial migrations, I'll start scheduling upgrade windows and coordinating with project admins. We'll probably migrate Toolforge last -- there are even more issues involved with that move which I won't describe here.

This is another technical-debt/cleanup exercise that doesn't really get us anything new in the short run. Moving to Neutron clears the path for eventual adoption of IPv6, and Neutron has the potential to support new, isolated testing environments with custom network setups. Most importantly, however, this will get us back on track with OpenStack upgrades so that we can keep receiving upstream security fixes and new features. Expect to hear more about those upgrades once the dust settles from this migration.

The Timeline

Honestly, I don't know what the timeline is yet. There are several projects that are wholly managed by Cloud Services or WMF staff, and those projects will be used as the initial test subjects. Once we have an idea of how well this works and how long it takes, we'll start scheduling other projects for migration in batches. Keep an eye on the cloud-announce mailing list for related announcements.

How you can help

  • Fix any literal IP references within your project(s) by replacing them with DNS lookups. If they can't be replaced with lookups, make a list of everywhere they appear and get ready to edit all of those places on migration day.
  • Delete VMs that you aren't using. Release floating IPs that you aren't using. Delete proxies that aren't doing anything. The fewer things there are to migrate, the easier this will be.
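For the first item, a rough inventory of hard-coded addresses can be produced with a short script. The following is a minimal Python sketch, not a WMCS-provided tool (the script and its invocation are hypothetical): it walks the directories given on the command line and prints `path:line: address` for every IPv4 literal it finds. It will report some false positives (version strings that happen to look like addresses, 127.0.0.1, 0.0.0.0) and skips binary and non-UTF-8 files, so treat its output as a starting checklist rather than a complete list.

```python
import ipaddress
import os
import re
import sys

# Dotted-quad candidates. The lookbehind rejects a match preceded by a digit or
# dot, and the lookahead rejects one followed by a digit or ".digit", so
# "1.2.3.4.5" is skipped while "10.0.0.1." at the end of a sentence is kept.
CANDIDATE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?!\.?\d)")


def find_ipv4_literals(text: str) -> list[tuple[int, str]]:
    """Return (line number, address) pairs for every valid IPv4 literal in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CANDIDATE.finditer(line):
            try:
                ipaddress.IPv4Address(match.group(1))
            except ValueError:
                continue  # e.g. 300.1.1.1 is not a valid address
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root: str) -> None:
    """Print path:line: address for every IPv4 literal in files under root."""
    # os.walk silently skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    text = fh.read()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
            for lineno, addr in find_ipv4_literals(text):
                print(f"{path}:{lineno}: {addr}")


if __name__ == "__main__":
    # Hypothetical usage: python3 find_ips.py /etc/myapp /srv/myapp
    for arg in sys.argv[1:]:
        scan_tree(arg)
```

Validating each candidate with the standard `ipaddress` module keeps the regular expression simple: the pattern only has to find dotted quads, and anything out of range is discarded afterwards.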

Thank you!

[1] OpenStack releases are named alphabetically, two per year. The current development version is Rocky (due for release later in 2018); WMCS is currently running Mitaka (released early 2016), and Neutron was first released (under its original name, 'Quantum') as part of Folsom (late 2012). So this has been a long time coming.

Written by Andrew on Aug 22 2018, 9:00 PM.