Neutron is here!

As promised in an earlier post (Blog Post: Neutron is (finally) coming), we've started moving a few projects on our Cloud-VPS service into a new OpenStack region that uses Neutron for its software-defined networking layer. It's going pretty well! The new region, 'eqiad1', is currently very small, and growth is blocked by hardware issues (see T199125 for details), but we hope to resolve that soon.

Once we have some more hardware allocated to the eqiad1 region we will start migrating projects in earnest. Here's what that will look like for each project as it is migrated:

  1. A warning email about impending migrations will be sent to the cloud-announce mailing list at least 7 days before migration.
  2. On the day of the migration, instance creation for each migrating project will be disabled in the legacy 'eqiad' region. Horizon will still show existing instances in eqiad, but you will not be able to create new ones there.
  3. The current project quotas will be copied over from eqiad to eqiad1.
  4. Security groups will be copied from eqiad to eqiad1, and some rules (those that refer to or 'all VMs everywhere') will be duplicated to include the new IP range in eqiad1.
  5. Then, the following will happen to each instance:
    1. The instance will be shut down.
    2. A new shadow instance will be created in eqiad1 with the same name but a new IP address or addresses.
    3. The contents of the eqiad instance will be copied into the new instance. This step could take several hours, depending on the size of the instance.
    4. Any DNS records or proxies that pointed to the old instance will be updated to point at the new instance.
    5. The new instance will be started up, and then rebooted once for good measure.
    6. Once the new instance is confirmed up and reachable, the old instance will be deleted.
  6. You will want to check some things afterwards. In particular:
    1. Verify that any external-facing services supported by your project are still working. If they need to be started, start them. If something drastic is happening, notify WMCS staff on IRC (#wikimedia-cloud).
    2. In some cases you may need to restart services by hand if they're unable to restart themselves after a system reboot. For example, Wikimedia-Vagrant usually seems to have this problem.
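
For the post-migration check in step 6, a small script can save some clicking around. The following is only a minimal sketch, assuming your services listen on TCP ports you already know; the function names `is_reachable` and `check_services` are made up for this example and are not part of any WMCS tooling. A successful connection only shows that something is listening on the port, not that the service behind it is healthy.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the hostname and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_services(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {
        (host, port): is_reachable(host, port, timeout)
        for host, port in endpoints
    }
```

For example, `check_services([("mytool.example.org", 443)])` returns a dict such as `{("mytool.example.org", 443): True}`; the hostname here is a placeholder, so substitute the proxies or DNS names your project actually uses.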

If you would like an early jump on migration, we have space to move a few projects now. In particular, if you would like access to the eqiad1 region so that you can start building out new servers there, please open a quota request here: https://phabricator.wikimedia.org/project/profile/2880/

The migration process for Toolforge will be utterly different; for the time being, people who only use Toolforge can disregard all of this.

Written by Andrew on Sep 27 2018, 3:18 PM.
