
Consider replacing our spreadcheck alerts with Server Groups Anti-Affinity policies
Open, Needs Triage, Public

Description

We may want to investigate how this interacts with our processes around evacuating hosts and similar operations: it's not a complete solution if someone can (and will) still migrate instances to hypervisors in violation of a server group policy.
I've noticed @JHedden has already created one of these server groups for tools-elastic instances, though there are no instances in it (yet?).

Event Timeline

Host aggregates are currently broken on CloudVPS due to the version of oslo.versionedobjects. The Queens upgrade should fix this and, as you noted, will provide better scheduling placement when migrating instances.

Queens version: python-oslo.versionedobjects 1.31.2-2~bpo9+1
Pike version: python-oslo.versionedobjects 1.17.0-2

Nova is trying to use the __add__ method on InstanceList, which doesn't work without this commit https://github.com/openstack/oslo.versionedobjects/commit/f3519480d04ae1e7f52c8e3b5ff7e0c6678d2da6
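To illustrate the failure mode: Python only supports `a + b` on a custom list-like object if its class defines `__add__`; otherwise the expression raises `TypeError`. The sketch below uses two hypothetical, simplified classes (`ListWithoutAdd`, `ListWithAdd`) that are NOT the real oslo.versionedobjects `ObjectListBase`; it only shows why a Nova code path doing `instance_list_a + instance_list_b` breaks on a library version that lacks `__add__` support, which the linked commit presumably adds.

```python
# Hypothetical, simplified stand-ins for a versioned-object list such as
# InstanceList. These are NOT the real oslo.versionedobjects classes.


class ListWithoutAdd:
    """List-like wrapper that keeps its members in an `objects` attribute."""

    def __init__(self, objects=None):
        self.objects = list(objects or [])

    def __iter__(self):
        return iter(self.objects)

    def __len__(self):
        return len(self.objects)


class ListWithAdd(ListWithoutAdd):
    """Same wrapper, but `a + b` returns a new list object of the same type."""

    def __add__(self, other):
        # Accept either another list object or a plain Python iterable.
        if isinstance(other, ListWithoutAdd):
            other_objects = other.objects
        else:
            other_objects = list(other)
        return type(self)(self.objects + other_objects)


def try_concat(a, b):
    """Return a + b, or the TypeError raised when the type has no __add__."""
    try:
        return a + b
    except TypeError as exc:
        return exc


old_style = try_concat(ListWithoutAdd(["vm1"]), ListWithoutAdd(["vm2"]))
new_style = try_concat(ListWithAdd(["vm1"]), ListWithAdd(["vm2"]))
```

Here `old_style` is a `TypeError` (the "old library" behaviour), while `new_style` is a `ListWithAdd` containing both members.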

Are we good to go with this, given the Queens upgrade?

Server groups are working now and have been re-enabled in Horizon https://gerrit.wikimedia.org/r/c/openstack/horizon/horizon/+/585802

The spread check might still be useful to keep around, though. Since we're currently not using the native OpenStack migration process, there's still a good chance we could end up with stacked instances (several members of the same server group on one hypervisor). Once we migrate to shared storage and start using the native process, the Nova scheduler should take care of all this for us.
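As a rough sketch of what a spread check does, the function below reports every hypervisor that hosts more than one member of the same anti-affinity group. It is not the actual spreadcheck script; the input shape (group name mapped to member instances, instance mapped to hypervisor) is an assumption, and all instance and host names are hypothetical. In practice the placement data could come from the Nova API, for example the admin-only `OS-EXT-SRV-ATTR:host` attribute of each server.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def find_stacked_instances(
    groups: Mapping[str, List[str]],
    instance_host: Mapping[str, str],
) -> Dict[str, Dict[str, List[str]]]:
    """Return {group: {hypervisor: [instances]}} for each hypervisor that runs
    more than one member of the same anti-affinity group.

    groups: server group name -> member instance names (assumed input shape)
    instance_host: instance name -> hypervisor it currently runs on
    """
    violations: Dict[str, Dict[str, List[str]]] = {}
    for group, members in groups.items():
        by_host: Dict[str, List[str]] = defaultdict(list)
        for instance in members:
            host = instance_host.get(instance)
            if host is None:
                # No known placement (e.g. deleted or not yet scheduled): skip.
                continue
            by_host[host].append(instance)
        # Keep only hypervisors that carry two or more group members.
        stacked = {h: sorted(vms) for h, vms in by_host.items() if len(vms) > 1}
        if stacked:
            violations[group] = stacked
    return violations


# Hypothetical example data.
groups = {
    "tools-elastic": ["tools-elastic-1", "tools-elastic-2", "tools-elastic-3"],
    "other-group": ["other-1", "other-2"],
}
instance_host = {
    "tools-elastic-1": "hv-a",
    "tools-elastic-2": "hv-a",
    "tools-elastic-3": "hv-b",
    "other-1": "hv-a",
    "other-2": "hv-c",
}
report = find_stacked_instances(groups, instance_host)
```

With this data, only `tools-elastic` is reported, because two of its members share `hv-a`; `other-group` is spread across distinct hypervisors and produces no entry.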

I assume our process involves telling a new Nova host to schedule a particular VM and providing it with the files.
If Nova refuses to schedule a VM that would violate the policy, that's fine, as long as our process checks for that and finds an appropriate place for the VM (instead of just leaving it turned off).
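A minimal sketch of "finding an appropriate place" for a VM being moved off a host: skip the VM's current hypervisor and any hypervisor already running another member of its anti-affinity group. This mirrors the idea behind Nova's `ServerGroupAntiAffinityFilter` but is not Nova code; `pick_destination` and all host and instance names are hypothetical, and a real process would also need to account for capacity and other scheduler constraints.

```python
from typing import Iterable, Mapping, Optional, Sequence


def pick_destination(
    instance: str,
    group_members: Sequence[str],
    instance_host: Mapping[str, str],
    candidate_hosts: Iterable[str],
) -> Optional[str]:
    """Return the first candidate hypervisor that does not already run another
    member of the instance's anti-affinity group, or None if none qualifies.
    """
    current = instance_host.get(instance)
    # Hypervisors already occupied by *other* members of the same group.
    occupied = {
        instance_host[member]
        for member in group_members
        if member != instance and member in instance_host
    }
    for host in candidate_hosts:
        if host == current:
            continue  # staying on the host being evacuated is not a move
        if host not in occupied:
            return host
    return None


# Hypothetical example: vm1 must leave hv-a; vm2 and vm3 occupy hv-b and hv-c.
hosts = {"vm1": "hv-a", "vm2": "hv-b", "vm3": "hv-c"}
members = ["vm1", "vm2", "vm3"]
choice = pick_destination("vm1", members, hosts, ["hv-b", "hv-c", "hv-d"])
```

Here `choice` is `"hv-d"`. A `None` result should be treated as a hard error that someone has to resolve, rather than a reason to leave the VM shut off.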

Change 979056 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] openstack: spreadcheck: remove in favour of server groups

https://gerrit.wikimedia.org/r/979056

Change 979056 merged by Majavah:

[operations/puppet@production] openstack: spreadcheck: remove in favour of server groups

https://gerrit.wikimedia.org/r/979056