
Configure the soft anti-affinity (and presumably the soft affinity) server policy
Closed, Resolved, Public

Description

I was looking at creating a bunch of k8s workers using "soft anti-affinity" instead of the usual (hard) anti-affinity, because workers should schedule wherever possible but be spread as widely as capacity allows, right? We also expose that policy in Horizon.

However, it apparently isn't fully configured. I got this error message:

ServerGroupSoftAntiAffinityWeigher not configured

Seems both solvable and a nifty feature to unlock for Toolforge and friends in the future.

Event Timeline

Bstorm created this task. May 20 2020, 10:16 PM
Bstorm triaged this task as Low priority. Jun 2 2020, 4:18 PM

I briefly looked into this.

The docs at https://docs.openstack.org/python-openstackclient/rocky/cli/command-objects/server-group.html say: "Specify --os-compute-api-version 2.15 or higher for the 'soft-affinity' or 'soft-anti-affinity' policy."

So it seems we need a compute API with version >= 2.15, but apparently we have:

| eqiad1-r | nova         | compute      | True    | public    | http://openstack.eqiad1.wikimediacloud.org:8774/v2.1
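The /v2.1 in that endpoint URL is the base API version, not the highest microversion the service supports. Microversions compare as (major, minor) integer pairs rather than decimals, so 2.15 is newer than 2.2. A minimal sketch of that comparison; the helper names are hypothetical and not part of any OpenStack library, and 2.65 is the maximum microversion reported later in this thread:

```python
def parse_microversion(text: str) -> tuple[int, int]:
    """Parse a Nova-style microversion such as "2.15" into (2, 15)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def supports(max_version: str, required: str) -> bool:
    """True if a service whose maximum microversion is max_version can serve required."""
    return parse_microversion(max_version) >= parse_microversion(required)


# Soft (anti-)affinity server groups need microversion 2.15 or later.
SOFT_POLICY_MIN = "2.15"

print(supports("2.1", SOFT_POLICY_MIN))   # False: the base version alone is too old
print(supports("2.65", SOFT_POLICY_MIN))  # True
# A naive float comparison gets the ordering wrong:
print(float("2.15") > float("2.2"))       # False, although 2.15 is the later microversion
```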

bd808 edited projects, added Horizon; removed Cloud-VPS. Jun 16 2020, 4:39 PM
bd808 added a subscriber: bd808.

@Andrew can you take a quick look to see if we can disable the UI feature as it appears our control plane does not support this yet?

Andrew claimed this task. Jun 17 2020, 2:56 PM

Change 607825 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Openstack Nova: enable soft affinity (and soft anti-affinity) server groups

https://gerrit.wikimedia.org/r/607825

API versions are weird in Nova: 2.1 is the base version, but later "microversions" can be requested explicitly via HTTP headers. We support up to microversion 2.65. It looks to me like Horizon is properly requesting a higher microversion, so we shouldn't have API issues with this feature.
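To make the header mechanism concrete, here is a sketch that builds (but does not send) a Nova "create server group" request with an explicit microversion. Assumptions: the header names OpenStack-API-Version and the legacy X-OpenStack-Nova-API-Version, and the body change at microversion 2.64 (a single "policy" string replacing the "policies" list), follow the upstream Nova API reference; the function name, group name, and token are hypothetical placeholders.

```python
import json
from urllib.request import Request


def parse_microversion(text: str) -> tuple[int, int]:
    major, minor = text.split(".")
    return int(major), int(minor)


def server_group_request(endpoint: str, token: str, name: str,
                         policy: str, microversion: str) -> Request:
    """Build (not send) a POST to {endpoint}/os-server-groups.

    Soft policies need microversion >= 2.15; from 2.64 on, the body carries
    a single "policy" string instead of a "policies" list.
    """
    version = parse_microversion(microversion)
    if policy.startswith("soft-") and version < (2, 15):
        raise ValueError(f"{policy} requires microversion 2.15 or later")
    if version >= (2, 64):
        group = {"name": name, "policy": policy}
    else:
        group = {"name": name, "policies": [policy]}
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": token,  # hypothetical placeholder token
        # Current microversion header plus the legacy Nova-specific one.
        "OpenStack-API-Version": f"compute {microversion}",
        "X-OpenStack-Nova-API-Version": microversion,
    }
    body = json.dumps({"server_group": group}).encode()
    return Request(f"{endpoint}/os-server-groups", data=body,
                   headers=headers, method="POST")


req = server_group_request("http://openstack.eqiad1.wikimediacloud.org:8774/v2.1",
                           "<token>", "k8s-workers", "soft-anti-affinity", "2.65")
print(req.data.decode())
# {"server_group": {"name": "k8s-workers", "policy": "soft-anti-affinity"}}
```

The request is only constructed so the example runs offline; sending it would need a real Keystone token.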

You can turn scheduler filters and weighers on and off in nova.conf, and it looks like the soft affinity weighers (the error above names ServerGroupSoftAntiAffinityWeigher) are currently turned off. The attached patch should resolve that.
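The "not configured" error comes from a startup-style check: as far as I know, upstream Nova only accepts a soft policy if the matching weigher, or the nova.scheduler.weights.all_weighers shortcut (the default), appears in [filter_scheduler] weight_classes. The sketch below approximates that check against a nova.conf snippet. It is an illustration only, not Nova's code and not the contents of the puppet patch above, which is not shown here; the function name and sample config are hypothetical.

```python
import configparser

ALL_WEIGHERS = "nova.scheduler.weights.all_weighers"
SOFT_WEIGHERS = {
    "soft-affinity": "nova.scheduler.weights.affinity.ServerGroupSoftAffinityWeigher",
    "soft-anti-affinity": "nova.scheduler.weights.affinity.ServerGroupSoftAntiAffinityWeigher",
}


def enabled_soft_policies(nova_conf: str) -> set[str]:
    """Return the soft server-group policies this scheduler config would allow.

    A soft policy counts as enabled if its weigher, or the all_weighers
    shortcut, is listed in [filter_scheduler] weight_classes; an unset
    option falls back to all_weighers.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(nova_conf)
    raw = parser.get("filter_scheduler", "weight_classes", fallback=ALL_WEIGHERS)
    classes = {item.strip() for item in raw.split(",") if item.strip()}
    if ALL_WEIGHERS in classes:
        return set(SOFT_WEIGHERS)
    return {policy for policy, cls in SOFT_WEIGHERS.items() if cls in classes}


# Hypothetical config that lists weighers explicitly but omits the soft ones.
before = """
[filter_scheduler]
weight_classes = nova.scheduler.weights.ram.RAMWeigher
"""
after = before.rstrip() + ", " + SOFT_WEIGHERS["soft-anti-affinity"] + "\n"
print(enabled_soft_policies(before))  # set()
print(enabled_soft_policies(after))   # {'soft-anti-affinity'}
```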

Change 607825 merged by Andrew Bogott:
[operations/puppet@production] Openstack Nova: enable soft affinity (and soft anti-affinity) server groups

https://gerrit.wikimedia.org/r/607825

Change 607827 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Openstack Nova: enable soft affinity (and soft anti-affinity) server groups

https://gerrit.wikimedia.org/r/607827

Change 607827 merged by Andrew Bogott:
[operations/puppet@production] Openstack Nova: enable soft affinity (and soft anti-affinity) server groups

https://gerrit.wikimedia.org/r/607827

Andrew reassigned this task from Andrew to Bstorm. Jun 25 2020, 4:23 PM

@Bstorm, try now? I did a quick test and it seems to be working (at least with 3 VMs, it put them on three different hosts).

Will do! This might be cool for worker-type VMs.

Mentioned in SAL (#wikimedia-cloud) [2020-06-25T22:52:04Z] <bstorm> created paws-k8s-worker-5/6/7 as x-large nodes to bring the cluster up to roughly the same capacity as the existing one using soft anti-affinity T211096 T253267

Bstorm closed this task as Resolved. Jun 25 2020, 10:54 PM

Worked for my new workers. I presume it will also place them together once capacity gets tighter (hence the "soft"), which is great. I imagine soft anti-affinity is ideal for worker-node-type VMs and might let us retire the spread monitor.
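The difference between hard and soft anti-affinity can be shown with a toy placement model. This is a deliberately simplified sketch, not Nova's scheduler: hard anti-affinity acts as a filter that rejects any host already running a group member, while soft anti-affinity only ranks hosts by how many members they already hold (roughly what ServerGroupSoftAntiAffinityWeigher contributes, ignoring other weighers and multipliers). Host names and capacities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_slots: int
    members: int = 0  # VMs from our server group already on this host


def place(hosts: list[Host], policy: str) -> str:
    """Pick a host for one new group member (toy model, not Nova's scheduler).

    'anti-affinity' filters out any host that already has a member;
    'soft-anti-affinity' only prefers hosts with fewer members, so it still
    succeeds once every host holds one.
    """
    candidates = [h for h in hosts if h.free_slots > 0]
    if policy == "anti-affinity":
        candidates = [h for h in candidates if h.members == 0]
    if not candidates:
        raise RuntimeError("No valid host was found")
    best = min(candidates, key=lambda h: h.members)  # fewest members wins; ties keep list order
    best.free_slots -= 1
    best.members += 1
    return best.name


soft_hosts = [Host("hv1", 2), Host("hv2", 2), Host("hv3", 2)]
print([place(soft_hosts, "soft-anti-affinity") for _ in range(5)])
# ['hv1', 'hv2', 'hv3', 'hv1', 'hv2']: spread first, then doubling up
```

With the hard policy, the same three hosts accept only three members; the fourth request fails instead of doubling up.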