Add some more k8s worker nodes to Toolforge at the xlarge size
Open, Medium, Public

Description

Some workers are a bit over-burdened in the cluster at this point: https://grafana-labs.wikimedia.org/d/000000004/tools-activity?panelId=2&fullscreen&orgId=1

We should avoid running more than 25 pods on a node. A few more nodes at the larger planned size should be just the thing. They should go into a new server group with a soft anti-affinity policy, into which all nodes can slowly be replaced.

Then drain the overburdened nodes to redistribute things.
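
As a rough sketch of how the drain candidates could be picked, the snippet below counts active pods per node and lists the nodes above the 25-pod threshold. It assumes the input is the parsed JSON from `kubectl get pods --all-namespaces -o json`. The function names and the choice to skip finished pods are illustrative assumptions, not something this task prescribes.

```python
from collections import Counter

MAX_PODS_PER_NODE = 25  # threshold taken from the task description


def pods_per_node(pod_list):
    """Count active pods per node from a parsed `kubectl get pods -o json` document."""
    counts = Counter()
    for pod in pod_list.get("items", []):
        node = pod.get("spec", {}).get("nodeName")
        phase = pod.get("status", {}).get("phase")
        # Assumption: unscheduled pods (no nodeName) and finished pods do not
        # count toward the load on a node.
        if node and phase not in ("Succeeded", "Failed"):
            counts[node] += 1
    return counts


def overloaded_nodes(pod_list, limit=MAX_PODS_PER_NODE):
    """Return (node, pod_count) pairs strictly above the limit, busiest first."""
    over = [(n, c) for n, c in pods_per_node(pod_list).items() if c > limit]
    return sorted(over, key=lambda pair: (-pair[1], pair[0]))


def drain_command(node):
    """Build (but do not run) a kubectl drain invocation for one node."""
    # --ignore-daemonsets is needed whenever DaemonSet-managed pods run on the node.
    return ["kubectl", "drain", node, "--ignore-daemonsets"]
```

The result of `drain_command` is only an argument list, so it can be reviewed before anything is run. Depending on the workloads, a real drain may need extra flags, and the node has to be uncordoned afterwards if it is meant to stay in service.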

Event Timeline

Bstorm triaged this task as Medium priority. Jul 22 2020, 10:47 PM
Bstorm created this task.

@bd808 suggested possibly using a custom flavor for k8s workers because we tend to ride at low CPU for most of our apps. That may change as we roll out a jobs service, but so far it is very true.

Mentioned in SAL (#wikimedia-cloud) [2020-07-22T23:24:55Z] <bstorm> created server group 'tools-k8s-worker' to create any new worker nodes in so that they have a low chance of being scheduled together by openstack unless it is necessary T258663

Mentioned in SAL (#wikimedia-cloud) [2020-07-30T16:28:47Z] <andrewbogott> added new xlarge ceph-hosted worker nodes: tools-k8s-worker-61, 62, 63, 64, 65, 66. T258663