
Add some more k8s worker nodes to Toolforge at the xlarge size
Closed, Resolved · Public

Description

Some workers are a bit over-burdened in the cluster at this point: https://grafana-labs.wikimedia.org/d/000000004/tools-activity?panelId=2&fullscreen&orgId=1

We should avoid more than 25 pods on a node. A few more nodes at the larger planned size, created in a new server group with a soft anti-affinity policy that all nodes will eventually be moved into, should be just the thing.
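A rough sketch of how the per-node pod counts could be checked with the Python kubernetes client; the kubeconfig context is assumed, and the 25-pod figure is the rule of thumb from this task, not an enforced limit:

```python
# Sketch: count pods per node to spot workers over the ~25-pod target.
# Assumes a kubeconfig with access to the Toolforge cluster.
from collections import Counter
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

counts = Counter(
    pod.spec.node_name
    for pod in v1.list_pod_for_all_namespaces(watch=False).items
    if pod.spec.node_name  # skip pods that are not scheduled yet
)
for node, n in counts.most_common():
    flag = "  <-- over-burdened" if n > 25 else ""
    print(f"{node}: {n} pods{flag}")
```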

Then drain the overburdened nodes to redistribute things.
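Draining is normally done with `kubectl drain`; purely as an illustration, a minimal version of the same cordon-and-evict steps with the Python client might look like this (the node name is hypothetical, and DaemonSet/mirror pods are ignored here, which the real command handles for you):

```python
# Sketch: cordon a node and evict its pods, roughly what `kubectl drain` does.
# The node name is hypothetical; unlike kubectl drain, this does not
# special-case DaemonSet or mirror pods, so treat it as an illustration only.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

node = "tools-k8s-worker-30"  # hypothetical over-burdened node

# Cordon: mark the node unschedulable so nothing new lands on it.
v1.patch_node(node, {"spec": {"unschedulable": True}})

# Evict the pods currently on the node so they reschedule elsewhere.
pods = v1.list_pod_for_all_namespaces(field_selector=f"spec.nodeName={node}")
for pod in pods.items:
    eviction = client.V1Eviction(  # V1beta1Eviction on older client releases
        metadata=client.V1ObjectMeta(
            name=pod.metadata.name, namespace=pod.metadata.namespace
        )
    )
    v1.create_namespaced_pod_eviction(
        name=pod.metadata.name, namespace=pod.metadata.namespace, body=eviction
    )
```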

Event Timeline

Bstorm created this task.

@bd808 suggested possibly using a custom flavor for k8s workers because we tend to run at low CPU usage for most of our apps. That may change as we roll out a jobs service, but so far it holds true.
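For reference, a custom flavor could be created along these lines with openstacksdk; the cloud name, flavor name, and the CPU/RAM/disk numbers below are purely illustrative, not an agreed-upon spec:

```python
# Sketch: create a custom Nova flavor biased toward memory over CPU,
# since the workers tend to run at low CPU usage. All values are
# illustrative; the actual flavor would be sized by the cloud admins.
import openstack

conn = openstack.connect(cloud="toolforge")  # cloud name is an assumption

flavor = conn.compute.create_flavor(
    name="k8s-worker-highmem",  # hypothetical flavor name
    vcpus=4,
    ram=16384,  # MiB
    disk=60,    # GiB
)
print(flavor.id)
```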

Mentioned in SAL (#wikimedia-cloud) [2020-07-22T23:24:55Z] <bstorm> created server group 'tools-k8s-worker' to create any new worker nodes in so that they have a low chance of being scheduled together by openstack unless it is necessary T258663
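A sketch of what that server group creation amounts to via openstacksdk (the SAL entry above reflects the actual admin action; the cloud name here is an assumption):

```python
# Sketch: create the 'tools-k8s-worker' server group with a soft anti-affinity
# policy, so OpenStack prefers to place its members on different hypervisors
# but will still schedule them together if it has to.
import openstack

conn = openstack.connect(cloud="toolforge")  # cloud name is an assumption

group = conn.compute.create_server_group(
    name="tools-k8s-worker",
    policies=["soft-anti-affinity"],  # pre-2.64 microversion style; newer
                                      # microversions use policy= and rules=
)
print(group.id)
```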

Mentioned in SAL (#wikimedia-cloud) [2020-07-30T16:28:47Z] <andrewbogott> added new xlarge ceph-hosted worker nodes: tools-k8s-worker-61, 62, 63, 64, 65, 66. T258663
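New workers are then launched as members of that group so the soft anti-affinity policy applies to them. A hedged sketch using the openstacksdk cloud layer; the node name, image, flavor, and network below are placeholders, not the values actually used:

```python
# Sketch: boot a new xlarge worker into the 'tools-k8s-worker' server group.
# Image, flavor, and network names are placeholders; the real builds go
# through the standard WMCS tooling.
import openstack

conn = openstack.connect(cloud="toolforge")  # cloud name is an assumption

server = conn.create_server(
    name="tools-k8s-worker-67",   # hypothetical next node name
    image="debian-buster",        # placeholder image name
    flavor="xlarge",              # placeholder flavor name
    network="placeholder-net",    # placeholder network name
    group="tools-k8s-worker",     # server group name or ID
    wait=True,
)
print(server.status)
```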