Add some more k8s worker nodes to Toolforge at the xlarge size
Closed, Resolved · Public

Assigned To

None

Authored By

	• Bstorm
	Jul 22 2020, 10:47 PM

Description

Some workers are a bit over-burdened in the cluster at this point: https://grafana-labs.wikimedia.org/d/000000004/tools-activity?panelId=2&fullscreen&orgId=1

We should avoid running more than 25 pods on a node. A few more nodes at the larger planned size should be just the thing. To add them, we would introduce a server group with a soft anti-affinity policy, into which all nodes can slowly be replaced.
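As a rough illustration of the server group idea, the sketch below builds the OpenStack CLI invocations for creating a group with the soft anti-affinity policy and for launching an instance into it. The group name `tools-k8s-worker` is the one recorded later in this task; the flavor, image, instance name, and group ID shown in the usage lines are hypothetical placeholders, and other options a real deployment would need (network, security groups, and so on) are left out. The helpers only build argument lists and do not run anything.

```python
from typing import List


def server_group_create_cmd(name: str, policy: str = "soft-anti-affinity") -> List[str]:
    """Build the command that creates a server group with the given policy."""
    return ["openstack", "server", "group", "create", "--policy", policy, name]


def server_create_cmd(instance: str, flavor: str, image: str, group_id: str) -> List[str]:
    """Build the command that launches an instance inside a server group.

    The scheduler hint `group=<id>` asks Nova to apply the group's policy
    when placing the instance.
    """
    return [
        "openstack", "server", "create",
        "--flavor", flavor,
        "--image", image,
        "--hint", f"group={group_id}",
        instance,
    ]


if __name__ == "__main__":
    # All values except the group name are made-up placeholders.
    print(" ".join(server_group_create_cmd("tools-k8s-worker")))
    print(" ".join(server_create_cmd(
        instance="example-worker",
        flavor="example-xlarge-flavor",
        image="example-image",
        group_id="SERVER_GROUP_ID",
    )))
```

With the soft variant of the policy, the scheduler tries to spread group members across hypervisors but still places an instance if that is not possible, which matches the "unless it is necessary" wording used when the group was created.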

Then drain the overburdened nodes so that their pods get rescheduled across the cluster.
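A minimal sketch of how the drain candidates could be picked, assuming we already have a list with one node name per running pod (how that list is gathered is not covered here). It applies the 25-pod guideline from the description and builds a `kubectl drain` command for each node over the limit. The node names in the usage lines are invented for the example, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

MAX_PODS_PER_NODE = 25  # guideline from the task description


def overloaded_nodes(pod_nodes: Iterable[str], limit: int = MAX_PODS_PER_NODE) -> Dict[str, int]:
    """Return {node: pod_count} for nodes running more than `limit` pods.

    `pod_nodes` holds one entry per pod: the name of the node it runs on.
    """
    counts = Counter(pod_nodes)
    return {node: n for node, n in counts.items() if n > limit}


def drain_cmd(node: str) -> List[str]:
    """Build a `kubectl drain` command for one node.

    --ignore-daemonsets lets the drain proceed on nodes that run
    DaemonSet-managed pods, which drain cannot evict.
    """
    return ["kubectl", "drain", node, "--ignore-daemonsets"]


if __name__ == "__main__":
    # Invented sample data: 30 pods on one node, 12 on another.
    sample = ["example-node-a"] * 30 + ["example-node-b"] * 12
    for node, count in sorted(overloaded_nodes(sample).items()):
        print(f"{node}: {count} pods ->", " ".join(drain_cmd(node)))
```

Draining cordons the node and evicts its pods; where they land afterwards is up to the scheduler, so this does not by itself guarantee an even spread. A drained node stays unschedulable until it is uncordoned with `kubectl uncordon`.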

Event Timeline

• Bstorm triaged this task as Medium priority. Jul 22 2020, 10:47 PM

• Bstorm created this task.

Restricted Application added a subscriber: Aklapper. Jul 22 2020, 10:47 PM

@bd808 suggested possibly using a custom flavor for k8s workers because we tend to ride at low CPU for most of our apps. That may change as we roll out a jobs service, but so far it is very true.

Mentioned in SAL (#wikimedia-cloud) [2020-07-22T23:24:55Z] <bstorm> created server group 'tools-k8s-worker' to create any new worker nodes in so that they have a low chance of being scheduled together by openstack unless it is necessary T258663

Mentioned in SAL (#wikimedia-cloud) [2020-07-30T16:28:47Z] <andrewbogott> added new xlarge ceph-hosted worker nodes: tools-k8s-worker-61, 62, 63, 64, 65, 66. T258663

Nintendofan885 subscribed. Jul 30 2020, 8:35 PM

fnegri edited projects, added cloud-services-team; removed cloud-services-team (Kanban). Jan 18 2023, 7:13 PM

fnegri moved this task from Kanban to Inbox on the cloud-services-team board.

taavi closed this task as Resolved. Feb 11 2023, 10:23 PM
