Add new k8s toolforge workers to cater for memory requests
Closed, Resolved (Public)

Description

As per the parent task, Toolforge k8s memory requests are over the 80% threshold. As an immediate and easy measure, we can expand the cluster with new workers to make sure pods can be scheduled.

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
flavors: add g4.cores8.ram32.disk20.ephem140 | repos/cloud/cloud-vps/tofu-infra!301 | filippo | bug/T419824 | main

Event Timeline

Following the docs at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Add_a_worker, this is what I'm planning to run. I will then wait for completion, observe the memory requests percentage at https://grafana.wmcloud.org/goto/cffsdds3j2juod?orgId=1, and repeat as needed to bring the reservation percentage down to about 70%:

cookbook wmcs.toolforge.add_k8s_node --cluster-name tools --role worker_nfs
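
As a rough way to estimate how many repetitions of the command above might be needed, the arithmetic can be sketched as below. This is only an illustration: the function name and the cluster figures (3000 GB allocatable, 2400 GB requested) are hypothetical and not measured from the tools cluster. Only the 80% and 70% figures and the 32 GB worker size come from this task. It also assumes all of a new worker's memory is allocatable to pods, which is optimistic.

```python
import math


def workers_needed(requested_gb, allocatable_gb, node_gb, target_ratio):
    """Estimate how many extra workers bring requested/allocatable to target_ratio or below.

    All inputs are assumptions supplied by the caller; nothing here queries the cluster.
    """
    if not 0 < target_ratio <= 1:
        raise ValueError("target_ratio must be in (0, 1]")
    if node_gb <= 0:
        raise ValueError("node_gb must be positive")
    # Total allocatable memory required for the current requests to sit at the target ratio.
    required_total_gb = requested_gb / target_ratio
    extra_gb = required_total_gb - allocatable_gb
    if extra_gb <= 0:
        return 0  # already at or below the target
    return math.ceil(extra_gb / node_gb)


# Hypothetical cluster at 80% reservation (2400 of 3000 GB), 32 GB workers, 70% target.
print(workers_needed(2400, 3000, 32, 0.70))  # 14
```

With these made-up figures the answer is well above one worker, which is why the plan is to re-check the Grafana dashboard after each addition rather than assume a fixed count.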

I'm also assuming that most of the requests land on NFS workers (the worker_nfs role above). This is to be verified by observing how the memory requests percentage changes once an NFS worker is added.

Following the docs at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Building_new_nodes, this is what I'm planning to run. I will then wait for completion, observe the memory requests percentage at https://grafana.wmcloud.org/goto/cffsdds3j2juod?orgId=1, and repeat as needed to bring the reservation percentage down to about 70%:

cookbook wmcs.toolforge.add_k8s_node --cluster-name tools --role worker_nfs

LGTM, we can use a higher memory image for the new nodes (2x should be ok imo)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-03-18T07:23:26Z] <filippo@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster (T419824)

fnegri changed the task status from Open to In Progress. (Mar 18 2026, 3:10 PM)
fnegri triaged this task as High priority.
fgiunchedi claimed this task.

This is done; however, 32GB barely made a dent in the percentage of requested vs. available memory. Resolving, and I will follow up in the parent task.
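
To illustrate why a single 32GB worker moves the percentage so little, here is a minimal sketch of the ratio before and after the addition. The function name and the cluster totals (2400 GB requested out of 3000 GB allocatable) are hypothetical placeholders, not values read from the tools cluster. Only the 32GB worker size is taken from this task.

```python
def reservation_ratio(requested_gb, allocatable_gb, added_gb=0):
    """Return requested memory as a fraction of allocatable memory.

    added_gb models extra allocatable memory from new workers; requests are assumed unchanged.
    """
    total_gb = allocatable_gb + added_gb
    if total_gb <= 0:
        raise ValueError("total allocatable memory must be positive")
    return requested_gb / total_gb


# Hypothetical figures: 2400 GB requested, 3000 GB allocatable, one 32 GB worker added.
before = reservation_ratio(2400, 3000)
after = reservation_ratio(2400, 3000, added_gb=32)
print(f"before: {before:.1%}, after: {after:.1%}")
```

Under these assumed totals the ratio drops by less than one percentage point, consistent with the observation that one worker of this size is not enough on its own.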