As per the parent task, Toolforge k8s memory requests are over the 80% threshold. As an immediate and easy measure, we can expand the cluster with new workers to make sure pods can still be scheduled.
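A minimal sketch of the threshold check described above. It assumes the per-node figures are the sum of pod memory requests and the node's allocatable memory, as Kubernetes reports them (e.g. in the "Allocated resources" section of `kubectl describe node`). The node names and numbers are hypothetical, not actual Toolforge data.

```python
from dataclasses import dataclass

THRESHOLD = 0.80  # the 80% threshold mentioned in the parent task


@dataclass
class NodeMemory:
    name: str               # hypothetical node name
    requested_gib: float    # sum of memory requests of pods on the node
    allocatable_gib: float  # memory the scheduler can hand out on the node


def requests_ratio(nodes: list[NodeMemory]) -> float:
    """Cluster-wide memory requests as a fraction of allocatable memory."""
    requested = sum(n.requested_gib for n in nodes)
    allocatable = sum(n.allocatable_gib for n in nodes)
    return requested / allocatable


# Hypothetical example: three workers with 30 GiB allocatable each.
nodes = [
    NodeMemory("worker-a", 26.0, 30.0),
    NodeMemory("worker-b", 25.0, 30.0),
    NodeMemory("worker-c", 24.0, 30.0),
]
ratio = requests_ratio(nodes)
print(f"{ratio:.1%} requested, over threshold: {ratio > THRESHOLD}")
```

Summing across nodes before dividing weights each node by its size, which matches a cluster-wide "requests vs. allocatable" percentage rather than an average of per-node percentages.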
| Title | Reference | Author | Source Branch | Dest Branch |
|---|---|---|---|---|
| flavors: add g4.cores8.ram32.disk20.ephem140 | repos/cloud/cloud-vps/tofu-infra!301 | filippo | bug/T419824 | main |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T414513 Add new alerts for Toolforge cluster high load |
| Resolved | | fgiunchedi | T419824 Add new k8s toolforge workers to cater for memory requests |
Event Timeline
Following the docs at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Add_a_worker, this is what I'm planning to run. Then I'll wait for completion, observe the memory requests percentage at https://grafana.wmcloud.org/goto/cffsdds3j2juod?orgId=1, and repeat as needed to bring the reservation % down to, say, 70%:
```
cookbook wmcs.toolforge.add_k8s_node --cluster-name tools --role worker_nfs
```
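The "add a worker, check the %, repeat" loop can be estimated up front. This is a minimal sketch assuming memory requests stay constant while workers are added and that each new worker contributes a fixed amount of allocatable memory. The 30 GiB per worker and the cluster totals are hypothetical values, not measurements.

```python
def workers_needed(requested_gib: float, allocatable_gib: float,
                   per_worker_gib: float, target: float = 0.70) -> int:
    """Count workers to add until requests / allocatable <= target.

    Mirrors the add-then-check loop and assumes requests do not change
    while workers are being added.
    """
    if per_worker_gib <= 0 or target <= 0:
        raise ValueError("per_worker_gib and target must be positive")
    added = 0
    while requested_gib / (allocatable_gib + added * per_worker_gib) > target:
        added += 1
    return added


# Hypothetical cluster: 750 GiB requested out of 900 GiB allocatable (~83%),
# each new worker assumed to add ~30 GiB of allocatable memory.
print(workers_needed(750, 900, 30))  # → 6
```

Iterating instead of solving the inequality in closed form avoids floating-point edge cases where `requested / target` lands a hair above an exact multiple of the per-worker size.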
I'm also assuming that most requests come from NFS workers (the `worker_nfs` role above). This is to be verified once an NFS worker is added, by checking how it changes the memory requests %.
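One way to check this assumption is to group memory requests by node role. This is a minimal sketch: the input is a hypothetical list of `(role, requested GiB)` pairs, one per node. The `worker` role name for non-NFS nodes and all the numbers are assumptions for illustration.

```python
from collections import defaultdict


def requests_by_role(nodes: list[tuple[str, float]]) -> dict[str, float]:
    """Return each role's share of the cluster-wide memory requests."""
    totals: defaultdict[str, float] = defaultdict(float)
    for role, requested_gib in nodes:
        totals[role] += requested_gib
    grand_total = sum(totals.values())
    return {role: total / grand_total for role, total in totals.items()}


# Hypothetical (role, requested GiB) pairs, one per node.
nodes = [("worker_nfs", 26.0), ("worker_nfs", 25.0), ("worker", 9.0)]
for role, share in requests_by_role(nodes).items():
    print(f"{role}: {share:.0%}")
```

Comparing these shares before and after adding a `worker_nfs` node shows whether the new capacity lands where the requests actually are.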
filippo opened https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/301
flavors: add g4.cores8.ram32.disk20.ephem140
filippo merged https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/301
flavors: add g4.cores8.ram32.disk20.ephem140
Mentioned in SAL (#wikimedia-cloud-feed) [2026-03-18T07:23:26Z] <filippo@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster (T419824)
This is done; however, the added 32GB barely made a dent in the requested vs. available memory %. Resolving, and will follow up in the parent task.
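To see why a single worker barely moves the percentage, compare the ratio before and after adding one node's memory. This is a minimal sketch assuming the `ramN` component of the flavor name is the VM's RAM in GB (consistent with the 32GB above). It uses the full flavor RAM, which overstates the gain because allocatable memory is lower after system reservations. The cluster totals are hypothetical.

```python
import re


def flavor_ram_gb(flavor: str) -> int:
    """Extract N from the ramN component of a flavor name."""
    match = re.search(r"(?:^|\.)ram(\d+)(?:\.|$)", flavor)
    if match is None:
        raise ValueError(f"no ramN component in {flavor!r}")
    return int(match.group(1))


def requests_ratio(requested: float, allocatable: float) -> float:
    return requested / allocatable


added = flavor_ram_gb("g4.cores8.ram32.disk20.ephem140")  # 32

# Hypothetical cluster totals, in GB.
requested, allocatable = 1200.0, 1450.0
before = requests_ratio(requested, allocatable)
after = requests_ratio(requested, allocatable + added)
print(f"{before:.1%} -> {after:.1%} ({(before - after) * 100:.1f} points)")
```

With a cluster of this size, one 32GB node lowers the ratio by under two percentage points, so getting from above 80% down to around 70% takes several nodes.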