Description
As more and more tools move to Build Service based images, we should provide some Kubernetes workers without NFS volumes mounted.
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Add worker-nfs Toolforge Kubernetes role/prefix | cloud/wmcs-cookbooks | main | +9 -4
Title | Reference | Author | Source Branch | Dest Branch
---|---|---|---|---
jobs-api: bump to 0.0.263-20240222104806-5ddd710f | repos/cloud/toolforge/toolforge-deploy!206 | project_1317_bot_df3177307bed93c3f34e421e26c86e38 | bump_jobs-api | main
deployment: Pin jobs-api pod to NFS-enabled workers | repos/cloud/toolforge/jobs-api!62 | taavi | main-I0b1d43e2a173b39f145bbcbf5142ed169b9ea259 | main
Status | Assigned | Task
---|---|---
Resolved | taavi | T355883 Create a pool of NFS-less Toolforge Kubernetes workers
Resolved | taavi | T284656 Toolforge k8s: Migrate workers to Containerd and Bookworm
Resolved | taavi | T349795 Upgrade cadvisor
Resolved | taavi | T350227 toolforge prometheus servers OOMing
Resolved | taavi | T357901 Request increased server-group-members quota for tools
Resolved | aborrero | T358476 toolforge k8s: some static pods needs manual restart
In Progress | Raymond_Ndibe | T358203 [k8s] Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant
Event Timeline
Change 992925 had a related patch set uploaded (by Majavah; author: Majavah):
[cloud/wmcs-cookbooks@main] Add worker-nfs Toolforge Kubernetes role/prefix
Change 992925 merged by jenkins-bot:
[cloud/wmcs-cookbooks@main] Add worker-nfs Toolforge Kubernetes role/prefix
taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/62
deployment: Pin jobs-api pod to NFS-enabled workers
project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/206
jobs-api: bump to 0.0.263-20240222104806-5ddd710f
taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/206
jobs-api: bump to 0.0.263-20240222104806-5ddd710f
I added three non-NFS workers, tools-k8s-worker-102 through tools-k8s-worker-104. So far they are running various infrastructure components, buildservice image-build pods, and a few tools with buildservice images. That's roughly what I'd expect, especially with only a few evictions from the NFS nodes this morning.
There was an issue with jobs-api this morning: its deployment did not specify a nodeSelector, so the pod was not pinned to NFS-enabled workers (fixed in repos/cloud/toolforge/jobs-api!62). In addition, I filed T358203: [k8s] Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant. Otherwise I'm pretty happy with how this turned out. We should revisit the size of this pool the next time we have had to restart all the NFS workers.
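For anyone reading later, here is a rough sketch of what the two scheduling settings mentioned above look like in a Deployment manifest, written as plain Python dicts. This is not the actual jobs-api change. The node label `example.org/nfs-mounted`, the pod label `app.kubernetes.io/name: jobs-api`, the image name, and both helper functions are hypothetical and only for illustration; the real labels used on Toolforge workers are not stated in this task. Only the `nodeSelector` and `topologySpreadConstraints` field names come from the Kubernetes pod spec.

```python
import copy

# Hypothetical node label; the real label on NFS-enabled workers is an assumption.
NFS_NODE_LABELS = {"example.org/nfs-mounted": "true"}


def pin_to_nfs_workers(deployment, node_labels=NFS_NODE_LABELS):
    """Return a copy of the manifest whose pods only schedule on nodes with node_labels."""
    pinned = copy.deepcopy(deployment)
    pod_spec = pinned["spec"]["template"]["spec"]
    pod_spec.setdefault("nodeSelector", {}).update(node_labels)
    return pinned


def spread_across_nodes(deployment, pod_labels):
    """Return a copy of the manifest that asks the scheduler to spread replicas over nodes."""
    spread = copy.deepcopy(deployment)
    pod_spec = spread["spec"]["template"]["spec"]
    pod_spec.setdefault("topologySpreadConstraints", []).append(
        {
            "maxSkew": 1,
            # Well-known node label: one topology domain per node.
            "topologyKey": "kubernetes.io/hostname",
            # Soft constraint: prefer spreading, but still schedule if impossible.
            "whenUnsatisfiable": "ScheduleAnyway",
            "labelSelector": {"matchLabels": dict(pod_labels)},
        }
    )
    return spread


# Hypothetical minimal Deployment, only detailed enough to show the two fields.
pod_labels = {"app.kubernetes.io/name": "jobs-api"}
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "jobs-api"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": pod_labels},
        "template": {
            "metadata": {"labels": pod_labels},
            "spec": {
                "containers": [
                    {"name": "jobs-api", "image": "example.org/jobs-api:latest"}
                ]
            },
        },
    },
}

result = spread_across_nodes(pin_to_nfs_workers(deployment), pod_labels)
```

Without a `nodeSelector`, the scheduler is free to place the pod on any worker, including the new NFS-less ones, which is presumably how the jobs-api issue showed up once the pool existed. The spread constraint is the soft variant here (`ScheduleAnyway`); whether T358203 should use the hard `DoNotSchedule` variant is a separate decision.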