Tool Name: wd-shex-infer
Quota Increase Requested: limits.memory 10Gi, requests.memory 8Gi
Reason: The Grid Engine version of the tool creates jobs with -mem 8g, and if memory serves, the jobs can actually require that much memory (i.e., I don’t think I just randomly picked that number). For feature parity, I’d like to be able to create Toolforge jobs with the same amount of memory, but the current quota is limits.memory 8Gi (some of which is taken up by the webservice already) and requests.memory 4Gi.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | LucasWerkmeister | T320140 Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes
Resolved | | taavi | T357209 Request increased memory quota for wd-shex-infer Toolforge tool
Declined | | None | T357881 [maintain-kubeusers] Allow setting the requests cpu and mem quota
Event Timeline
Not yet, no; feel free to try to create a cookbook :). For now it's managed through commits to the maintain-kubeusers repo:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Quota_management
dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/197
maintain-kubeusers: increase quota for wd-shex-infer
Done:
root@tools-k8s-control-6:~# kubectl -n tool-wd-shex-infer get resourcequotas tool-wd-shex-infer -o json | jq '.spec.hard."limits.memory"'
"10Gi"
requests.memory is now set to 5Gi, rather than 8Gi as I requested. Is this intentional?
The limit also isn’t working properly yet; from kubectl get events:
15s Warning FailedCreate job/wd-shex-infer-101 Error creating: pods "wd-shex-infer-101-b25wv" is forbidden: maximum memory usage per Container is 6Gi, but limit is 8G
(I think I’ll just kubectl edit this job to unstick it and test that the rest of T320140 works, but it would be nice to have this working in general.)
Meh, doesn’t work, Kubernetes complained that the memory limit is immutable (if I understood the error message correctly).
I guess I also need the limitrange increased? At least I can see a 6Gi max there.
tools.wd-shex-infer@tools-sgebastion-10:~/www/python/src$ kubectl describe limitrange
Name:       tool-wd-shex-infer
Namespace:  tool-wd-shex-infer
Type        Resource  Min    Max  Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---    ---  ---------------  -------------  -----------------------
Container   cpu       50m    3    250m             500m           -
Container   memory    100Mi  6Gi  256Mi            512Mi          -
(I’m leaving the job alive for now, by the way, and hope that it can successfully run once the limitrange has been increased.)
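To illustrate why the FailedCreate event above rejects an 8G limit against a 6Gi container maximum, here is a minimal Python sketch of how Kubernetes-style memory quantities compare. The helper names `parse_memory` and `fits` are hypothetical, and only a subset of the quantity suffixes is handled; this is not the actual admission code.

```python
# Sketch only: a subset of the Kubernetes quantity suffixes.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,  # binary (powers of 1024)
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,      # decimal (powers of 1000)
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '6Gi' or '8G' to bytes (hypothetical helper)."""
    # Check two-letter binary suffixes before one-letter decimal ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * SUFFIXES[suffix])
    return int(quantity)  # no suffix: plain byte count


def fits(limit: str, container_max: str) -> bool:
    """True if a container limit does not exceed the limitrange maximum."""
    return parse_memory(limit) <= parse_memory(container_max)


print(fits("8G", "6Gi"))   # False: 8,000,000,000 > 6,442,450,944 bytes
print(fits("8G", "10Gi"))  # True: 8,000,000,000 <= 10,737,418,240 bytes
```

Under these assumptions, raising only the resourcequota to 10Gi does not help: the per-container maximum in the limitrange has to be raised as well before a job with an 8G limit can be created.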
https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/197 was never merged. How did this happen?
dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/197
maintain-kubeusers: increase quota for wd-shex-infer
dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/198
wd-shex-infer: update also the limitrange
Yep, sorry about that, we don't usually increase the limit range (though maybe we should :/, feels weird limiting).
For the requests.memory value, we currently set it to half the memory limit; I'll have to change our quota management scripts to allow setting it to something different.
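The halving rule described in this comment accounts for the numbers seen earlier in the task. The following is a rough sketch of that arithmetic only; `default_request` is a hypothetical name and not the code in maintain-kubeusers.

```python
GI = 2**30  # bytes in one gibibyte


def default_request(limit_gi: int) -> str:
    """Assumed rule: requests.memory defaults to half of limits.memory."""
    half_bytes = limit_gi * GI // 2
    return f"{half_bytes / GI:g}Gi"


print(default_request(8))   # 4Gi, the original quota (limits.memory 8Gi)
print(default_request(10))  # 5Gi, the value observed after the increase to 10Gi
```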
dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/198
wd-shex-infer: update also the limitrange
Updated the limitrange:
root@tools-k8s-control-6:~# kubectl -n tool-wd-shex-infer get limitrange tool-wd-shex-infer -o json | jq '.spec.limits[].max.memory'
"10Gi"
For the requests, you can work around the default by passing --mem/--cpu when creating the jobs.
dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/12
quota: allow overriding the requests.cpu and memory
project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/199
maintain-kubeusers: bump to 0.0.121-20240219092902-759465a7
dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/199
maintain-kubeusers: bump to 0.0.121-20240219092902-759465a7
Mentioned in SAL (#wikimedia-cloud) [2024-03-02T12:06:32Z] <wmbot~lucaswerkmeister@tools-sgebastion-10> update config.yaml: increase job limits.memory from 6G to 8G, should be possible now (T357209)
It’s mostly working for me, but I’d still like to be able to set requests.memory higher than at the moment (which is blocked on T357881 if I understand correctly).
taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/218
maintain-kubeusers: Bump memory for wd-shex-infer
taavi closed https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/12
quota: allow overriding the requests.cpu and memory
taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/218
maintain-kubeusers: Bump memory for wd-shex-infer
starting a run
Update quota for tool wd-shex-infer from version '2-T357209-2' to version '2-T357209-3'
finished run, wrote 0 new accounts, disabled 0 accounts, cleaned up 0 accounts, renewed 0 accounts, updated 1 quotas
This is live now. Sorry for the delay.
Mentioned in SAL (#wikimedia-cloud) [2024-03-08T15:15:54Z] <wmbot~lucaswerkmeister@tools-sgebastion-10> bump requests.memory to 8G (T357209 / T320140)