Request increased memory quota for wd-shex-infer Toolforge tool
Closed, Resolved · Public

Assigned To: taavi
Authored By: LucasWerkmeister, Feb 10 2024, 4:44 PM

Description

Tool Name: wd-shex-infer
Quota Increase Requested: limits.memory 10Gi, requests.memory 8Gi
Reason: The Grid Engine version of the tool creates jobs with -mem 8g, and if memory serves, the jobs can actually require that much memory (i.e., I don’t think I just randomly picked that number). For feature parity, I’d like to be able to create Toolforge jobs with the same amount of memory, but the current quota is limits.memory 8Gi (some of which is taken up by the webservice already) and requests.memory 4Gi.
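The quota arithmetic behind this request can be sketched as follows. This is a minimal illustration, assuming the namespace quota caps the sum of all container memory limits; the 512 MiB webservice limit and the function name `job_fits` are hypothetical values chosen for the example, not figures from the task.

```python
MIB_PER_GIB = 1024


def job_fits(quota_mib: int, existing_limits_mib: list[int], job_limit_mib: int) -> bool:
    """Return True if a new job's memory limit fits under the namespace quota.

    Assumes the quota applies to the sum of all memory limits in the namespace.
    """
    return sum(existing_limits_mib) + job_limit_mib <= quota_mib


webservice_mib = 512            # hypothetical webservice memory limit
job_mib = 8 * MIB_PER_GIB       # the job wants an 8 GiB limit

# Current quota (limits.memory 8Gi): the webservice leaves too little room.
print(job_fits(8 * MIB_PER_GIB, [webservice_mib], job_mib))   # False
# Requested quota (limits.memory 10Gi): both fit.
print(job_fits(10 * MIB_PER_GIB, [webservice_mib], job_mib))  # True
```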

Related Objects

Status	Assigned	Task
Resolved	LucasWerkmeister	T320140 Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes
Resolved	taavi	T357209 Request increased memory quota for wd-shex-infer Toolforge tool
Declined	None	T357881 [maintain-kubeusers] Allow setting the requests cpu and mem quota

Event Timeline

LucasWerkmeister created this task. Feb 10 2024, 4:44 PM

@dcaro do we have a way to automatically handle requests like this?

In T357209#9536704, @Raymond_Ndibe wrote:

@dcaro do we have a way to automatically handle requests like this?

Not yet, no. Feel free to try to create a cookbook :). For now it's managed through commits to the maintain-kubeusers repo:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Quota_management

LucasWerkmeister mentioned this in T320140: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes. Feb 13 2024, 1:23 PM

dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/197

maintain-kubeusers: increase quota for wd-shex-infer

Done:

root@tools-k8s-control-6:~# kubectl -n tool-wd-shex-infer get resourcequotas tool-wd-shex-infer -o json | jq '.spec.hard."limits.memory"'
"10Gi"

requests.memory is now set to 5Gi, rather than the 8Gi I requested. Is this intentional?

The limit also isn’t working properly yet; from kubectl get events:

15s         Warning   FailedCreate        job/wd-shex-infer-101               Error creating: pods "wd-shex-infer-101-b25wv" is forbidden: maximum memory usage per Container is 6Gi, but limit is 8G

(I think I’ll just kubectl edit this job to unstick it and test that the rest of T320140 works, but it would be nice to have this working in general.)

Meh, that doesn’t work: Kubernetes complained that the memory limit is immutable (if I understood the error message correctly).

I guess I also need the limitrange increased? At least I can see a 6Gi max there.

tools.wd-shex-infer@tools-sgebastion-10:~/www/python/src$ kubectl describe limitrange
Name:       tool-wd-shex-infer
Namespace:  tool-wd-shex-infer
Type        Resource  Min    Max  Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---    ---  ---------------  -------------  -----------------------
Container   cpu       50m    3    250m             500m           -
Container   memory    100Mi  6Gi  256Mi            512Mi          -
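The rejection above comes down to comparing the job's memory limit against the limitrange's per-container Min and Max. The sketch below illustrates that comparison; it is not the Kubernetes implementation. The helper names `parse_memory` and `within_limitrange` are hypothetical, and only the quantity suffixes that appear in this thread are handled.

```python
from decimal import Decimal

# Multipliers for the quantity suffixes seen in this thread: binary (Ki, Mi, Gi)
# and decimal (k, M, G). Other Kubernetes suffixes are deliberately left out.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '6Gi' or '8G' to a number of bytes."""
    # Try longer suffixes first so 'Gi' is never confused with 'G'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(Decimal(quantity))  # no suffix: plain bytes


def within_limitrange(limit: str, range_min: str, range_max: str) -> bool:
    """Check a container memory limit against limitrange Min and Max."""
    return parse_memory(range_min) <= parse_memory(limit) <= parse_memory(range_max)


# Job limit 8G against the limitrange shown above (Min 100Mi, Max 6Gi).
print(within_limitrange("8G", "100Mi", "6Gi"))   # False
# The same job limit if the Max were raised to 10Gi.
print(within_limitrange("8G", "100Mi", "10Gi"))  # True
```

Note that 8G (decimal) is slightly smaller than 8Gi (binary), but either value exceeds a 6Gi maximum.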

(I’m leaving the job alive for now, by the way, and hope that it can successfully run once the limitrange has been increased.)

In T357209#9550918, @dcaro wrote:

Done:

root@tools-k8s-control-6:~# kubectl -n tool-wd-shex-infer get resourcequotas tool-wd-shex-infer -o json | jq '.spec.hard."limits.memory"'
"10Gi"

https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/197 was never merged. How did this happen?

dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/197

maintain-kubeusers: increase quota for wd-shex-infer

Just merged it (I had forgotten to); I had deployed it from the branch, as usual.

Maintenance_bot removed a project: Patch-For-Review. Feb 18 2024, 9:30 PM

dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/198

wd-shex-infer: update also the limitrange

In T357209#9553241, @LucasWerkmeister wrote:

I guess I also need the limitrange increased? At least I can see a 6Gi max there.

tools.wd-shex-infer@tools-sgebastion-10:~/www/python/src$ kubectl describe limitrange
Name:       tool-wd-shex-infer
Namespace:  tool-wd-shex-infer
Type        Resource  Min    Max  Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---    ---  ---------------  -------------  -----------------------
Container   cpu       50m    3    250m             500m           -
Container   memory    100Mi  6Gi  256Mi            512Mi          -

Yep, sorry about that. We don't usually increase the limit range (though maybe we should :/, it feels weird to limit it).

For the requests.memory value, we currently set it to half the memory limit. I'll have to change our quota management scripts to allow setting it to something different.
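As a rough illustration of the rule described here (requests.memory derived as half of limits.memory), the following sketch reproduces the 5Gi value observed earlier in the thread. The function name is hypothetical, and the actual quota management scripts may compute this differently.

```python
MIB_PER_GIB = 1024


def default_requests_memory_mib(limits_memory_mib: int) -> int:
    """Derive requests.memory from limits.memory.

    Assumed rule, taken from the comment above: requests are half of limits.
    """
    return limits_memory_mib // 2


limits_mib = 10 * MIB_PER_GIB                      # limits.memory 10Gi
requests_mib = default_requests_memory_mib(limits_mib)
print(f"{requests_mib / MIB_PER_GIB:g}Gi")         # 5Gi
```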

dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/198

wd-shex-infer: update also the limitrange

Updated the limitrange:

root@tools-k8s-control-6:~# kubectl -n tool-wd-shex-infer get limitrange tool-wd-shex-infer -o json | jq '.spec.limits[].max.memory'
"10Gi"

For the requests, you can work around the default by passing --mem/--cpu when creating the jobs.

Maintenance_bot removed a project: Patch-For-Review. Feb 19 2024, 9:30 AM

dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/12

quota: allow overriding the requests.cpu and memory

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/199

maintain-kubeusers: bump to 0.0.121-20240219092902-759465a7

dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/199

maintain-kubeusers: bump to 0.0.121-20240219092902-759465a7

dcaro added a subtask: T357881: [maintain-kubeusers] Allow setting the requests cpu and mem quota. Feb 19 2024, 9:40 AM

Thanks, the updated limitrange seems to be working!

dcaro triaged this task as Medium priority. Feb 21 2024, 10:12 AM

@LucasWerkmeister can this task be resolved, or is there anything missing?

Mentioned in SAL (#wikimedia-cloud) [2024-03-02T12:06:32Z] <wmbot~lucaswerkmeister@tools-sgebastion-10> update config.yaml: increase job limits.memory from 6G to 8G, should be possible now (T357209)

It’s mostly working for me, but I’d still like to be able to set requests.memory higher than at the moment (which is blocked on T357881 if I understand correctly).

taavi closed subtask T357881: [maintain-kubeusers] Allow setting the requests cpu and mem quota as Declined. Mar 8 2024, 11:58 AM

taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/218

maintain-kubeusers: Bump memory for wd-shex-infer

taavi claimed this task. Mar 8 2024, 12:01 PM

taavi closed https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/12

quota: allow overriding the requests.cpu and memory

taavi moved this task from Inbox to Approved on the Toolforge (Quota-requests) board. Mar 8 2024, 12:01 PM

taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/218

maintain-kubeusers: Bump memory for wd-shex-infer

starting a run
Update quota for tool wd-shex-infer from version '2-T357209-2' to version '2-T357209-3'
finished run, wrote 0 new accounts, disabled 0 accounts, cleaned up 0 accounts, renewed 0 accounts, updated 1 quotas

This is live now. Sorry for the delay.

Maintenance_bot removed a project: Patch-For-Review. Mar 8 2024, 12:30 PM

Mentioned in SAL (#wikimedia-cloud) [2024-03-08T15:15:54Z] <wmbot~lucaswerkmeister@tools-sgebastion-10> bump requests.memory to 8G (T357209 / T320140)

dcaro changed the status of subtask T357881: [maintain-kubeusers] Allow setting the requests cpu and mem quota from Declined to Resolved. Mar 11 2024, 10:28 AM

taavi changed the status of subtask T357881: [maintain-kubeusers] Allow setting the requests cpu and mem quota from Resolved to Declined. Mar 11 2024, 10:30 AM
