Maniphest T352055

Toolforge Kubernetes quota requests.memory was reduced
Closed, Resolved · Public

Assigned To

Authored By

	Bamyers99
	Nov 27 2023, 4:34 PM

Description

T333979: Re-visit Toolforge Kubernetes default quotas (April 2023) reduced the requests.memory quota from 6Gi to 4Gi
Old code: maintain_kubeusers/k8s_api.py 6Gi
New code: maintain_kubeusers/quota.py 8Gi/2 = 4Gi
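The arithmetic behind the reduction can be sketched as below. This is an illustration only, not the actual maintain_kubeusers code: the helper name `requests_from_limit` is hypothetical, and it assumes the 8Gi figure is the limits.memory quota from which requests.memory is derived by halving, as the "8Gi/2 = 4Gi" note suggests.

```python
def requests_from_limit(limit_gi: int, divisor: int = 2) -> int:
    """Hypothetical helper: derive requests.memory (in Gi) from limits.memory (in Gi)."""
    return limit_gi // divisor


OLD_REQUESTS_GI = 6  # value previously set in k8s_api.py, per the description
NEW_LIMIT_GI = 8     # assumed to be the limits.memory quota used in quota.py

new_requests_gi = requests_from_limit(NEW_LIMIT_GI)
print(f"requests.memory: {OLD_REQUESTS_GI}Gi -> {new_requests_gi}Gi")
```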

The checkwiki project's fine-tuned job scheduling is now hitting a memory quota violation.

Related Objects

Mentioned In: T352251: Quota / webservice resource change?
Mentioned Here: T333979: Re-visit Toolforge Kubernetes default quotas (April 2023)

Event Timeline

Bamyers99 created this task. Nov 27 2023, 4:34 PM

Restricted Application added a subscriber: Aklapper. Nov 27 2023, 4:34 PM

Bamyers99 updated the task description. Nov 27 2023, 5:08 PM

Good catch, I did indeed miss this when working on the quota increases. As far as I can tell, there are three options on how to deal with this:

1. Increase the default RAM quota to at least 12Gi, so that requests gets bumped up to 6Gi.
2. Hardcode an exception in the requests generation code to ensure that requests.memory is at least 6Gi, even though limits.memory is less than 12Gi.
3. Manually bump quotas for affected tools.

Given that this affects relatively few tools that use a documented-to-be-unstable interface, I'm tempted to go with #3.
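For comparison, the first two options above can be sketched in a few lines. This is a rough illustration under assumptions, not the quota.py implementation: the function names are hypothetical, and it assumes requests.memory is always derived as half of limits.memory.

```python
OLD_REQUESTS_GI = 6  # requests.memory before the change, per the task description


def halved_requests(limit_gi: int) -> int:
    """Assumed current rule: requests.memory is half of limits.memory (in Gi)."""
    return limit_gi // 2


def min_limit_for(target_requests_gi: int) -> int:
    """Option 1: the smallest limits.memory that yields the target requests.memory."""
    return target_requests_gi * 2


def requests_with_floor(limit_gi: int, floor_gi: int = OLD_REQUESTS_GI) -> int:
    """Option 2: keep the halving rule, but never go below a hardcoded floor."""
    return max(halved_requests(limit_gi), floor_gi)


print(min_limit_for(OLD_REQUESTS_GI))  # limit needed under option 1
print(requests_with_floor(8))          # requests under option 2 with an 8Gi limit
```

Option 3 needs no code change in the default quota logic, since it only adjusts the quota of individual tools.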

How much quota does the checkwiki tool use? Full 6Gi of requests I assume?

It uses the "unstable" interface because it dynamically dispatches jobs from a pod. It used to use the Jobs framework API, but access to that from a pod was blocked sometime in February 2023.

No changes needed for checkwiki. I will reduce it from 2 simultaneous jobs of 2Gi apiece to 1 job of 2Gi and increase the CPU.

taavi mentioned this in T352251: Quota / webservice resource change?. Nov 29 2023, 11:36 AM

taavi merged a task: T352251: Quota / webservice resource change?.

taavi added a subscriber: DamianZaremba.

In T352055#9363408, @taavi wrote:

Manually bump quotas for affected tools.

This was the winner in the WMCS team meeting yesterday. My understanding is that both checkwiki and cluebotng have tuned down their memory usage already. If not, and you need more quota, please file a task in Toolforge (Quota-requests). Thanks and sorry.
