
Toolforge Kubernetes quota requests.memory was reduced
Closed, Resolved, Public

Description

T333979: Re-visit Toolforge Kubernetes default quotas (April 2023) reduced the requests.memory quota from 6Gi to 4Gi
Old code (maintain_kubeusers/k8s_api.py): 6Gi
New code (maintain_kubeusers/quota.py): 8Gi / 2 = 4Gi
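A minimal sketch of the arithmetic behind the regression, assuming (as the "8Gi/2" note suggests) that the new code derives requests.memory as half of the default limits.memory. The helper name is hypothetical and is not taken from quota.py.

```python
# Memory quantities in GiB, as quoted in this task.
OLD_REQUESTS_MEMORY_GI = 6    # fixed value in the old maintain_kubeusers/k8s_api.py
DEFAULT_LIMITS_MEMORY_GI = 8  # new default limits.memory


def derived_requests_memory_gi(limits_memory_gi: float) -> float:
    """Hypothetical helper: requests.memory as half of limits.memory,
    mirroring the "8Gi / 2 = 4Gi" derivation described for quota.py."""
    return limits_memory_gi / 2


new_value = derived_requests_memory_gi(DEFAULT_LIMITS_MEMORY_GI)
print(f"requests.memory: {OLD_REQUESTS_MEMORY_GI}Gi -> {new_value:g}Gi")
```

Because the new value is derived rather than fixed, it dropped below the old 6Gi without anyone changing a requests-specific setting.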

The checkwiki project's fine-tuned job scheduling now hits a memory quota violation.

Event Timeline

taavi triaged this task as High priority.
taavi subscribed.

Good catch, I did indeed miss this when working on the quota increases. As far as I can tell, there are three ways to deal with this:

  1. Increase the default RAM quota to at least 12Gi, so that requests.memory gets bumped up to 6Gi
  2. Hardcode an exception in the requests generation code to ensure that requests.memory is at least 6Gi, even when limits.memory is less than 12Gi
  3. Manually bump quotas for affected tools.
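Options 1 and 2 differ only in how requests.memory is derived. A minimal sketch, assuming the "half of limits.memory" rule from the description; the function and the `MIN_REQUESTS_MEMORY_GI` floor are hypothetical illustrations, not the actual maintain_kubeusers code.

```python
MIN_REQUESTS_MEMORY_GI = 6  # hypothetical floor equal to the old default


def requests_memory_gi(limits_memory_gi: float, floor_gi: float = 0) -> float:
    """Derive requests.memory (GiB) as half of limits.memory.

    floor_gi=0 gives the plain halving used today;
    floor_gi=MIN_REQUESTS_MEMORY_GI sketches option 2's hardcoded minimum.
    """
    return max(limits_memory_gi / 2, floor_gi)


# Option 1: raise the default limits.memory to 12Gi, so halving alone yields 6Gi.
option_1 = requests_memory_gi(12)
# Option 2: keep the 8Gi default, but never go below the 6Gi floor.
option_2 = requests_memory_gi(8, MIN_REQUESTS_MEMORY_GI)
print(option_1, option_2)
```

Option 1 also raises limits.memory for every tool, while option 2 decouples requests from limits only below 12Gi; option 3 leaves the formula alone and changes per-tool quotas instead.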

Given that this affects relatively few tools, and that they use an interface documented as unstable, I'm tempted to go with #3.

How much quota does the checkwiki tool use? The full 6Gi of requests, I assume?

The reason it uses the "unstable" interface is that it dynamically dispatches jobs from a pod. It used to use the Jobs framework API, but access to this from a pod was blocked sometime in February 2023.
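Dispatching jobs directly through the Kubernetes API means the tool builds the Job objects itself, including the resource requests that count against the namespace quota. A minimal sketch of such a batch/v1 Job manifest as a plain dict; the namespace follows the assumed `tool-<name>` Toolforge convention, and the job name, image, and command are hypothetical placeholders.

```python
def build_job_manifest(name: str, namespace: str, image: str, command: list,
                       memory: str, cpu: str) -> dict:
    """Build a batch/v1 Job manifest whose single container requests,
    and is limited to, the given memory and CPU."""
    resources = {
        "requests": {"memory": memory, "cpu": cpu},
        "limits": {"memory": memory, "cpu": cpu},
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "backoffLimit": 0,  # do not retry failed runs
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                        "resources": resources,
                    }],
                },
            },
        },
    }


# Hypothetical values: namespace, image and command are placeholders.
job = build_job_manifest(
    name="checkwiki-dump-enwiki",
    namespace="tool-checkwiki",
    image="IMAGE_PLACEHOLDER",
    command=["./run-checkwiki.sh", "enwiki"],
    memory="2Gi",
    cpu="1",
)
```

From inside a pod, such a dict could be submitted with the official kubernetes Python client, for example `config.load_incluster_config()` followed by `client.BatchV1Api().create_namespaced_job(namespace, job)`. Each resulting pod's requests.memory is then charged to the namespace's requests.memory quota.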

No changes are needed for checkwiki. I will reduce it from 2 simultaneous jobs of 2Gi apiece to 1 job of 2Gi and increase the CPU.
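Why halving the number of concurrent jobs helps can be sketched with the ResourceQuota rule: the requests.memory of all non-terminal pods in the namespace, plus the new pod's request, must stay within the hard limit. The dispatcher pod's 512Mi request below is a hypothetical assumption; the task does not say what else in the namespace was holding requests.

```python
def fits_requests_quota(existing_mi: list, new_mi: int, hard_mi: int) -> bool:
    """ResourceQuota-style check: the requests.memory of all non-terminal pods
    in the namespace, plus the new pod's request, must not exceed the hard limit."""
    return sum(existing_mi) + new_mi <= hard_mi


GI = 1024            # MiB per GiB
dispatcher_mi = 512  # hypothetical request of the pod that dispatches the jobs
job_mi = 2 * GI

# Old plan: a second 2Gi job starts while the dispatcher and one job are running.
print(fits_requests_quota([dispatcher_mi, job_mi], job_mi, 4 * GI))  # False
print(fits_requests_quota([dispatcher_mi, job_mi], job_mi, 6 * GI))  # True
# New plan: a single 2Gi job alongside the dispatcher.
print(fits_requests_quota([dispatcher_mi], job_mi, 4 * GI))          # True
```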

  3. Manually bump quotas for affected tools.

This was the winner in the WMCS team meeting yesterday. My understanding is that both checkwiki and cluebotng have tuned down their memory usage already. If not, and you need more quota, please file a task in Toolforge (Quota-requests). Thanks and sorry.