
Quota / webservice resource change?
Closed, Duplicate · Public · Bug Report

Description

On 26/11/2023 at 18:30, ClueBot NG started alerting for no recent edits (i.e. it was not running).

Checking the status on Toolforge shows the pods in the Terminating state, with the event log full of errors such as:

32m         Warning   FailedCreate        replicaset/cbng-7c4bf56bd    Error creating: pods "cbng-7c4bf56bd-kbk4h" is forbidden: exceeded quota: tool-cluebotng, requested: limits.memory=5220Mi,requests.memory=4964Mi, used: limits.memory=5732Mi,requests.memory=5220Mi, limited: limits.memory=8Gi,requests.memory=4Gi
32m         Warning   FailedCreate        replicaset/cbng-7c4bf56bd    Error creating: pods "cbng-7c4bf56bd-88wml" is forbidden: exceeded quota: tool-cluebotng, requested: limits.memory=5220Mi,requests.memory=4964Mi, used: limits.memory=5732Mi,requests.memory=5220Mi, limited: limits.memory=8Gi,requests.memory=4Gi
32m         Warning   FailedCreate        replicaset/cbng-7c4bf56bd    Error creating: pods "cbng-7c4bf56bd-rkhbs" is forbidden: exceeded quota: tool-cluebotng, requested: limits.memory=5220Mi,requests.memory=4964Mi, used: limits.memory=5732Mi,requests.memory=5220Mi, limited: limits.memory=8Gi,requests.memory=4Gi
32m         Warning   FailedCreate        replicaset/cbng-7c4bf56bd    Error creating: pods "cbng-7c4bf56bd-t6tsr" is forbidden: exceeded quota: tool-cluebotng, requested: limits.memory=5220Mi,requests.memory=4964Mi, used: limits.memory=5732Mi,requests.memory=5220Mi, limited: limits.memory=8Gi,requests.memory=4Gi
32m         Warning   FailedCreate        replicaset/cbng-7c4bf56bd    Error creating: pods "cbng-7c4bf56bd-d5k6q" is forbidden: exceeded quota: tool-cluebotng, requested: limits.memory=5220Mi,requests.memory=4964Mi, used: limits.memory=5732Mi,requests.memory=5220Mi, limited: limits.memory=8Gi,requests.memory=4Gi
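The events boil down to a simple comparison. As a minimal sketch, assuming the Kubernetes ResourceQuota rule that a new pod is rejected when current usage plus the pod's request would exceed the hard limit for any tracked resource, the figures from the log can be checked directly. The helper names `to_bytes` and `exceeded` are hypothetical, and the parser only handles integer binary suffixes (Ki/Mi/Gi), not the full Kubernetes quantity format.

```python
# Sketch of the ResourceQuota admission rule: reject a new pod if, for any
# tracked resource, used + requested > hard. Helper names are hypothetical.

UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> int:
    """Convert an integer quantity with a binary suffix (e.g. '5220Mi', '4Gi') to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count, no suffix


def exceeded(requested: dict, used: dict, hard: dict) -> list:
    """Return the resources for which used + requested would exceed the hard limit."""
    return [
        name
        for name, req in requested.items()
        if to_bytes(used[name]) + to_bytes(req) > to_bytes(hard[name])
    ]


# Figures copied from the FailedCreate events above.
requested = {"limits.memory": "5220Mi", "requests.memory": "4964Mi"}
used = {"limits.memory": "5732Mi", "requests.memory": "5220Mi"}
hard = {"limits.memory": "8Gi", "requests.memory": "4Gi"}

print(exceeded(requested, used, hard))  # ['limits.memory', 'requests.memory']

# Existing usage already sits above the 4Gi requests.memory cap; lowering a quota
# does not evict pods that are already running, it only blocks new ones.
print(to_bytes(used["requests.memory"]) > to_bytes(hard["requests.memory"]))  # True
```

Both memory dimensions fail: 5732Mi + 5220Mi = 10952Mi against 8Gi (8192Mi) for limits, and 5220Mi + 4964Mi = 10184Mi against 4Gi (4096Mi) for requests.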

The resource requests for the bot and its associated containers have not changed (they are set explicitly in code), which leaves two possibilities: either the tool's quota was reduced, or the webservice job started requesting more resources.

There is nothing explicit in SAL around this time.

Could you advise whether this is an expected change?

For now, I will reduce the bot's requested memory, which should allow it to run again (the request was increased because of T343952, though I suspect that issue stems from MySQL limits rather than resource limits).

Event Timeline

This is a result of a recent change to the default quotas (T333979) that accidentally lowered requests.memory from 6Gi to 4Gi (T352055). The idea behind these changes was to eliminate the two different requests and limits quotas for toolforge-jobs users, but I did not realize it also meant lowering the memory requests quota. T352055#9363408 has the possible fixes, and I've added this as a topic to our team meeting later today to decide which way to go.
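To see why the accidental 6Gi to 4Gi change alone is enough to block the bot, a rough sketch can compare the bot pod's memory request from the events (4964Mi) against each quota before counting any other pods in the tool, such as the webservice. The `headroom_mi` helper is hypothetical and ignores every other pod, so it only shows whether the bot could ever be admitted, not how much room is actually left.

```python
# Hypothetical illustration: does the bot's memory request fit inside the
# tool's requests.memory quota on its own (ignoring all other pods)?

MI_PER_GI = 1024
BOT_REQUEST_MI = 4964  # requests.memory from the FailedCreate events

QUOTAS_MI = {
    "previous default (6Gi)": 6 * MI_PER_GI,
    "lowered default (4Gi)": 4 * MI_PER_GI,
}


def headroom_mi(hard_mi: int, request_mi: int) -> int:
    """Quota left after the bot's request; negative means the pod can never be admitted."""
    return hard_mi - request_mi


for label, hard_mi in QUOTAS_MI.items():
    room = headroom_mi(hard_mi, BOT_REQUEST_MI)
    status = "fits" if room >= 0 else "cannot be admitted"
    print(f"{label}: {status} (headroom {room}Mi)")
```

Under 6Gi the request leaves 1180Mi for other pods; under 4Gi it overshoots by 868Mi on its own, which is why lowering the bot's request was the only way to get it running again until the quota is restored.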