I'm still migrating jobs from the grid (as part of T319912), and I'm now getting a lot of "Unable to start, out of quota for memory, memory " errors. Yesterday, 10 jobs in my job list showed this. Please increase the quota so the jobs I migrate can actually run.
Event Timeline
Mentioned in SAL (#wikimedia-cloud) [2024-11-26T18:26:58Z] <wmbot~multichill@tools-bastion-12> Tired of the Unable to start, out of quota for memory, memory, created T380902 for more memory
Hi @Multichill ! Please edit the task to include all the info needed, see https://phabricator.wikimedia.org/project/manage/4834/, so that your request can be processed. Thanks!
The log didn't go back very far. I tried starting some jobs and ran into limits right away:
$ kubectl get events | grep FailedCreate | head -10
2m6s Warning FailedCreate job/remove-duplicate-claims-use-dumps-1733567876 Error creating: pods "remove-duplicate-claims-use-dumps-1733567876-s7f9l" is forbidden: exceeded quota: tool-multichill, requested: limits.memory=2Gi,requests.memory=1073741824, used: limits.memory=7680Mi,requests.memory=3840Mi, limited: limits.memory=8Gi,requests.memory=4Gi
2m5s Warning FailedCreate job/remove-duplicate-claims-use-dumps-1733567876 Error creating: pods "remove-duplicate-claims-use-dumps-1733567876-64fm5" is forbidden: exceeded quota: tool-multichill, requested: limits.memory=2Gi,requests.memory=1073741824, used: limits.memory=7680Mi,requests.memory=3840Mi, limited: limits.memory=8Gi,requests.memory=4Gi
2m3s Warning FailedCreate job/remove-duplicate-claims-use-dumps-1733567876 Error creating: pods "remove-duplicate-claims-use-dumps-1733567876-rff6n" is forbidden: exceeded quota: tool-multichill, requested: limits.memory=2Gi,requests.memory=1073741824, used: limits.memory=7680Mi,requests.memory=3840Mi, limited: limits.memory=8Gi,requests.memory=4Gi
119s Warning FailedCreate job/remove-duplicate-claims-use-dumps-1733567876 Error creating: pods "remove-duplicate-claims-use-dumps-1733567876-gkplc" is forbidden: exceeded quota: tool-multichill, requested: limits.memory=2Gi,requests.memory=1073741824, used: limits.memory=7680Mi,requests.memory=3840Mi, limited: limits.memory=8Gi,requests.memory=4Gi
110s Warning FailedCreate job/remove-duplicate-claims-use-dumps-1733567876 Error creating: pods "remove-duplicate-claims-use-dumps-1733567876-nl4fh" is forbidden: exceeded quota: tool-multichill, requested: limits.memory=2Gi,requests.memory=1073741824, used: limits.memory=7680Mi,requests.memory=4026531840, limited: limits.memory=8Gi,requests.memory=4Gi
94s Warning FailedCreate job/remove-duplicate-claims-use-dumps-1733567876 Error creating: pods "remove-duplicate-claims-use-dumps-1733567876-xwhxw" is forbidden: exceeded quota: tool-multichill, requested: limits.memory=2Gi,requests.memory=1073741824, used: limits.memory=7680Mi,requests.memory=4026531840, limited: limits.memory=8Gi,requests.memory=4Gi
62s Warning FailedCreate job/remove-duplicate-claims-use-dumps-1733567876 Error creating: pods "remove-duplicate-claims-use-dumps-1733567876-d6g69" is forbidden: exceeded quota: tool-multichill, requested: limits.memory=2Gi,requests.memory=1073741824, used: limits.memory=7680Mi,requests.memory=3840Mi, limited: limits.memory=8Gi,requests.memory=4Gi
1s Warning FailedCreate job/remove-duplicate-claims-use-dumps-1733567876 Error creating: pods "remove-duplicate-claims-use-dumps-1733567876-cbvqs" is forbidden: exceeded quota: tool-multichill, requested: limits.memory=2Gi,requests.memory=1073741824, used: limits.memory=7680Mi,requests.memory=3840Mi, limited: limits.memory=8Gi,requests.memory=4Gi
I generally assign 1Gi per job, and 2Gi to some jobs, because otherwise they run out of memory (OOM).
I currently have about 35 jobs and still have to migrate about the same number from what used to be the grid. Please double the memory limit from 8Gi to 16Gi.
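For anyone reading the FailedCreate events above, the arithmetic behind "exceeded quota" can be checked with a small script. This is only a sketch: the helper names (`parse_quantity`, `parse_section`, `headroom`) are made up for illustration, it handles only the binary suffixes that appear in these events, and it assumes the message layout shown in the log. Note that the events mix units: `4026531840` bytes is the same value as `3840Mi`.

```python
import re

# Binary suffixes for Kubernetes memory quantities (only a subset, for illustration).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(text: str) -> int:
    """Convert a quantity such as '7680Mi' or '1073741824' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", text)
    if match is None:
        raise ValueError(f"unsupported quantity: {text!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS.get(suffix, 1)


def parse_section(message: str, label: str) -> dict[str, int]:
    """Read the 'key=value,key=value' list that follows e.g. 'used: '."""
    found = re.search(rf"{label}: (\S+?)(?:, |$)", message)
    if found is None:
        raise ValueError(f"no {label!r} section in message")
    pairs = (item.split("=", 1) for item in found.group(1).split(","))
    return {key: parse_quantity(value) for key, value in pairs}


def headroom(message: str) -> dict[str, int]:
    """Bytes left per resource if the pod were admitted (negative = over quota)."""
    requested = parse_section(message, "requested")
    used = parse_section(message, "used")
    limited = parse_section(message, "limited")
    return {key: limited[key] - used[key] - requested[key] for key in limited}


# Message text copied from one of the events in this task.
EVENT = (
    "exceeded quota: tool-multichill, "
    "requested: limits.memory=2Gi,requests.memory=1073741824, "
    "used: limits.memory=7680Mi,requests.memory=3840Mi, "
    "limited: limits.memory=8Gi,requests.memory=4Gi"
)

if __name__ == "__main__":
    for resource, left in headroom(EVENT).items():
        print(f"{resource}: {left / 2**20:+.0f}Mi")
```

For the event above, both values come out negative: the new pod would put `limits.memory` 1536Mi and `requests.memory` 768Mi over the quota, so it is rejected even though only 7680Mi of the 8Gi limit is in use.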
Memory limit updated to 16Gi
rook@tools-bastion-13:~$ kubectl sudo edit quota -n tool-multichill
resourcequota/tool-multichill edited
rook@tools-bastion-13:~$ kubectl sudo get -o yaml quota -n tool-multichill
apiVersion: v1
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    creationTimestamp: "2019-12-17T02:01:30Z"
    name: tool-multichill
    namespace: tool-multichill
    resourceVersion: "2589149775"
    uid: f6a49866-7bb5-4203-b734-f2039ceb2fb4
  spec:
    hard:
      configmaps: "10"
      count/cronjobs.batch: "50"
      count/deployments.apps: "16"
      count/jobs.batch: "15"
      limits.cpu: "8"
      limits.memory: 16Gi