For T348758: [jobs-api,jobs-cli] Support services in jobs, we should increment the default services quota from the current 1 to some TBD higher value.
Description
Details
Title | Reference | Author | Source Branch | Dest Branch
---|---|---|---|---
maintain-kubeusers: bump to 0.0.131-20240525201329-ca173bf3 | repos/cloud/toolforge/toolforge-deploy!284 | project_1317_bot_df3177307bed93c3f34e421e26c86e38 | bump_maintain-kubeusers | main
[maintain-kubeusers] increment default services quota | repos/cloud/toolforge/maintain-kubeusers!25 | raymond-ndibe | increase_services_default_quota | main
Event Timeline
> to some TBD higher value.
How about 16 to match the number of pods in the default quota? We can certainly pick any other arbitrary number >1 as well, but this at least has some rationale.
I find it pretty unlikely that a typical tool will exhaust either the Pod or Service quota at 16; I think the most typical tool will continue to be a webservice that consumes one Pod and one Service. In the spirit of T306324: Consider improving quota workflow, I think we should set the default limits quite high relative to expected use where possible, so that we don't end up discouraging folks from innovating by making them stop to ask for permission to use the platform.
I thought the plan was to only allow services for continuous jobs (which webservice will probably become entangled with soon)? If that's the case, then we should make the services quota the same as the deployments quota, which is currently 3. We can also increase this, but at least the services and deployments quotas should stay in lockstep.
```
$ kubectl describe quota
Name:                   tool-wikibugs
Namespace:              tool-wikibugs
Resource                Used    Hard
--------                ----    ----
configmaps              2       10
count/cronjobs.batch    0       50
count/deployments.apps  6       6
count/jobs.batch        0       15
limits.cpu              3       8
limits.memory           3Gi     8Gi
persistentvolumeclaims  0       0
pods                    6       16
requests.cpu            1375m   4
requests.memory         1536Mi  4Gi
secrets                 21      64
services                2       2
services.nodeports      0       0
```
I guess wikibugs has had a quota bump for this already (count/deployments.apps 6) and is currently bumping up against the raised limit too. Why do we care how many Deployments a namespace has as long as the CPU and RAM quotas allow the Pods they manage?
We need to have some quota in place to prevent a misbehaving tool from taking kube-apiserver down by creating hundreds or thousands of unfillable ReplicaSets (a similar thing has happened in the past, see T301081), but I have no objections to raising the deployment quota to match the pod one for example.
+1 on using the same number as pods for deployments/services; it seems like you'd always want to back a Deployment with at least one Pod, and a Service with at least one Deployment (though we might eventually want to let people use more ExternalName-type Services; I don't have any use case in mind, so we might want to wait until then to care about them).
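Aligning the Service and Deployment quotas with the Pod quota, as discussed above, would amount to something like the following ResourceQuota sketch. This is an illustrative fragment, not the actual maintain-kubeusers template; the object name and the values other than `services: 16` and `count/deployments.apps: 16` are assumptions based on the `kubectl describe quota` output quoted earlier.

```yaml
# Hypothetical default quota for a tool namespace, with the Service and
# Deployment limits raised to match the existing Pod limit of 16.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tool-example          # assumed name; per-tool in practice
  namespace: tool-example
spec:
  hard:
    pods: "16"
    services: "16"            # raised from the old default of 1
    count/deployments.apps: "16"  # raised to stay in lockstep with services
    services.nodeports: "0"
```

A quota like this still bounds how many objects a misbehaving tool can create (the concern raised above about flooding kube-apiserver), while leaving typical webservice tools, which use one Pod and one Service, far below the limits.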
raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/25
[maintain-kubeusers] increment default services quota
raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/25
[maintain-kubeusers] increment default services quota
project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/284
maintain-kubeusers: bump to 0.0.131-20240525201329-ca173bf3
raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/284
maintain-kubeusers: bump to 0.0.131-20240525201329-ca173bf3