We currently provision quotas for jobs and cronjobs like this:
  taavi@tools-sgebastion-10:~ $ kubectl describe quota -n tool-mismatch-finder-staging
  Name:            tool-mismatch-finder-staging
  Namespace:       tool-mismatch-finder-staging
  Resource         Used  Hard
  --------         ----  ----
  count/cronjobs   0     50
  count/jobs       0     15
  [...]
The quota resource names are wrong! According to https://kubernetes.io/docs/concepts/policy/resource-quotas/#object-count-quota, they should be count/jobs.batch and count/cronjobs.batch.
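As a rough sketch of the rename, the mapping below rewrites the two misnamed keys in a quota's `hard` section and leaves everything else alone. The helper name `fix_quota_names` and the `RENAMES` table are hypothetical and not taken from maintain-kubeusers; the limits are the ones shown in the output above.

```python
# Hypothetical helper, not the actual maintain-kubeusers code.
# Maps the misnamed object-count keys to the count/<resource>.<group> form.
RENAMES = {
    "count/jobs": "count/jobs.batch",
    "count/cronjobs": "count/cronjobs.batch",
}


def fix_quota_names(hard):
    """Return a copy of a quota 'hard' mapping with the batch keys renamed."""
    return {RENAMES.get(name, name): limit for name, limit in hard.items()}


# Limits copied from the kubectl output above.
fixed = fix_quota_names({"count/cronjobs": "50", "count/jobs": "15"})
print(fixed)
```

The same mapping could in principle be used both when generating quotas for new tools and when patching the quotas of existing ones, but how that fits into the real code is not covered here.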
This just caused us some major issues: a misbehaving tool created an absolute ton of job objects, each constantly trying to spawn a pod but failing because it hit the pod limit.
TODO:
- fix maintain-kubeusers for new tools
- fix existing tools
- double-check toolforge-jobs is setting a sensible concurrencyPolicy by default
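For the last item, one way to check a CronJob is to look at `spec.concurrencyPolicy` in its manifest. Kubernetes accepts `Allow`, `Forbid` and `Replace`, and falls back to `Allow` when the field is unset. The sketch below works on a plain dict and treats `Allow` as the value worth flagging; that choice, and both function names, are assumptions for illustration, not what toolforge-jobs actually does.

```python
# Hypothetical check on a CronJob manifest loaded as a plain dict.
# Kubernetes defaults concurrencyPolicy to "Allow" when it is not set.
def effective_concurrency_policy(cronjob):
    """Return the concurrencyPolicy Kubernetes would apply to this CronJob."""
    return cronjob.get("spec", {}).get("concurrencyPolicy", "Allow")


def allows_overlapping_runs(cronjob):
    """True if a new run may start while a previous one is still active."""
    return effective_concurrency_policy(cronjob) == "Allow"


unset = {"kind": "CronJob", "spec": {"schedule": "*/5 * * * *"}}
forbid = {"kind": "CronJob", "spec": {"concurrencyPolicy": "Forbid"}}
print(allows_overlapping_runs(unset), allows_overlapping_runs(forbid))
```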