|Resolved||• JHedden||T251027 "signatures" tool has failed job pods on Kubernetes cluster|
|Resolved||aborrero||T251917 Design the Jobs service in k8s|
|Resolved||aborrero||T283238 Toolforge: develop jobs-framework-api|
|Resolved||aborrero||T285944 Toolforge: beta phase for the new jobs framework|
|Resolved||BUG REPORT||aborrero||T286108 toolforge-jobs: Clean up old individual job objects|
- Mentioned In
  - T285944: Toolforge: beta phase for the new jobs framework
  - rCTKF687c063badb4: wait: if the job doesn't exists it means it was already pruned by k8s
- Mentioned Here
  - rCTJF07346d715d17: jobs: adjust garbage collection
  - T251027: "signatures" tool has failed job pods on Kubernetes cluster
  - T285944: Toolforge: beta phase for the new jobs framework
I wasn't uncomfortable with having to delete each job individually after completion. It allowed me to review execution results and status.
But I totally understand the desire for them to be auto-cleaned up.
I've been reading the docs pointed to by @Majavah. I'm not a fan of having to modify kubelet arguments for this (see patch). I'd prefer to wait until 1.21, when this is enabled by default, rather than modify the kubelet config (which is in turn managed by kubeadm, I believe).
I suspect we'll be on k8s 1.21 before we leave the beta phase for this.
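For reference, a minimal sketch of what automatic cleanup could look like on the Job side, assuming the feature under discussion is the TTL-after-finished mechanism (`spec.ttlSecondsAfterFinished` on `batch/v1` Jobs). The helper name `job_with_ttl`, the job name, the image and the one-hour TTL are all made up for illustration; this is not what the jobs framework actually generates.

```python
import json


def job_with_ttl(name, image, command, ttl_seconds):
    """Build a batch/v1 Job manifest that asks k8s to delete the Job
    (and its pods) ttl_seconds after it finishes.

    Hypothetical helper, only meant to show where the field lives.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Only honoured when the TTL-after-finished feature is enabled
            # on the cluster.
            "ttlSecondsAfterFinished": ttl_seconds,
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


# Example values are assumptions: keep the finished job around for an hour
# so its result and status can still be reviewed before it is pruned.
manifest = job_with_ttl("example-job", "busybox", ["sh", "-c", "echo done"], 3600)
print(json.dumps(manifest, indent=2))
```

A non-zero TTL would keep some of the "review execution results and status" behaviour while still cleaning up eventually. The printed JSON can be fed to `kubectl apply -f -` on a cluster where the feature is available.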
How long do you expect the beta phase to last? I've just finished the 1.18 upgrade and, based on that, each upgrade takes a fair bit of time to perform. On top of that, 1.20 removes a significant feature (Pod presets), and replacing it takes time too.
This can be an even bigger issue with failed jobs. See T251027: "signatures" tool has failed job pods on Kubernetes cluster
The garbage collector should protect the control plane from E_TOO_MANY_PODS, but it confuses users. Perhaps we should start recording all usage of non-GA APIs in our cluster on a wiki page, so that we have an easier time looking for deprecations on upgrades. After all, since the policy decision on betas (https://kubernetes.io/blog/2020/08/21/moving-forward-from-beta/#avoiding-permanent-beta), nothing in k8s is guaranteed except GA APIs.
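A rough sketch of how the non-GA inventory could be generated rather than maintained by hand, assuming we already have the objects' `apiVersion` and `kind` (for example from a dump of the cluster). The function names `is_ga` and `non_ga_usage` and the sample objects are made up for illustration.

```python
import re

# GA versions look like "v1" or "v2"; alpha and beta ones carry a suffix,
# e.g. "v1alpha1" or "v1beta1".
_GA_VERSION = re.compile(r"^v\d+$")


def is_ga(api_version):
    """Return True if an apiVersion such as "batch/v1" names a GA version."""
    # Core-group objects have no group prefix ("v1"), others do ("batch/v1").
    version = api_version.rsplit("/", 1)[-1]
    return bool(_GA_VERSION.match(version))


def non_ga_usage(objects):
    """Map each non-GA apiVersion to the set of kinds that use it."""
    usage = {}
    for obj in objects:
        api_version = obj["apiVersion"]
        if not is_ga(api_version):
            usage.setdefault(api_version, set()).add(obj["kind"])
    return usage


# Sample input is an assumption, not a dump of the real cluster.
sample = [
    {"apiVersion": "v1", "kind": "Pod"},
    {"apiVersion": "batch/v1", "kind": "Job"},
    {"apiVersion": "batch/v1beta1", "kind": "CronJob"},
    {"apiVersion": "settings.k8s.io/v1alpha1", "kind": "PodPreset"},
]

report = non_ga_usage(sample)
for api_version in sorted(report):
    print(api_version, ", ".join(sorted(report[api_version])))
```

The output of something like this could be pasted onto the wiki page before each upgrade and compared against the release notes for that version.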