We have learned a few lessons about kyverno and how to set it all up, in particular:
- T367388: [k8s,infra] consider scaling the k8s control plane
- T367386: [k8s,infra] kyverno has a track record of overloading the cluster, maybe on new ways
- T367389: [k8s,infra,alerting] improve HAproxy and k8s apiserver interaction
We are merging https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/337 to redeploy it.
In case we need to roll back:
We cannot easily 'undeploy' via the workflows in toolforge-deploy, so a quick rollback of this would be:
- manually delete the api-server webhooks:
kubectl delete validatingwebhookconfiguration kyverno-resource-validating-webhook-cfg
kubectl delete mutatingwebhookconfiguration kyverno-resource-mutating-webhook-cfg
- manually scale down the kyverno replicas:
kubectl scale deploy kyverno-admission-controller -n kyverno --replicas 0
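For convenience, the two rollback steps above could be wrapped in a small shell function. This is only a sketch: the function name `rollback_kyverno` is made up for this example and is not part of toolforge-deploy, and the `--ignore-not-found` flag is an addition so that re-running the function after a partial rollback does not fail on webhooks that are already gone.

```shell
#!/usr/bin/env bash
# Sketch of the quick rollback described above; not part of toolforge-deploy.
# The function name rollback_kyverno is hypothetical.
rollback_kyverno() {
    # 1. Delete the api-server webhooks, same order as in the manual steps.
    #    --ignore-not-found (added here) makes the step safe to re-run.
    kubectl delete validatingwebhookconfiguration \
        kyverno-resource-validating-webhook-cfg --ignore-not-found || return 1
    kubectl delete mutatingwebhookconfiguration \
        kyverno-resource-mutating-webhook-cfg --ignore-not-found || return 1

    # 2. Scale the kyverno admission controller down to zero replicas.
    kubectl scale deploy kyverno-admission-controller -n kyverno --replicas 0
}
```

After sourcing the snippet on a host with cluster-admin access, running `rollback_kyverno` performs both steps and stops at the first failing command.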