Page MenuHomePhabricator

Figure out certificate generation for admission webhooks before we lose the certificates/v1beta1
Open, Needs TriagePublic

Description

We are currently generating certificates for the admission webhooks in Toolforge using the v1beta1 certificates API because it allows the unknown-legacy signer. That is not valid in the certificates/v1 API, and v1beta1 gets dropped in 1.22.

We should not upgrade to this level until this is resolved. (noticed in T289390) and confirmed as an issue in upstream Github issues. A new signer may be added by the upstream project for https://github.com/kubernetes/kubernetes/issues/63732, but we'd need that to be used first. We just need a trusted TLS cert (that the control plane pod trusts) to run those controllers.

Event Timeline

The overall issue is that existing certificates/v1 signers don't include a pod serving signer. You cannot make it use the kubelet serving signer (which is the closest you can come). This should not be an issue for maintain-kubeusers since there's a signer for that use case. The certs that signer makes cannot be serving certs, though. They can only be used for client auth.

Some docs on the signers is https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#kubernetes-signers

They provide no API to create a new signer. You have to put in a pull request to do that :). There is an issue where that may very well be done: https://github.com/kubernetes/kubernetes/issues/63732

We don't need auto-approval, but it would be nice!

Alternative solution would be to use certificates signed by something else than Kubernetes. Bonus points if that is combined with full creation + renewal automation. https://cert-manager.io/docs/concepts/ca-injector/ is likely a good alternative.

I took a look into using cert-manager for webhook certs today, and my only concern is that our webhook controllers don't pick up certificate changes from disk. The cert-manager documentation suggests something like https://github.com/wave-k8s/wave, I'm a bit worried about additional complexity but a similar feature could be useful elsewhere too.

Change 730144 had a related patch set uploaded (by Majavah; author: Majavah):

[labs/tools/registry-admission-webhook@master] toolsbeta: Exclude more namespaces

https://gerrit.wikimedia.org/r/730144

Change 730144 merged by jenkins-bot:

[labs/tools/registry-admission-webhook@master] toolsbeta: Exclude more namespaces

https://gerrit.wikimedia.org/r/730144

Mentioned in SAL (#wikimedia-cloud) [2021-10-16T07:47:36Z] <majavah> deployed cert-manager and wave as a test for automating T292238

I deployed cert-manager and wave to toolsbeta using the following commands as a root on a k8s-control node:

helm repo add jetstack https://charts.jetstack.io
helm repo add stakater https://stakater.github.io/stakater-charts
helm repo update


kubectl create -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: reloader
  labels:
    name: reloader
---
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager
  labels:
    name: cert-manager
EOF

helm install -n reloader reloader stakater/reloader --version v0.0.109
helm install -n cert-manager cert-manager jetstack/cert-manager --version v1.5.4 --set installCRDs=true --set global.podSecurityPolicy.enabled=true

kubectl create -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: reloader-psp
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: privileged-psp
subjects:
- kind: ServiceAccount
  name: reloader-reloader
  namespace: reloader
EOF

I am a bit concerned that the wave chart is using deprecated apis (notably rbac.authorization.k8s.io/v1beta1), I'll definitely want to test wave with k8s 1.22 before considering it a production-grade dependency we want to rely on.

Change 735681 had a related patch set uploaded (by Michael DiPietro; author: Michael DiPietro):

[cloud/toolforge/ingress-admission-controller@master] update to remove k8s side ca

https://gerrit.wikimedia.org/r/735681

The code in https://gerrit.wikimedia.org/r/c/cloud/toolforge/ingress-admission-controller/+/735681 does away with CertificateSigningRequest. Is there a reason that this approach would not be reasonable?

In T292238#7468725, @mdipietro wrote:

The code in https://gerrit.wikimedia.org/r/c/cloud/toolforge/ingress-admission-controller/+/735681 does away with CertificateSigningRequest. Is there a reason that this approach would not be reasonable?

That approach looks good if we want to continue using a roughly similar process as what we've done before. I'm still going to at least see if cert-manager would work, since that'd be one less manual admin action we have to do sometimes.

Change 737171 had a related patch set uploaded (by Majavah; author: Majavah):

[cloud/toolforge/volume-admission-controller@main] Use cert-manager to generate webhook certificates

https://gerrit.wikimedia.org/r/737171

In T292238#7433645, @Majavah wrote:

I am a bit concerned that the wave chart is using deprecated apis (notably rbac.authorization.k8s.io/v1beta1), I'll definitely want to test wave with k8s 1.22 before considering it a production-grade dependency we want to rely on.

https://github.com/stakater/Reloader is actively maintained and does the same thing.

Change 737171 had a related patch set uploaded (by Majavah; author: Majavah):

[cloud/toolforge/volume-admission-controller@main] Use cert-manager to generate webhook certificates

https://gerrit.wikimedia.org/r/737171

I deployed this to toolsbeta and it works fine.