Refresh external certs for the toolforge k8s cluster after the upgrade
Closed, Resolved (Public)

Description

Upgrading will refresh most certs in the cluster. Maintain-kubeusers should refresh its own certs.
That said, externally created certs (prometheus, webhook controllers) need to be manually remade.

Use the appropriate scripts to do this.

Event Timeline

JHedden triaged this task as Medium priority. May 5 2020, 4:14 PM
JHedden raised the priority of this task from Medium to Needs Triage.
JHedden triaged this task as Medium priority.
JHedden moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

I was about to just use my cert scripts, but they won't do as they are. I need to tweak them a bit to get the admission controller certs renewed, ideally with a simple argument or something to say "just renew the certs". A second "create" with the same name will fail, so the scripts should instead update the existing secrets with new certs that carry all the appropriate alt-names for doing SSL termination.
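For illustration, one common way to avoid the failing second "create" is to render the secret locally and pipe it to `kubectl apply`, which updates an existing object instead of rejecting it. This is only a sketch and not the scripts discussed in this task: the function name `renew_tls_secret`, the use of a `tls`-type secret, and all argument values are assumptions, and `--dry-run=client` needs kubectl 1.18 or newer.

```shell
# Sketch only: NOT the cert scripts from this task.
# renew_tls_secret NAME NAMESPACE CERT_FILE KEY_FILE
# All four arguments are hypothetical placeholders supplied by the caller.
renew_tls_secret() {
    # Render the secret manifest locally without touching the cluster.
    # (--dry-run=client requires kubectl >= 1.18.)
    manifest=$(kubectl create secret tls "$1" \
        --namespace "$2" \
        --cert "$3" --key "$4" \
        --dry-run=client -o yaml) || return 1

    # "apply" creates the secret if it is missing and updates it if it
    # already exists, so re-running this is safe.
    printf '%s\n' "$manifest" | kubectl apply -f -
}
```

The pods that mount the secret would still need a restart (for example by deleting them) before they serve the new cert.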

Cool thing, I can just re-run the scripts I've got for the controllers. It works great on minikube. I'll run it in toolsbeta and delete the pods to restart as well if it needs it.

Using an operations-pod for maintain-kubeusers (so that I could install the openssl package in the pod):

# echo | openssl s_client -showcerts -servername registry-admission.registry-admission.svc -connect registry-admission.registry-admission.svc:443 2>/dev/null | openssl x509 -noout -dates
notBefore=Jun  1 23:23:00 2020 GMT
notAfter=Jun  1 23:23:00 2021 GMT

That tells me it worked!
I'll do it in tools. Also I'll document the process in the README files of the controllers. I did restart the pods just in case.
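The date check above can also be turned into a pass/fail test with `openssl x509 -checkend`, which is handy for spotting certs that need a refresh before they lapse. A minimal sketch, assuming a helper named `expires_within` (hypothetical, not part of any existing script) that reads a PEM certificate on stdin:

```shell
# expires_within DAYS
# Reads a PEM certificate on stdin. Exits 0 if the cert expires within
# DAYS days (or cannot be parsed), and 1 if it stays valid beyond that.
expires_within() {
    days="$1"
    # "openssl x509 -checkend N" exits 0 when the cert is still valid
    # N seconds from now, and non-zero otherwise.
    if openssl x509 -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Example usage against a webhook service (same endpoint as above):
#   echo | openssl s_client -servername registry-admission.registry-admission.svc \
#       -connect registry-admission.registry-admission.svc:443 2>/dev/null \
#     | expires_within 30 && echo "cert expires within 30 days, renew it"
```

Because an unreadable certificate is also reported as "expiring", a failed connection shows up as something to look at rather than as a silent pass.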

Mentioned in SAL (#wikimedia-cloud) [2020-06-01T23:51:57Z] <bstorm_> refreshed certs for the custom webhook controllers on the k8s cluster T250874

That leaves what else? Prometheus? @aborrero
Or did we have something else where we made certs by hand?

Change 601484 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/registry-admission-webhook@master] docs: add instructions for rotating certs

https://gerrit.wikimedia.org/r/601484

Change 601485 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[cloud/toolforge/ingress-admission-controller@master] docs: add instructions for rotating certs

https://gerrit.wikimedia.org/r/601485

This is the prometheus cert:

root@tools-prometheus-03:~# openssl x509 -in /etc/ssl/localcerts/toolforge-k8s-prometheus.crt -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            10:59:16:a9:2e:a4:ed:d5:75:49:45:57:2f:c4:58:6b:88:0d:89:7c
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = kubernetes
        Validity
            Not Before: Nov 19 11:36:00 2019 GMT
            Not After : Nov 18 11:36:00 2020 GMT
        Subject: CN = prometheus
[...]

So the one-year validity period ends in November 2020. Let me refresh it now just in case.

Change 601692 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: prometheus: renew TLS cert for the k8s API

https://gerrit.wikimedia.org/r/601692

Change 601692 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: prometheus: renew TLS cert for the k8s API

https://gerrit.wikimedia.org/r/601692

Change 601484 merged by Bstorm:
[labs/tools/registry-admission-webhook@master] docs: add instructions for rotating certs

https://gerrit.wikimedia.org/r/601484

Change 601485 merged by Bstorm:
[cloud/toolforge/ingress-admission-controller@master] docs: add instructions for rotating certs

https://gerrit.wikimedia.org/r/601485

Bstorm claimed this task.

I think we are done then!