
Refresh certs that are not controlled by kubeadm
Closed, Resolved (Public)

Description

When upgrading the cluster, it's a good idea to refresh the certs that are controlled by neither kubeadm (those get updated during the upgrade process) nor maintain-kubeusers (mostly user certs). That means the certs for the Prometheus server and the webhook controllers.

For the admission controller webhooks, rerun the get-cert.sh script similar to what the doc describes, but do not bother with the ca-bundle.sh script, as that is no longer necessary at all except for local testing. Rerunning get-cert.sh should inject the secret. To make the webhooks use the new secret, delete the appropriate pods in the ingress-admission and registry-admission namespaces one at a time so that they restart. If in doubt, the READMEs in the repos for these webhooks are generally the most authoritative docs.
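As a rough illustration of the "restart them one at a time" step, here is a minimal Python sketch that shells out to kubectl. It is not taken from the webhook repos: the helper names (`kubectl`, `restart_pods_one_at_a_time`) and the 120s timeout are made up for this example, and it assumes that kubectl is already configured for the cluster and that every pod in the namespace belongs to the webhook. The READMEs in the repos remain the authoritative procedure.

```python
import subprocess
from typing import Callable, List

# Names taken from the task description; everything else here is illustrative.
NAMESPACES = ["ingress-admission", "registry-admission"]


def kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def restart_pods_one_at_a_time(
    namespace: str,
    run: Callable[[List[str]], str] = kubectl,
    timeout: str = "120s",  # hypothetical value, tune as needed
) -> List[str]:
    """Delete each pod in turn and wait for pods to be Ready in between.

    Returns the names of the pods that were deleted.
    """
    # "-o name" prints one "pod/<name>" entry per line.
    listing = run(["get", "pods", "-n", namespace, "-o", "name"])
    pods = [line.strip() for line in listing.splitlines() if line.strip()]
    for pod in pods:
        # The owning controller recreates the pod, which then loads the new secret.
        run(["delete", "-n", namespace, pod])
        # Wait until all pods in the namespace report Ready before continuing.
        run([
            "wait", "--for=condition=Ready", "pods", "--all",
            "-n", namespace, f"--timeout={timeout}",
        ])
    return pods
```

Calling `restart_pods_one_at_a_time(ns)` for each entry in `NAMESPACES` would cover both webhooks. The `run` parameter exists only so the sequence of commands can be checked without touching a real cluster.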

For the Prometheus ones, follow the doc on wikitech to recreate the certs.

The certs are valid for one year, so the current ones are probably getting close to expiry at this point.
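To judge how urgent a renewal is, the remaining lifetime can be computed from a cert's expiry date. The sketch below is only an illustration: it assumes the expiry is available as a `notAfter=...` line in the format printed by `openssl x509 -enddate -noout`, and the function names, the sample dates and the 30-day threshold are all hypothetical.

```python
from datetime import datetime, timezone


def parse_not_after(line: str) -> datetime:
    """Parse a line such as 'notAfter=Jun  3 17:00:00 2022 GMT' into a UTC datetime."""
    value = line.strip().split("=", 1)[-1]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def days_remaining(not_after: datetime, now: datetime) -> int:
    """Whole days left until the certificate expires (negative once expired)."""
    return (not_after - now).days


def needs_renewal(not_after: datetime, now: datetime, threshold_days: int = 30) -> bool:
    """True when the cert expires within threshold_days (hypothetical default)."""
    return days_remaining(not_after, now) <= threshold_days


# Hypothetical dates, not the real expiry of any Toolforge certificate.
expiry = parse_not_after("notAfter=Jun  3 17:00:00 2022 GMT")
check_time = datetime(2022, 5, 4, 17, 0, 0, tzinfo=timezone.utc)
print(days_remaining(expiry, check_time), needs_renewal(expiry, check_time))
```

Passing `now` explicitly keeps the check reproducible; in practice it would be `datetime.now(timezone.utc)`.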

Event Timeline

taavi triaged this task as Medium priority. Apr 17 2021, 9:03 AM

The last time this was done (T250874) was about 11 months ago, so we have another month remaining before the certificates expire.

taavi raised the priority of this task from Medium to High. May 25 2021, 2:49 PM

Raising this to High given how little lifetime these certificates have remaining.

Mentioned in SAL (#wikimedia-cloud) [2021-06-03T16:49:16Z] <majavah> renew registry-admission-webhook certificates T280301

Mentioned in SAL (#wikimedia-cloud) [2021-06-03T16:55:00Z] <majavah> renew ingress-admission-controller certificates T280301

Mentioned in SAL (#wikimedia-cloud) [2021-06-03T17:06:41Z] <majavah> renew admission webhook certificates T280301

Change 698009 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: prometheus: renew k8s TLS cert

https://gerrit.wikimedia.org/r/698009

Mentioned in SAL (#wikimedia-cloud) [2021-06-03T18:26:58Z] <majavah> renew prometheus kubernetes certificate T280301

Change 698009 merged by Bstorm:

[operations/puppet@production] toolforge: prometheus: renew k8s TLS cert

https://gerrit.wikimedia.org/r/698009

taavi claimed this task.

This was done last week (after they expired).

Did that include the webhook controllers? I wasn't clear on that. I thought it was just the prometheus one. The webhooks need a script re-run from their repos to refresh the cert.

Oh yeah, it's in SAL :) Thanks for closing this.