The validating admission webhooks running in the new version of Toolforge Kubernetes use certs generated with the Kubernetes certificates API, which means they expire after one year. They need to be renewed via an automated or manually reviewed process that can be used reliably without a deep dive into that API every time.
Event Timeline
@aborrero set up a script for requesting certs at modules/toolforge/files/k8s/admin_scripts/wmcs-k8s-get-cert.sh that places the certs in a tmpdir. The main difference here is that we need the certs in a Kubernetes secret (which can easily be generated from the files, as in their individual scripts) and we need to restart the services afterwards.
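As a rough illustration of the "generate a secret from the files" step, here is a minimal Python sketch that builds a Secret manifest from cert and key bytes. It is not what wmcs-k8s-get-cert.sh or the webhooks' own scripts do; the secret name, the namespace, and the choice of the `kubernetes.io/tls` secret type (with its `tls.crt` and `tls.key` keys) are assumptions made for the example.

```python
import base64
import json


def build_tls_secret(name: str, namespace: str, cert_pem: bytes, key_pem: bytes) -> dict:
    """Return a kubernetes.io/tls Secret manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/tls",
        "metadata": {"name": name, "namespace": namespace},
        # Values under "data" must be base64-encoded strings.
        "data": {
            "tls.crt": base64.b64encode(cert_pem).decode("ascii"),
            "tls.key": base64.b64encode(key_pem).decode("ascii"),
        },
    }


if __name__ == "__main__":
    # Placeholder bytes and hypothetical names; in practice the bytes would be
    # read from the cert and key files the script leaves in its tmpdir.
    manifest = build_tls_secret(
        "example-webhook-certs", "example-namespace", b"CERT", b"KEY"
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be fed to `kubectl apply -f -`. Restarting the service so it picks up the new secret is a separate step that this sketch does not cover.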
The services could also use some Prometheus instrumentation to tell our monitors when their certs expire.
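To make the instrumentation idea concrete, here is a small Python sketch that turns a cert's notAfter date into a gauge in the Prometheus text exposition format. The metric name and the `service` label are hypothetical, and the input is assumed to be the date string printed by `openssl x509 -enddate -noout`; none of this exists in the webhooks today.

```python
import ssl
import time


def expiry_timestamp(not_after: str) -> int:
    """Parse 'notAfter=Jan  5 09:34:43 2018 GMT' (or the bare date) to epoch seconds."""
    date = not_after.split("=", 1)[-1].strip()
    return ssl.cert_time_to_seconds(date)


def days_remaining(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and the cert's expiry."""
    return (expiry_timestamp(not_after) - now) / 86400


def render_metric(service: str, not_after: str) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    name = "example_cert_expiry_timestamp_seconds"  # hypothetical metric name
    return "\n".join(
        [
            f"# HELP {name} Expiry time of the serving cert, in seconds since the epoch.",
            f"# TYPE {name} gauge",
            f'{name}{{service="{service}"}} {expiry_timestamp(not_after)}',
        ]
    )


if __name__ == "__main__":
    end_date = "notAfter=Jan  5 09:34:43 2018 GMT"
    print(render_metric("example-webhook", end_date))
    print(f"days remaining: {days_remaining(end_date, time.time()):.1f}")
```

Exporting the expiry as a timestamp rather than as "days left" keeps the exported value constant between renewals, so an alert can compare it against the current time on the monitoring side.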
That script now creates secrets as well, as part of T215553: Figure out cert management for Toolforge kubernetes and make it clear in documents, etc. for the upgrade. See https://phabricator.wikimedia.org/T215553#5674833
For the time being, this basically just means following the README directions to create the certs again. The script mentioned in https://phabricator.wikimedia.org/T215553#5674833 won't work for the controllers because they need a cert that validates their DNS.
This is done! It's possible to simply rerun the get_cert script and kill the pod. Documented in the README!