With more automation/self-service coming via Istio-Ingressgateway (T209066) we should also improve how we issue and deploy TLS certificates to services running on our Kubernetes clusters.
The current process is described [[ https://wikitech.wikimedia.org/wiki/Kubernetes/Enabling_TLS#Create_and_place_certificates | here ]] and involves quite a few manual steps. It also requires an SRE, as root is needed.
With the [[ https://wikitech.wikimedia.org/wiki/PKI | cfssl based PKI ]] now being available we should aim for an integration with [[ https://cert-manager.io/ | cert-manager ]] as the de-facto standard in the Kubernetes world.
ML also took a look at cert-manager (T280661) and decided not to use it for now (but they issue far fewer certificates than we do).
There are basically two ways we could integrate cert-manager:
= 1. CA issuer =
https://cert-manager.io/docs/configuration/ca/
This is part of the standard implementation of cert-manager and would require us to [[ https://wikitech.wikimedia.org/wiki/PKI/CA_Operations#Adding_a_new_intermediate | create an intermediate ]] (dedicated to Kubernetes clusters) using our PKI and provide that to the Kubernetes clusters. The clusters' cert-manager instances could then issue certificates based on that.
The obvious downside of this is that we create some kind of "split brain CA", as each Kubernetes cluster would issue certificates with the same intermediate without knowing about the others. We would also have to manage the intermediate ourselves (renewal etc.).
= 2. CFSSL (API) issuer =
cert-manager supports external Issuers, and one of those could be used to have certificates issued directly via pki.discovery.wmnet.
This would allow us to rely on the PKI infrastructure and have certificates issued with the "discovery" intermediate that is already managed there. It would also retain the single source of truth regarding which certificates have been issued/are valid and relieve us of the burden of managing an intermediate.
While this seems more like "the right way" to do it, there is currently only one implementation (that I could find) of a cfssl Issuer: https://github.com/OpenSource-THG/cfssl-issuer
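
For illustration, the exchange such an issuer would have with the CFSSL API could look roughly like the sketch below. It assumes CFSSL's documented `sign` request/response JSON shape (`certificate_request`, `profile`, `label` in the request; `success`, `result.certificate`, `errors` in the reply). The helper names and the profile/label values are hypothetical, and nothing here talks to the network.

```python
import json


def build_sign_request(csr_pem: str, profile: str, label: str = "") -> bytes:
    """Build the JSON body for a CFSSL sign call (hypothetical helper)."""
    body = {"certificate_request": csr_pem, "profile": profile}
    if label:
        # Assumption: the label selects the signer (intermediate) on the
        # server side; the value to use depends on the PKI setup.
        body["label"] = label
    return json.dumps(body, sort_keys=True).encode("utf-8")


def extract_certificate(response_body: bytes) -> str:
    """Return the PEM certificate from a CFSSL reply, or raise on failure."""
    reply = json.loads(response_body)
    if not reply.get("success"):
        raise RuntimeError(f"CFSSL signing failed: {reply.get('errors')}")
    return reply["result"]["certificate"]
```

An issuer controller would take the CSR from the `CertificateRequest` object, send the body built above to the PKI, and write the returned certificate back into the object's status.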
== 2.1 OpenSource-THG/cfssl-issuer ==
After talking to @jbond about this it seems as if we're okay with calling the CFSSL API of our PKI directly with some sort of authentication applied. I then took a closer look at OpenSource-THG/cfssl-issuer to verify whether it could work for us.
Unfortunately the issuer does not support any kind of authentication towards the CFSSL API so I decided to hack that in for an initial test.
That work revealed that the issuer is not in ideal shape: it does not seem to follow the standards of cert-manager (anymore?), there is a lot of duplicated code and, while it was able to issue the right API calls and receive the certificate from CFSSL, I wasn't able to make it actually reconcile the Certificate/Secret objects in Kubernetes correctly. I ultimately stopped debugging it to write this task first.
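
As a reference for what "authentication towards the CFSSL API" could mean on the wire, here is a minimal sketch. It assumes CFSSL's "standard" auth provider, where the original JSON request is wrapped in an envelope carrying a base64-encoded HMAC-SHA256 token computed with a hex-encoded shared key, and sent to the `authsign` endpoint instead of `sign`. The function name and the key below are made up for illustration; this is not the patch described above.

```python
import base64
import hashlib
import hmac
import json


def wrap_authenticated(request_body: bytes, hex_key: str) -> bytes:
    """Wrap a CFSSL request body in an authenticated envelope (sketch)."""
    # Assumption: the shared key is configured as a hex string.
    key = bytes.fromhex(hex_key)
    token = hmac.new(key, request_body, hashlib.sha256).digest()
    envelope = {
        # Both fields are base64-encoded byte strings in the JSON envelope.
        "token": base64.b64encode(token).decode("ascii"),
        "request": base64.b64encode(request_body).decode("ascii"),
    }
    return json.dumps(envelope).encode("utf-8")


# Example with a made-up key and a placeholder CSR.
inner = json.dumps({"certificate_request": "<CSR PEM>", "profile": "example"}).encode()
payload = wrap_authenticated(inner, "00112233445566778899aabbccddeeff")
```

The server recomputes the HMAC over the decoded `request` and rejects the call if the token does not match, so the key has to be delivered to the issuer as a Kubernetes Secret or similar.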
To continue with this, we could:
* Try to update/fix the OpenSource-THG/cfssl-issuer (maybe with the help of the original developers, although the project does not seem very active)
* Start our own implementation from scratch. Might sound weird at first, but cert-manager provides kind of an [[ https://cert-manager.io/docs/contributing/external-issuers/ | SDK/scaffold ]] around this which is regularly updated and our use case (calling an external API) is not that complex after all.