With more automation/self-service coming via Istio-Ingressgateway (T209066) we should also improve how we issue and deploy TLS certificates to services running on our Kubernetes clusters.
The current process is described here and involves quite a few manual steps. It also requires an SRE, as root is needed.
With the cfssl-based PKI now available, we should aim for an integration with cert-manager, the de facto standard in the Kubernetes world.
ML also took a look at cert-manager (T280661) and decided not to use it for now (but they issue far fewer certificates than we do).
There are basically two ways we could integrate cert-manager:
1. CA issuer
This is part of the standard implementation of cert-manager and would require us to create an intermediate (dedicated to Kubernetes clusters) using our PKI and provide that to the Kubernetes clusters. The clusters' cert-manager instances could then issue certificates based on that.
The obvious downside of this is that we create some kind of "split brain CA", as each Kubernetes cluster would issue certificates with the same intermediate without knowing about the others. We would also have to manage the intermediate ourselves (renew it, etc.).
2. CFSSL (API) issuer
cert-manager supports external Issuers, and one of those could be used to have certificates issued directly via pki.discovery.wmnet.
This would allow us to rely on the PKI infrastructure and have certificates issued with the "discovery" intermediate that is already managed there. It would also retain the single source of truth regarding which certificates have been issued/are valid, and relieve us from the burden of managing an intermediate.
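To illustrate how a service certificate could be requested through an external issuer, here is a minimal sketch that builds a cert-manager Certificate object and prints it as JSON (which kubectl accepts as well as YAML). The issuerRef group "cfssl-issuer.example.org", the issuer name "discovery" and the "-tls" secret name suffix are placeholders made up for this example, not names taken from any existing issuer implementation.

```python
import json


def certificate_manifest(name, namespace, dns_names, issuer_name,
                         issuer_kind="ClusterIssuer",
                         issuer_group="cfssl-issuer.example.org"):
    """Build a cert-manager.io/v1 Certificate that points at an external issuer.

    issuer_group and issuer_kind are hypothetical; an external issuer defines
    its own API group and kinds, and issuerRef has to match those.
    """
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Secret that cert-manager creates and keeps up to date
            # (the "-tls" suffix is just a convention chosen here).
            "secretName": f"{name}-tls",
            "dnsNames": list(dns_names),
            "issuerRef": {
                "name": issuer_name,
                "kind": issuer_kind,
                "group": issuer_group,
            },
        },
    }


if __name__ == "__main__":
    manifest = certificate_manifest(
        name="example-service",
        namespace="istio-system",
        dns_names=["example-service.discovery.wmnet"],
        issuer_name="discovery",
    )
    print(json.dumps(manifest, indent=2))
```

The only part specific to the external issuer is issuerRef; everything else is the same Certificate object that would be used with the built-in CA issuer.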
While this seems more like "the right way" to do it, there is currently only one implementation (I could find) of a cfssl Issuer: https://github.com/OpenSource-THG/cfssl-issuer
After talking to @jbond about this it seems as if we're okay with calling the CFSSL API of our PKI directly with some sort of authentication applied. I took a closer look at the OpenSource-THG/cfssl-issuer then to verify if it could work for us.
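As a rough illustration of what "some sort of authentication" could look like, the sketch below assumes CFSSL's standard HMAC auth provider and its authsign endpoint, where the sign request is wrapped in an envelope together with an HMAC-SHA256 token computed over the raw request bytes. Whether our PKI would use exactly this mechanism is an assumption; the function name and the profile, label and key values are placeholders.

```python
import base64
import hashlib
import hmac
import json


def build_authsign_payload(csr_pem, hosts, profile, label, hex_key):
    """Wrap a CFSSL sign request in the envelope used by /api/v1/cfssl/authsign.

    hex_key is the hex-encoded shared auth key; the token is an HMAC-SHA256
    over the exact JSON bytes that are sent in the "request" field.
    """
    sign_request = {
        "certificate_request": csr_pem,
        "hosts": hosts,
        "profile": profile,
        "label": label,
    }
    raw = json.dumps(sign_request).encode("utf-8")
    token = hmac.new(bytes.fromhex(hex_key), raw, hashlib.sha256).digest()
    return {
        # Both fields are base64 encoded in the JSON envelope.
        "token": base64.b64encode(token).decode("ascii"),
        "request": base64.b64encode(raw).decode("ascii"),
    }


if __name__ == "__main__":
    payload = build_authsign_payload(
        csr_pem="-----BEGIN CERTIFICATE REQUEST-----\n...\n-----END CERTIFICATE REQUEST-----\n",
        hosts=["example-service.discovery.wmnet"],
        profile="example-profile",   # placeholder
        label="discovery",           # placeholder
        hex_key="0123456789abcdef",  # placeholder, never hard-code a real key
    )
    # This body would be POSTed to the authsign endpoint of the CFSSL API.
    print(json.dumps(payload, indent=2))
```

The token has to be computed over the same bytes that are sent as "request", so the request is serialized once and reused for both fields.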
Unfortunately the issuer does not support any kind of authentication towards the CFSSL API so I decided to hack that in for an initial test.
Working on that revealed that the issuer is not in ideal shape: it does not seem to follow the standards of cert-manager (anymore?) and there is a lot of duplicate code. While it was able to issue the right API calls and receive the certificate from CFSSL, I wasn't able to make it actually reconcile the Certificate/Secret objects in Kubernetes correctly. I ultimately stopped debugging it to write this task first.
To continue with this, we could:
- Try to update/fix the OpenSource-THG/cfssl-issuer (maybe with help of the initial developers, although the project does not seem very active)
- Start our own implementation from scratch. Might sound weird at first, but cert-manager provides kind of an SDK/scaffold around this which is regularly updated and our use case (calling an external API) is not that complex after all.
2.2 own cfssl-issuer implementation
I went ahead and created our own cfssl-issuer implementation, because it's not very hard to do and cleaning up/fixing the existing codebase seemed harder.
Follow up things to do:
- Build cert-manager docker images (v1.5.4, last one compatible/tested with k8s v1.16)
- Import cert-manager helm chart
- Build cfssl-issuer docker images
- Write cfssl-issuer helm chart
- Write admin_ng helmfile to install cert-manager & cfssl-issuer
- Come up with a proper idea of how to provision the certificate objects in k8s (as the resulting secrets need to be in istio-system namespace) T295385
- Write some docs https://wikitech.wikimedia.org/wiki/Kubernetes/cert-manager