Istio comes with two webhooks by default:
- A mutating webhook, istio-sidecar-injector, which we can potentially ignore as we don't use sidecar injection
- A validating webhook, istiod-istio-system, which is used to validate Istio CRD objects
We cannot ignore the latter: installing Istio already triggers a validation request, and the kube-apiserver will fail to call the service backing the webhook because the Service ClusterIP is not reachable from the masters. This is the scenario described in T285927.
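For reference, the validating webhook registration looks roughly like this (a simplified, illustrative sketch; the exact names, rules and CA bundle come from the Istio install):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: istiod-istio-system
webhooks:
  - name: validation.istio.io
    clientConfig:
      service:
        name: istiod            # kube-apiserver calls this Service's ClusterIP --
        namespace: istio-system # which is exactly what is unreachable from the masters
        path: /validate
    rules:
      - apiGroups: ["networking.istio.io", "security.istio.io"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
        resources: ["*"]
    failurePolicy: Fail   # requests that can't reach istiod are rejected
    sideEffects: None
    admissionReviewVersions: ["v1"]
```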
After chatting with @akosiaris we came up with four possible solutions:
1. Announce Kubernetes Service IPs via BGP (calico) so they are reachable from outside the cluster
This is what we currently have in staging-codfw as part of the work done in T238909.
Pro:
- No additional components needed on kubernetes masters
- Shares the traffic flow with how service traffic would reach the cluster (we're not sure about this yet)
Con:
- Depends on calico properly announcing Kubernetes Service IPs (which we have not fully implemented yet)
- Needs --masquerade-all on nodes (which effectively hides the real client IP from Pods, https://github.com/kubernetes/kubernetes/issues/24224)
- Needs the calico-announced service IPs to be highly available; this is being worked on upstream.
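On the calico side this is a small configuration change; a sketch, assuming we use the BGPConfiguration serviceClusterIPs field (the CIDR below is illustrative and has to match the cluster's actual service CIDR):

```yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  # Advertise the whole service CIDR (and thus all ClusterIPs)
  # to the configured BGP peers, e.g. the core routers.
  serviceClusterIPs:
    - cidr: 10.96.0.0/12   # illustrative; must match --service-cluster-ip-range
```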
2. Make Kubernetes Masters (tainted) worker nodes
This is what @elukey has implemented for ml clusters in T285927.
Pro:
- Shares the traffic flow with other intra-cluster traffic to ClusterIPs
- Kubernetes masters are known to the Kubernetes API (e.g. we can control access via NetworkPolicies and run dedicated workloads on them - istio control plane for example)
- No dependency on calico announcing Kubernetes Service IPs (like with 1.)
Con:
- Lots of new dependencies on the masters (kube-proxy, kubelet and docker), making them considerably more complex and resource-hungry.
- Makes it easier for workload on the nodes to reach the master on various ports (e.g. as a result of a bug in iptables rules manipulation). This is not theoretical, a CVE already exists: https://discuss.kubernetes.io/t/security-advisory-cve-2020-8558-kubernetes-node-setting-allows-for-neighboring-hosts-to-bypass-localhost-boundary/11788
- Will make Kubernetes Masters BGP peer with core routers. Can/should we prevent that?
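If we go this route, the masters would carry the usual NoSchedule taint, and only workloads we explicitly want there (e.g. the Istio control plane) would opt in via a toleration; a sketch, assuming the standard node-role taint key:

```yaml
# Pod spec fragment for a workload pinned to the (tainted) masters
spec:
  nodeSelector:
    node-role.kubernetes.io/master: ""
  tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule
```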
3. Run kube-proxy on Kubernetes Masters
Just run the kube-proxy process on Kubernetes Masters, essentially providing them with the needed iptables rules to reach ClusterIP services.
Pro:
- Shares the traffic flow with other intra-cluster traffic to ClusterIPs
- Less additional components than 2.
- No dependency on calico announcing Kubernetes Service IPs (like with 1.)
Con:
- Additional process (kube-proxy) is needed on the masters (making them more complex plus requiring some puppet work)
- Makes it easier for workload on the nodes to reach the master on various ports (e.g. as a result of a bug in iptables rules manipulation). This is not theoretical, a CVE already exists: https://discuss.kubernetes.io/t/security-advisory-cve-2020-8558-kubernetes-node-setting-allows-for-neighboring-hosts-to-bypass-localhost-boundary/11788
- Makes it a bit more complex to reason about masters being part of the cluster.
- A scenario that has been tested (in WMCS) but is not really supported upstream.
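A minimal sketch of what the standalone kube-proxy would need (values are illustrative; the kubeconfig path and cluster CIDR would come from our puppetization):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/kube-proxy.conf   # illustrative path
mode: iptables        # program ClusterIP DNAT rules via iptables
clusterCIDR: 10.64.0.0/12   # illustrative; the cluster's pod CIDR
```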
4. Work around this issue by disabling webhooks
With the outcome of T287007#7431081 this is no longer a viable option.
As we potentially won't use Istio CRDs to configure Ingress in the first place (see the Configuration part of T287007), we could try to work around this requirement by disabling/not deploying the webhooks at all. I'm not sure if that is possible, though.
Pro:
- No additional components on the masters
- No dependency on calico announcing Kubernetes Service IPs (like with 1.)
- No dependency on istiod (serving the webhooks) from the kube-apiserver
Con:
- A hard deviation from the standard Istio setup
- We might have to revisit this problem/decision later (for things like OPA or other alternatives to PSPs: T273507)
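For the record, if this were possible at all, it would presumably look something like the following in an IstioOperator manifest (assuming the global.configValidation and sidecarInjectorWebhook.enabled values still exist in the Istio version we install; not verified):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      configValidation: false   # assumed knob: skip the validating webhook
    sidecarInjectorWebhook:
      enabled: false            # assumed knob: skip the mutating webhook
```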
We're going with option 2. TODOs:
- Migrate staging-eqiad
- Migrate codfw
- Migrate eqiad
- Remove unused master.pp parameter profile::kubernetes::master::expose_puppet_certs