Istio comes with two webhooks by default:
* A mutating webhook `istio-sidecar-injector` that we can potentially ignore as we don't use injection
* A validating webhook `istiod-istio-system` that is used to validate Istio CRD objects
The latter we cannot ignore: installing Istio already triggers validation requests, and the kube-apiserver will fail to call the ClusterIP service backing the webhook. This is the scenario described in T285927.
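For reference, a trimmed sketch of the configuration object in question (the exact rules, failure policy etc. differ between Istio versions, so treat this as illustrative only):

```lang=yaml
# Trimmed/illustrative ValidatingWebhookConfiguration as installed by Istio.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: istiod-istio-system
webhooks:
- name: validation.istio.io
  clientConfig:
    # The kube-apiserver has to call this ClusterIP service to validate
    # Istio CRD objects - which it can not reach from the masters today.
    service:
      name: istiod
      namespace: istio-system
      path: /validate
      port: 443
  rules:
  - apiGroups: ["networking.istio.io", "security.istio.io"]
    apiVersions: ["*"]
    operations: ["CREATE", "UPDATE"]
    resources: ["*"]
  sideEffects: None
  admissionReviewVersions: ["v1beta1", "v1"]
```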
After chatting with @akosiaris we came up with ~~three~~four possible solutions:
====1. Announce Kubernetes Service IPs via BGP (calico) so they are reachable from outside the cluster====
This is what we currently have in staging-codfw as part of the work done in T238909.
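For illustration, announcing Service ClusterIPs via calico boils down to a BGPConfiguration along these lines (the CIDR is a placeholder and would have to match the cluster's service-cluster-ip-range):

```lang=yaml
# Sketch of a calico BGPConfiguration advertising the Service ClusterIP
# range via BGP. The CIDR below is a placeholder, not our actual range.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceClusterIPs:
  - cidr: 10.96.0.0/12
```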
**Pro:**
* No additional components needed on kubernetes masters
* Shares the traffic flow with how clients would reach Kubernetes Services from outside the cluster (we are not sure about this yet)
**Con:**
* Depends on calico properly announcing Kubernetes Service IPs (which we have not fully implemented yet)
* Needs --masquerade-all on the nodes (which effectively hides the real client IP from Pods, https://github.com/kubernetes/kubernetes/issues/24224)
* Needs the calico-announced Service IPs to be highly available; this is being worked on upstream.
====2. Make Kubernetes Masters (tainted) worker nodes====
This is what @elukey has implemented for ml clusters in T285927.
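For illustration, this boils down to a NoSchedule taint on the master Node objects plus a matching toleration on the workloads that are allowed to run there (the taint key and label below are assumptions, not necessarily what the ml clusters use):

```lang=yaml
# Sketch only: taint on the master Node objects keeps general workloads off...
apiVersion: v1
kind: Node
metadata:
  name: example-master                      # placeholder name
spec:
  taints:
  - key: node-role.kubernetes.io/master     # assumed taint key
    effect: NoSchedule
---
# ...and a matching toleration/nodeSelector in the pod template of workloads
# that should land on the masters (istiod for example); pod spec fragment:
spec:
  nodeSelector:
    node-role.kubernetes.io/master: ""      # assumed label on the masters
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
```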
**Pro:**
* Shares the traffic flow with other intra-cluster traffic to ClusterIPs
* Kubernetes masters are known to the Kubernetes API (e.g. we can control access via NetworkPolicies and run dedicated workloads on them - istio control plane for example)
* No dependency on calico announcing Kubernetes Service IPs (like with **1.**)
**Con:**
* Lots of new dependencies on the masters: kube-proxy, kubelet and docker, making them considerably more complex and resource-hungry.
* Makes it easier for workload on the nodes to reach the master on various ports (e.g. as a result of a bug in iptables rules manipulation). This is not theoretical, a CVE already exists: https://discuss.kubernetes.io/t/security-advisory-cve-2020-8558-kubernetes-node-setting-allows-for-neighboring-hosts-to-bypass-localhost-boundary/11788
* Will make Kubernetes Masters BGP peer with core routers. Can/should we prevent that?
====3. Run kube-proxy on Kubernetes Masters====
Just run the kube-proxy process on the Kubernetes Masters, essentially providing them with the iptables rules needed to reach ClusterIP services.
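In practice that would probably mean running kube-proxy as a standalone daemon on the masters (`kube-proxy --config=...`) with a minimal configuration roughly like this (all values are placeholders, not our actual cluster settings):

```lang=yaml
# Hedged sketch of a minimal kube-proxy configuration for the masters.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables                 # program ClusterIP -> endpoint DNAT rules via iptables
clusterCIDR: 10.64.0.0/12      # pod network, placeholder
clientConnection:
  kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig   # placeholder path
```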
**Pro:**
* Shares the traffic flow with other intra-cluster traffic to ClusterIPs
* Fewer additional components than **2.**
* No dependency on calico announcing Kubernetes Service IPs (like with **1.**)
**Con:**
* Additional process (kube-proxy) is needed on the masters (making them more complex plus requiring some puppet work)
* Makes it easier for workload on the nodes to reach the master on various ports (e.g. as a result of a bug in iptables rules manipulation). This is not theoretical, a CVE already exists: https://discuss.kubernetes.io/t/security-advisory-cve-2020-8558-kubernetes-node-setting-allows-for-neighboring-hosts-to-bypass-localhost-boundary/11788
* Makes it a bit more complex to reason about masters being part of the cluster.
* A scenario that has been tested (in WMCS) but is not really supported upstream.
====4. Work around this issue by disabling webhooks====
As we potentially won't use Istio CRDs to configure Ingress in the first place (see the **Configuration** part of T287007), we could try to work around this requirement by disabling or not deploying the webhooks at all. I'm not sure if that is possible, though.
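If we install via istioctl with an IstioOperator manifest, the following value should (if I read the upstream charts correctly) prevent the validating webhook from being deployed in the first place - to be verified:

```lang=yaml
# Sketch, to be verified: IstioOperator overlay that skips installing the
# validating webhook (global.configValidation gates the
# ValidatingWebhookConfiguration template in the upstream charts).
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      configValidation: false
```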
**Pro:**
* No additional components on the masters
* No dependency on calico announcing Kubernetes Service IPs (like with **1.**)
* No dependency of the kube-apiserver on istiod (which serves the webhooks)
**Con:**
* Hard deviation from the standard Istio setup
* We might have to revisit this problem/decision later (for things like OPA or other alternatives to PSPs: T273507)