While working on Istio on the ml-serve-eqiad cluster, I noticed the following error in the kube-apiserver logs:
```
Jul 01 09:33:07 ml-serve-ctrl1002 kube-apiserver[421]: E0701 09:33:07.002314 421 dispatcher.go:129] failed calling webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: dial tcp 10.64.77.73:443: i/o timeout
```
The IP belongs to the Service that istiod creates to expose its validation webhook:
```
elukey@ml-serve-ctrl1001:~$ kubectl describe svc -n istio-system
Name: istiod
Namespace: istio-system
[..]
Selector: app=istiod,istio=pilot
Type: ClusterIP
IP: 10.64.77.73 <=====================================
Port: grpc-xds 15010/TCP
TargetPort: 15010/TCP
Endpoints: 10.64.79.74:15010
Port: https-dns 15012/TCP
TargetPort: 15012/TCP
Endpoints: 10.64.79.74:15012
Port: https-webhook 443/TCP <==========================
TargetPort: 15017/TCP
Endpoints: 10.64.79.74:15017 <============================
Port: http-monitoring 15014/TCP
TargetPort: 15014/TCP
Endpoints: 10.64.79.74:15014
Session Affinity: None
Events: <none>
```
```
elukey@ml-serve-ctrl1001:~$ kubectl describe ep -n istio-system
Name: istiod
Namespace: istio-system
[..]
Subsets:
Addresses: 10.64.79.74
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
http-monitoring 15014 TCP
https-webhook 15017 TCP <========================
grpc-xds 15010 TCP
https-dns 15012 TCP
```
The kube-apiserver needs to be able to reach 10.64.77.73, but that ClusterIP is routable only on the k8s worker nodes where calico runs. The Kubeflow stack is full of webhooks, so it would be great if we could find a way to add calico to the master nodes to provide the extra routing needed.
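To confirm that the timeout is a routing problem rather than istiod being down, a quick TCP probe run from a master node vs. a worker node should show the difference. A minimal sketch (the host/port are the ones from the Service output above):

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers both "connection refused" and "i/o timeout" style failures.
        return False


# On a worker node this should print True; on a master node (no calico,
# hence no route to the ClusterIP) it prints False after the timeout.
print(tcp_reachable("10.64.77.73", 443))
```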
Some caveats:
* The bird daemon runs in a container in our current worker setup, so it is not sufficient to add `profile::calico::kubernetes` to the masters and set up BGP peering with the cr* routers.
* We could run bird in a docker container launched by systemd on master nodes, but it would be yet another way of starting it (and another thing to maintain).
* We could run bird the same way we do on worker nodes, but we'd also need to deploy `kubelet` on the master nodes (it doesn't currently run there).
* Some tweaks in `deployment-charts` may be needed to deploy calico on master nodes as well.
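To make the "bird in a docker container launched by systemd" option concrete, the unit could look roughly like this. This is an entirely hypothetical sketch: the unit name, image, config mount, and flags are placeholders, not something we ship today.

```ini
# /etc/systemd/system/bird-calico.service (hypothetical sketch)
[Unit]
Description=BIRD BGP daemon for calico (docker)
After=docker.service
Requires=docker.service

[Service]
# Remove any stale container from a previous run before starting.
ExecStartPre=-/usr/bin/docker rm -f bird-calico
# --net=host so BIRD can speak BGP with the node's own addresses.
ExecStart=/usr/bin/docker run --name bird-calico --net=host \
    -v /etc/calico/bird:/etc/bird:ro \
    our-registry/bird:placeholder-tag
ExecStop=/usr/bin/docker stop bird-calico
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

This illustrates the maintenance concern from the caveat above: it duplicates container lifecycle logic that kubelet would otherwise handle for us.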