We currently have two IP pools assigned to each DC/cluster:
- a /24 subnet (254 IPs) for K8s svc IPs
- a /23 subnet (510 IPs) for K8s pod IPs
When we allocated the ranges, it was not entirely clear how Knative and Istio worked, so we used the standard k8s configuration. We have recently discovered that Knative revisions, which are created on each change to an InferenceService resource (basically a deployment for the ml-team), hold a svc IP address until they are cleaned up. We applied a change to limit the number of inactive revisions kept to three (to allow the use of Knative features like incremental rollout, canary, A/B testing, etc.), but just to support the ORES models we'll have to allocate ~100 pods, which may very easily translate into 300 svc IP allocations, more than the current /24 can provide.
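To make the arithmetic explicit, here is a rough sketch of the estimate. It assumes one svc IP per retained revision and three revisions per model, and it counts usable addresses the same way as the list above (block size minus two). The `10.0.0.0/24` range is only a placeholder, not our real pool.

```python
import ipaddress

# Placeholder range for illustration only, not the real svc pool.
SVC_POOL = "10.0.0.0/24"


def usable_ips(cidr: str) -> int:
    """Addresses in the block minus the network and broadcast addresses."""
    return ipaddress.ip_network(cidr).num_addresses - 2


def estimated_svc_ips(models: int, revisions_per_model: int) -> int:
    """Assumption: every retained revision holds exactly one svc IP."""
    return models * revisions_per_model


needed = estimated_svc_ips(models=100, revisions_per_model=3)
available = usable_ips(SVC_POOL)
shortfall = max(0, needed - available)
print(f"needed={needed} available={available} shortfall={shortfall}")
```

The real number of svc IPs per revision may differ (Knative and Istio can create more than one Service per revision), so this is a lower-bound style estimate rather than a measurement.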
This task should evaluate new IP ranges (if possible) for the ML use case, and then apply the new subnets to Calico's IPPool settings (even in an invasive way, since we are not live yet).
Maybe the pod IPs could keep their /23, but the svc pool would probably be better off with a /22 (to have extra room for experiments, etc.). Any thoughts?
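As a quick sanity check on the sizing, the sketch below finds the smallest IPv4 block that holds a given number of svc IPs. The helper name `smallest_prefix` and the 2x headroom factor are assumptions made for illustration, and usable addresses are again counted as block size minus two.

```python
def smallest_prefix(required_ips: int, max_prefix: int = 30) -> int:
    """Return the longest prefix length (smallest block) whose usable
    address count, 2**(32 - prefix) - 2, covers required_ips."""
    for prefix in range(max_prefix, -1, -1):
        if 2 ** (32 - prefix) - 2 >= required_ips:
            return prefix
    raise ValueError("required_ips does not fit in the IPv4 address space")


ESTIMATED_SVC_IPS = 300  # figure from the ORES estimate above
HEADROOM_FACTOR = 2      # hypothetical margin for experiments

bare_fit = smallest_prefix(ESTIMATED_SVC_IPS)
with_headroom = smallest_prefix(ESTIMATED_SVC_IPS * HEADROOM_FACTOR)
print(f"bare fit: /{bare_fit}, with headroom: /{with_headroom}")
```

Under these assumptions the ORES estimate alone fits in a /23, and doubling it for headroom lands on a /22, which is in line with the proposal above.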
Last but not least, we should come up with a procedure for changing Calico's IPPools.