@Bstorm has some concerns about the reload times for nginx-ingress, our current ingress implementation in the new k8s cluster:
My biggest concern about the
migration is that an ingress for each web service will cause problems in the
controller on reload. At that point, we need to either use dynamic proxy +
calico routing or a different ingress that scales better (supposedly haproxy and
Traefik might). The scaling I worry about is purely the reload time. It might
get to a point where the reload time is very long and leads to some kind of
downtime. It may also be no problem at all!
Bstorm then added:
Also, I realized we cannot dynamically autoscale the controllers without
something like ECMP Anycast routing using Calico and labeling (which would
basically turn the services into a legit basic load balancer). If it can't be
made dynamic enough, we could just statically scale it out a bit and still use
ECMP Anycast across the replicas. So overall, I think testing Calico BGP magic
at the proxy level in toolsbeta will be a good idea. I might jump on that, but
maybe we should try creating a bunch of tools with Bryan's script there *first*
and see how the ingress responds with more web services running, restarting,
etc. You could jump on that in the next couple of days if you want.
I suggest the first step is to actually measure how bad reload times get with a large number of services and ingresses.
Actionables:
- check numbers for config reload times
- check whether web traffic is still served during a reload
- check pod scaling for nginx-ingress
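For the second actionable, a simple client-side probe should be enough: poll a URL behind the ingress at a fixed interval while a reload is triggered, and record any failures. A sketch (the URL is hypothetical):

```python
# Sketch: poll an ingress-backed URL and record request failures, to see
# whether traffic drops during a controller reload. The URL is a placeholder.
import time
import urllib.request
import urllib.error

def probe(url: str, duration_s: float = 10.0, interval_s: float = 0.2):
    """Poll url until duration_s elapses; return (ok_count, failure_times)."""
    ok, failures = 0, []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                ok += 1  # request served
        except (urllib.error.URLError, OSError):
            failures.append(time.time())  # request dropped or errored
        time.sleep(interval_s)
    return ok, failures
```

Running this while `kubectl apply`-ing a batch of ingresses would show whether failures cluster around the reload window, or whether traffic is served uninterrupted.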