We want to eliminate the manual failover with the Toolforge dynamicproxy servers (SSL termination + grid engine routing). This is currently the only point on Toolforge k8s web requests that has a single point of failure, haproxy+ingress+worked pods will all automatically fail over on node failure. I essentially see two options here:
- Completely remove that layer, and terminate TLS at either on haproxy or on the ingress, like on PAWS haproxy
- Do automatic failover for Nginx (easy) and its Redis backend (complex for a "temporary" solution)
The first one is ideal in the long-term, but would require a workaround for grid engine tools (T282975) until the grid is deprecated and removed (a long time).