
Eliminate single point of failure from Toolforge front proxy
Open, Medium, Public

Description

We want to eliminate manual failover for the Toolforge dynamicproxy servers (SSL termination + grid engine routing). This is currently the only single point of failure in the path of Toolforge k8s web requests: haproxy, the ingress, and the worker pods all fail over automatically on node failure. I essentially see two options here:

  • Completely remove that layer, and terminate TLS either on haproxy or on the ingress, as on the PAWS haproxy
  • Do automatic failover for Nginx (easy) and its Redis backend (complex for a "temporary" solution)

The first one is ideal in the long term, but would require a workaround for grid engine tools (T282975) until the grid is deprecated and removed, which will take a long time.