We had an outage last night during which a server (cp1046) failed. pybal correctly detected it as dead and depooled it from mobile-lb on both IPv4 and IPv6, but only for port 443, not port 80. It remained pooled on port 80, which resulted in roughly 1/4 of the requests being essentially dropped and caused the Icinga LVS check to flap.
A very quick look turned up two issues that need further investigation:
- Port 80 has only "IdleConnection" configured, not ProxyFetch. This is presumably because ProxyFetch fails when the response code is a 3xx, but it should be fixed regardless.
- IdleConnection failed to detect the dead server. That sounds like a larger issue.
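
To make the first point concrete, here is a minimal sketch of an active fetch-style check that counts a 3xx response as healthy. It is not pybal's monitor code; `fetch_check` and its parameters are hypothetical names chosen for illustration, and it only assumes that a redirect on port 80 is an acceptable sign of life.

```python
import http.client


def fetch_check(host, port, path="/", timeout=5.0, healthy=range(200, 400)):
    """Return True if an HTTP GET to host:port answers with a status in `healthy`.

    Hypothetical helper, not part of pybal. http.client does not follow
    redirects, so a 3xx status is seen as-is and can be accepted as healthy.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timed out, reset, or malformed reply: treat as down.
        return False
    finally:
        conn.close()
    return status in healthy
```

Unlike a check that only holds a TCP connection open, this one sends a real request on every run, so a server that stops answering should show up as a failure within `timeout` seconds. Whether that would have caught the cp1046 failure depends on how the server died, which is still unknown.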