Some of our scaling and/or DDoS problems really come down to limitations induced by our LVS servers' TCP state-tracking. We don't truly need state-tracking there; it's just an artifact of our present circumstances and the available kernel code. Quite a few distinct chunks of work need to come together to eliminate it:
- A proper consistent-hashing ("chashing") LVS scheduler kernel module - T86651
- Something like the ipvs "one packet scheduler" support that exists for UDP, but for TCP (I suspect all we have to do is remove the protocol checks in both ipvsadm and the kernel here).
- PyBal needs to be a bit more flap-resistant, somehow, because we don't want very minor spurious healthcheck failures causing a large number of RSTs, which is what would otherwise happen with stateless LVS. Related to T172124 and others.
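
To illustrate why consistent hashing and flap resistance go together, here is a rough user-space sketch in Python. It is not the kernel scheduler tracked in T86651, just a generic hash ring with virtual nodes; the `Ring` class, the `vnodes` count, the backend names (`cp1` to `cp4`) and the synthetic flow keys are all made up for the example. It shows that depooling one backend only remaps the flows that were hashed to that backend, while the rest keep their mapping without any per-connection state on the balancer.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical consistent-hash ring; each server owns `vnodes` points."""

    def __init__(self, servers, vnodes=100):
        if not servers:
            raise ValueError("need at least one server")
        points = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in points]
        self._owners = [owner for _, owner in points]

    def pick(self, flow: str) -> str:
        """Return the server owning the first ring point after the flow's hash."""
        i = bisect.bisect_right(self._positions, _hash(flow))
        return self._owners[i % len(self._owners)]  # wrap around the ring


# Synthetic client "ip:port" keys standing in for incoming TCP flows.
flows = [f"198.51.100.{i % 250}:{1024 + i}" for i in range(10000)]

before = Ring(["cp1", "cp2", "cp3", "cp4"])
after = Ring(["cp1", "cp2", "cp4"])  # cp3 depooled, e.g. by a failed healthcheck

moved = [f for f in flows if before.pick(f) != after.pick(f)]

# Only flows that were on cp3 change backend; everything else stays put.
assert all(before.pick(f) == "cp3" for f in moved)
print(f"remapped {len(moved)} of {len(flows)} flows")
```

With no state table, every remapped flow lands on a backend that has never seen its connection, so that backend would answer with a RST. That is why a brief spurious depool is costly in the stateless design: it resets roughly that backend's share of live connections, and the same applies again when it is repooled.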