This task calls for several things:
* [] Make the connectivity scaling factor converge faster so that connection attempts to downed servers stop sooner. Currently, the logs still get flooded with connection errors. We should both use a higher "moving average ratio" and poll the MySQL connected/running/max_connections status variables to see how close a server is to being overloaded.
* [] Make the cache keys per DB-server so that extra query groups do not require separate cache keys (even if they are a subset of the main traffic servers or at least overlap). When checking which servers need state updates, the order should be shuffled. This makes it easier to keep the data up-to-date since it no longer requires a for-loop that always has to connect to everything.
* [] Simplify the tiered APCu/WANCache logic to use either APCu (web mode) or the local cluster cache (CLI mode). Since both have the BagOStuff interface, this would simplify the code significantly. Placeholders should be used liberally for a few seconds when there is no stale value and the cache mutex is already held. Cluster cache updates from web requests should use WRITE_BACKGROUND to avoid latency.
* [] Mitigate network slowness when LoadMonitor polls/gauges servers (e.g. lower connection timeout and set read timeout with mysqli). Maybe there could be a LoadBalancer::CONN_GAUGE_PROBE constant for a third connection class category to help this. Past outages have involved connections hanging in the ACCEPT state, or queries (including heartbeat table and SHOW ones) being slow.
See T314020 for ideas about tracking connection attempt failures.
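
To illustrate the first item, here is a rough, language-neutral sketch (in Python, not the actual LoadMonitor code) of how the moving average ratio affects how quickly a scaling factor drops after a server goes down. The function names, the threshold of 0.1, and the idea of using connected/max_connections as a saturation figure are assumptions made for the example only.

```python
def update_scale(previous: float, observed: float, ratio: float) -> float:
    """Exponential moving average; `ratio` is the weight given to the newest poll."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return (1.0 - ratio) * previous + ratio * observed


def polls_until_below(threshold: float, ratio: float, start: float = 1.0) -> int:
    """Count consecutive failed polls (observed = 0.0) until the scale drops below `threshold`."""
    scale = start
    polls = 0
    while scale >= threshold:
        scale = update_scale(scale, 0.0, ratio)
        polls += 1
    return polls


def connection_saturation(threads_connected: int, max_connections: int) -> float:
    """Hypothetical overload gauge: fraction of the connection limit in use, capped at 1.0."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return min(1.0, threads_connected / max_connections)


if __name__ == "__main__":
    # Compare how many failed polls a low and a high ratio need.
    for ratio in (0.1, 0.5):
        print(ratio, polls_until_below(0.1, ratio))
    print(connection_saturation(450, 500))
```

With a ratio of 0.1 the factor needs 22 consecutive failed polls to fall below 0.1, while a ratio of 0.5 needs only 4. The trade-off is that a higher ratio also reacts more strongly to a single transient failure.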