Load testing the new codfw cluster (T117714) has shown that the maximum number of concurrent queries this cluster can handle differs from that of the existing cluster in eqiad. Both the maximum connections allowed by the PoolCounter and the key used to track those connections should vary by the cluster being queried.
@aaron I think you've done most of the work with pool counter before. What makes the most sense here? The issue I see is that application servers in each datacenter have their own poolcounters, so if requests are sent between datacenters (due to downtime for maintenance on one cluster or some such), the limits are tracked separately in each datacenter and we could overload the elasticsearch cluster.
While the codfw cluster sits completely idle, we may also repoint some of our most expensive queries (morelike) from eqiad application servers to the codfw cluster, to reduce load prior to the replacement of our 4-year-old machines (elastic10{01..16}).
I realize this isn't strictly an issue until we start serving requests from multiple datacenters at the same time, but it seems like something reasonable to handle sooner rather than later.
My initial idea is that the poolcounter config in $wgPoolCounterClientConf could be adjusted so it always knows about the pool counters in all clusters, with a default cluster given in the configuration. The PoolCounterDoWork* objects would then be adjusted to allow overriding which cluster's pool counter to communicate with.
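To make the idea concrete, here is a rough sketch of the lookup I have in mind, written in Python only to keep it short. Everything in it is an assumption for illustration: the config shape, the `resolve_pool` helper, the `.example` hostnames and the worker limits are all made up, and none of it is the real PoolCounter or MediaWiki API.

```python
from dataclasses import dataclass

# Hypothetical config shape: every cluster's pool counters are always known,
# and one cluster is marked as the default. Hostnames and limits are
# placeholders, not real values.
POOL_COUNTER_CLIENT_CONF = {
    "default": "eqiad",
    "clusters": {
        "eqiad": {"servers": ["poolcounter1.eqiad.example"], "max_workers": 100},
        "codfw": {"servers": ["poolcounter1.codfw.example"], "max_workers": 60},
    },
}


@dataclass(frozen=True)
class PoolTarget:
    servers: tuple    # pool counter servers to talk to
    key: str          # key the connections are tracked under
    max_workers: int  # concurrency limit for the queried cluster


def resolve_pool(conf, pool_type, cluster=None):
    """Pick servers, key and limit based on the cluster being queried.

    Falls back to the configured default cluster when no override is given.
    """
    name = cluster or conf["default"]
    try:
        entry = conf["clusters"][name]
    except KeyError:
        raise ValueError(f"unknown cluster: {name}") from None
    return PoolTarget(
        servers=tuple(entry["servers"]),
        key=f"{pool_type}:{name}",  # key varies by queried cluster
        max_workers=entry["max_workers"],
    )


# Default behaviour: use the default cluster's pool counter.
local = resolve_pool(POOL_COUNTER_CLIENT_CONF, "search")
# Override: a request sent to codfw is counted against codfw's pool counter.
remote = resolve_pool(POOL_COUNTER_CLIENT_CONF, "search", cluster="codfw")
```

The point of the override is that a query sent to codfw is counted by codfw's pool counter under codfw's limit, regardless of which datacenter the application server lives in.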
Seem reasonable?