
PoolCounter usage should be per-cluster
Open, Medium, Public

Description

Load testing the new codfw cluster (T117714) has shown that the maximum number of concurrent queries it can handle differs from that of the existing cluster in eqiad. The maximum number of connections allowed by PoolCounter, along with the key used to track those connections, should vary by the cluster being queried.
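
For illustration only, a per-cluster setup might look roughly like the sketch below. The pool names, worker counts, and key scheme are assumptions used to show the shape of the idea, not CirrusSearch's actual configuration.

```php
<?php
// Hypothetical sketch: one PoolCounter pool per Elasticsearch cluster, so each
// cluster gets its own concurrency cap and its own lock key. The pool names
// and numbers below are illustrative, not the real CirrusSearch settings.
$wgPoolCounterConf = [
	// eqiad: older hardware, lower concurrency limit
	'CirrusSearch-Search-eqiad' => [
		'class'    => 'PoolCounter_Client',
		'timeout'  => 15,
		'workers'  => 25,
		'maxqueue' => 50,
	],
	// codfw: newer hardware, measured (T117714) to sustain more concurrent queries
	'CirrusSearch-Search-codfw' => [
		'class'    => 'PoolCounter_Client',
		'timeout'  => 15,
		'workers'  => 40,
		'maxqueue' => 50,
	],
];

// The lock key would likewise carry the target cluster, e.g.
// "CirrusSearch-Search-codfw", so concurrent queries are counted per cluster
// rather than under a single shared key.
```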

Event Timeline

EBernhardson raised the priority of this task to Needs Triage.
EBernhardson updated the task description. (Show Details)
EBernhardson subscribed.

@aaron I think you've done most of the work with PoolCounter before; what makes the most sense here? The issue I see is that the application servers in each cluster have their own poolcounters, so if requests are sent between datacenters (due to downtime for maintenance on one cluster or some such) we risk overloading the Elasticsearch cluster.

While we have a completely idle cluster, we may also repoint some of our most expensive queries (morelike) from eqiad application servers at the codfw cluster, to reduce load prior to the replacement of our four-year-old machines (elastic10{01..16}).

I realize this isn't strictly an issue until we start serving requests from multiple datacenters at the same time, but it seems like something reasonable to handle sooner rather than later.

My initial idea is that the poolcounter config in $wgPoolCounterClientConf could be adjusted so that it always knows about the poolcounters in all clusters and names a default cluster in the configuration. The PoolCounterDoWork* objects would then be adjusted to allow overriding which cluster to communicate with (rough sketch below).
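
A minimal sketch of that shape follows. The cluster-keyed layout of $wgPoolCounterClientConf, the placeholder hostnames, and the 'cluster' override passed to the work object are all assumptions about the proposed interface; none of this exists in MediaWiki today.

```php
<?php
// Hypothetical layout: the client config lists the poolcounter daemons in
// every datacenter and names a default. Hostnames are placeholders.
$wgPoolCounterClientConf = [
	'default'  => 'eqiad',
	'clusters' => [
		'eqiad' => [
			'servers' => [ 'poolcounter.eqiad.example' ],
			'timeout' => 0.5,
		],
		'codfw' => [
			'servers' => [ 'poolcounter.codfw.example' ],
			'timeout' => 0.5,
		],
	],
];

// Callers that know which Elasticsearch cluster a query targets could then
// override the default. The 'cluster' option is part of the proposed
// interface, not something PoolCounterWorkViaCallback accepts today.
$work = new PoolCounterWorkViaCallback(
	'CirrusSearch-Search',
	'_elasticsearch-codfw', // illustrative per-cluster key
	[
		'doWork' => function () {
			// ... run the search query against the codfw cluster ...
		},
		'cluster' => 'codfw', // hypothetical override of the default datacenter
	]
);
$work->execute();
```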

Seem reasonable?


I think so.

Deskana moved this task from Inbox to Technical on the CirrusSearch board.
Deskana moved this task from Needs triage to Ops on the Discovery-ARCHIVED board.
Deskana subscribed.