
PoolCounter usage should be per-cluster
Open, Medium, Public

Description

Load testing the new codfw cluster (T117714) has shown that the maximum number of concurrent queries it can handle differs from that of the existing cluster in eqiad. The maximum number of connections allowed by PoolCounter, along with the key used to track those connections, should vary by the cluster being queried.
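
For illustration only, a per-cluster setup might look roughly like the sketch below. The pool names, worker counts, and key scheme are assumptions used to show the shape of the idea, not CirrusSearch's actual configuration.

```php
<?php
// Hypothetical sketch: one PoolCounter pool per Elasticsearch cluster, so each
// cluster gets its own concurrency cap and its own lock key. The pool names
// and numbers below are illustrative, not the real CirrusSearch settings.
$wgPoolCounterConf = [
	// eqiad: older hardware, lower concurrency limit
	'CirrusSearch-Search-eqiad' => [
		'class'    => 'PoolCounter_Client',
		'timeout'  => 15,
		'workers'  => 25,
		'maxqueue' => 50,
	],
	// codfw: newer hardware, measured (T117714) to sustain more concurrent queries
	'CirrusSearch-Search-codfw' => [
		'class'    => 'PoolCounter_Client',
		'timeout'  => 15,
		'workers'  => 40,
		'maxqueue' => 50,
	],
];

// The lock key would likewise carry the target cluster, e.g.
// "CirrusSearch-Search-codfw", so concurrent queries are counted per cluster
// rather than under a single shared key.
```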

Event Timeline

EBernhardson raised the priority of this task to Needs Triage.
EBernhardson updated the task description. (Show Details)
EBernhardson subscribed.

@aaron I think you've done most of the work with PoolCounter before; what makes the most sense here? The issue I see is that the application servers in each cluster have their own poolcounters, so if requests are sent between datacenters (due to downtime for maintenance on one cluster or some such) we risk overloading the Elasticsearch cluster.

While we have a completely idle cluster, we may also repoint some of our most expensive queries (morelike) from eqiad application servers at the codfw cluster, to reduce load prior to the replacement of our four-year-old machines (elastic10{01..16}).

I realize this isn't strictly an issue until we start serving requests from multiple datacenters at the same time, but it seems like something reasonable to handle sooner rather than later.

My initial idea is that the poolcounter config in $wgPoolCounterClientConf could be adjusted so that it always knows about the poolcounters in all clusters and names a default cluster in the configuration. The PoolCounterDoWork* objects would then be adjusted to allow overriding which cluster to communicate with (rough sketch below).
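
A minimal sketch of that shape follows. The cluster-keyed layout of $wgPoolCounterClientConf, the placeholder hostnames, and the 'cluster' override passed to the work object are all assumptions about the proposed interface; none of this exists in MediaWiki today.

```php
<?php
// Hypothetical layout: the client config lists the poolcounter daemons in
// every datacenter and names a default. Hostnames are placeholders.
$wgPoolCounterClientConf = [
	'default'  => 'eqiad',
	'clusters' => [
		'eqiad' => [
			'servers' => [ 'poolcounter.eqiad.example' ],
			'timeout' => 0.5,
		],
		'codfw' => [
			'servers' => [ 'poolcounter.codfw.example' ],
			'timeout' => 0.5,
		],
	],
];

// Callers that know which Elasticsearch cluster a query targets could then
// override the default. The 'cluster' option is part of the proposed
// interface, not something PoolCounterWorkViaCallback accepts today.
$work = new PoolCounterWorkViaCallback(
	'CirrusSearch-Search',
	'_elasticsearch-codfw', // illustrative per-cluster key
	[
		'doWork' => function () {
			// ... run the search query against the codfw cluster ...
		},
		'cluster' => 'codfw', // hypothetical override of the default datacenter
	]
);
$work->execute();
```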

Seem reasonable?


I think so.

Deskana moved this task from Inbox to Technical on the CirrusSearch board.
Deskana moved this task from Needs triage to Ops on the Discovery-ARCHIVED board.
Deskana subscribed.