Depooling single text caching server in esams had a disproportionate performance impact
Open, Medium, Public

Description

@ema depooled and repooled cp3050 on 2019-11-11 and during the interim period we saw a huge regression in end-user performance in Europe:

Capture d'écran 2019-11-12 14.22.33.png (469×1 px, 125 KB)

https://grafana.wikimedia.org/d/000000143/navigation-timing?orgId=1&from=1573451674225&to=1573523748836&var-source=navtiming2&var-metric=responseStart&var-percentile=p50

Capture d'écran 2019-11-12 14.23.51.png (1×1 px, 155 KB)

https://grafana.wikimedia.org/d/000000230/navigation-timing-by-continent?panelId=54&fullscreen&orgId=1&from=1573466170251&to=1573497052164

There are 7 servers in that cluster, so turning off a single server shouldn't nearly double response time.
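As a rough sanity check on that expectation, the sketch below computes how much extra load each remaining server would carry if traffic were spread evenly across the cluster. The even-distribution model is an assumption made for illustration, not something measured in this task, and `load_increase_factor` is a hypothetical helper name.

```python
def load_increase_factor(total_servers: int, depooled: int = 1) -> float:
    """Per-server load multiplier after depooling, assuming an even split.

    Assumption: every pooled server receives an equal share of traffic.
    """
    remaining = total_servers - depooled
    if remaining <= 0:
        raise ValueError("at least one server must stay pooled")
    return total_servers / remaining


# 7 servers in the esams text cluster, 1 depooled (cp3050).
factor = load_increase_factor(7)
print(f"per-server load multiplier: {factor:.3f}")  # 1.167
print(f"extra load per server: {(factor - 1) * 100:.1f}%")  # 16.7%
```

Under this simplified model each remaining server takes on roughly 17% more traffic, which is far from the near-2x change in response time seen on the dashboards.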

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2019-11-12T16:09:45Z] <ema> depool cp3052 and observe performance impact T238085 before reimaging as text_ats T227432

ema triaged this task as Medium priority. Nov 12 2019, 4:11 PM
BBlack added a subscriber: BBlack.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!