
test / test2 incredibly slow with 1.27.0-wmf.12
Closed, Resolved (Public)

Description

I switched test and test2 to 1.27.0-wmf.12 and both are terribly slow. Backend time for a non-existent page is several seconds.

@bd808 mentioned it does not affect mw1017

Event Timeline

hashar claimed this task.
hashar raised the priority of this task from to Unbreak Now!.
hashar updated the task description.
hashar added subscribers: Luke081515, aude, Florian and 9 others.

When I use the X-Wikimedia-Debug: 1 header to route my requests to mw1017, performance seems normal. Without the header I'm seeing load times of up to 30 seconds for https://test.wikipedia.org/wiki/Main_Page.
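For anyone who wants to repeat that comparison, here is a minimal sketch using only the Python standard library. The header name and value come from the comment above; the helper names, the User-Agent string, and the timeout are made up for illustration. It times the whole round trip from the client, so it only approximates backend time.

```python
import time
import urllib.request

DEBUG_HEADER = "X-Wikimedia-Debug"  # header name as quoted in this task


def build_request(url, debug=False):
    """Return a GET request, optionally carrying X-Wikimedia-Debug: 1."""
    headers = {"User-Agent": "latency-check-sketch/0.1"}  # hypothetical UA string
    if debug:
        headers[DEBUG_HEADER] = "1"
    return urllib.request.Request(url, headers=headers)


def time_request(request, timeout=60):
    """Fetch the request once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # drain the body so the full response is timed
    return time.perf_counter() - start
```

Calling time_request(build_request(url, debug=True)) and time_request(build_request(url)) a few times each should show the gap, assuming the debug header still routes to a single warm host.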

Just to confirm: this happens for me, too. I just wanted to test something (;)) and was wondering why all actions (rendering an existing page and/or parsing an edit, both on a mobile phone) take so long :/

Change 268315 had a related patch set uploaded (by Hashar):
Only keep testwiki test2wiki 1.27.0-wmf.12

https://gerrit.wikimedia.org/r/268315

Change 268315 merged by jenkins-bot:
Only keep testwiki test2wiki 1.27.0-wmf.12

https://gerrit.wikimedia.org/r/268315

Page generation times seem to be getting better and better for testwiki. @ori had a reasonable explanation: each server needs 11+ hits to fully warm up the HHVM JIT. 1.27.0-wmf.12 is the first new branch to be deployed where testwiki is served by the general MediaWiki web server pool rather than exclusively by mw1017.eqiad.wmnet. With the small load going to testwiki spread over a much larger pool of servers, it takes longer for the full pool to have warm caches.
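To put rough numbers on that explanation, here is a back-of-the-envelope sketch. It assumes requests are spread uniformly at random over the pool and that a server counts as warm after a fixed number of hits; both are simplifications, and the default threshold of 11 is only the approximate figure quoted in this task. The function name is made up for illustration.

```python
from math import comb


def cold_probability(prior_requests, servers, threshold=11):
    """Chance that the next request lands on a server that has seen
    fewer than `threshold` of the `prior_requests` earlier hits.

    Assumes each earlier request picked a server uniformly at random,
    so the hits on any one server follow Binomial(prior_requests, 1/servers).
    """
    p = 1.0 / servers
    q = 1.0 - p
    return sum(
        comb(prior_requests, k) * p**k * q ** (prior_requests - k)
        for k in range(min(threshold, prior_requests + 1))
    )
```

With a single server the probability drops to zero after 11 requests, whereas with a pool of 200 it stays high for thousands of requests, which matches the low-traffic, large-pool situation described above.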

hashar lowered the priority of this task from Unbreak Now! to Low. Feb 3 2016, 11:33 PM

Note: we want to update our deployment train doc so that the whole of group0 is switched instead of just testwiki. https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys

That would warm up the cache faster while still keeping the scope limited (only group0).

Changing priority; this is no longer an issue. I basically freaked out :-}

This was from expected HHVM cache warming. Full IRC log below for those who weren't there:

23:18 <    hashar> so hhvm warming up its cache ?
23:18 <       ori> JIT threshold + few users + many app servers = high likelihood of slow request
23:18 <       ori> yes
23:19 <    hashar> so that would only happen when we roll a new branch?
23:19 <       ori> there's what, a half dozen of us making requests to test and test2, plus a handful of random users
23:19 <       ori> the requests that we're making are distributed over 200+ app servers
23:19 <       ori> so there's a high chance that your request is hitting an app server that is translating the code in the wmf12 
                   branch for the first time
23:19 <     bd808> it does seem to be getting better and better
23:20 <    hashar> is a single hit enough to fully populate the jit / cache whatever?
23:20 <       ori> no, it's something like 11 IIRC
23:20 <     bd808> first hit would prime the apc cache equivalent. it takes 11+ to fully warm up the JIT
23:20 <    hashar> so by only deploying to testwiki  ,  that hasn't attracted enough traffic
23:20 <    hashar> and I freaked out
23:21 <       ori> Yes, I think so. I freaked out too, but I think this is what happened.
23:21 <    hashar> whereas had I deployed to mediawikiwiki  that would have populated much faster
23:21 <       ori> yep.
23:21 <     bd808> this is our first full scap with testwiki distributed to the cluster too correct?
23:21 <       ori> first full branch maybe; I think there have been other scaps
...
23:23 <       ori> at any rate, I think it is fine to roll out wmf12 to all group0 wikis

thcipriani changed the task status from Invalid to Resolved. Feb 3 2016, 11:40 PM

1.27.0-wmf.12 is now out on all group0 wikis and seems to be moving at a normal clip. Tentatively closing this task.