test / test2 incredibly slow with 1.27.0-wmf.12
Closed, Resolved (Public)

Assigned To

Authored By

	hashar
	Feb 3 2016, 10:51 PM

Description

I switched test and test2 to 1.27.0-wmf.12 and they both are terribly slow. Backend time for a non existing page takes several seconds.

@bd808 mentioned it does not affect mw1017

Details

	Subject	Repo	Branch	Lines +/-
	Only keep testwiki test2wiki 1.20.7-wmf.12	operations/mediawiki-config	master	+4 -4


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		hashar	T125475 MW-1.27.0-wmf.12 deployment blockers
		Resolved		hashar	T125727 test / test2 incredibly slow with 1.27.0-wmf.12

Event Timeline

hashar created this task. Feb 3 2016, 10:51 PM

hashar claimed this task.

hashar raised the priority of this task to Unbreak Now!.

hashar updated the task description.

hashar added projects: MW-1.27-release (WMF-deploy-2016-02-02_(1.27.0-wmf.12)), Release-Engineering-Team.

hashar added subscribers: Luke081515, aude, Florian and 9 others.

Paladox set Security to None. Feb 3 2016, 10:51 PM

Paladox subscribed.

When I use the X-Wikimedia-Debug: 1 header to route my requests to mw1017, performance seems normal. Without the header I'm seeing load times of up to 30 seconds for https://test.wikipedia.org/wiki/Main_Page.
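The with/without-header comparison described above could be scripted roughly as follows. This is only a sketch: the header name and value come from the comment, while the helper names `build_request` and `timed_fetch` and the User-Agent string are made up for illustration.

```python
import time
import urllib.request

# Header name and value as quoted in the comment above.
DEBUG_HEADER = "X-Wikimedia-Debug"
# Hypothetical User-Agent string, only so the request identifies itself.
USER_AGENT = "latency-check-sketch/0.1"


def build_request(url, use_debug_backend):
    """Build a GET request, optionally asking for the debug backend."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    if use_debug_backend:
        req.add_header(DEBUG_HEADER, "1")
    return req


def timed_fetch(req, timeout=60):
    """Send the request and return (HTTP status, elapsed seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # include body transfer in the measurement
        status = resp.status
    return status, time.monotonic() - start
```

Calling `timed_fetch(build_request(url, True))` and `timed_fetch(build_request(url, False))` a few times each for the same URL would give the two sets of timings to compare.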

Just to confirm: this happens for me, too. I just wanted to test something (;)) and was wondering why all actions (rendering an existing page and/or parsing an edit, both on a mobile phone) take so long :/

Change 268315 had a related patch set uploaded (by Hashar):
Only keep testwiki test2wiki 1.20.7-wmf.12

https://gerrit.wikimedia.org/r/268315

gerritbot added a project: Patch-For-Review. Feb 3 2016, 10:59 PM

Change 268315 merged by jenkins-bot:
Only keep testwiki test2wiki 1.20.7-wmf.12

https://gerrit.wikimedia.org/r/268315

hashar mentioned this in T125475: MW-1.27.0-wmf.12 deployment blockers. Feb 3 2016, 11:11 PM

Page generation times seem to be getting better and better for testwiki. @ori had a reasonable explanation: each server needs 11+ hits to fully warm up the HHVM JIT. 1.27.0-wmf.12 is the first new branch to be deployed where testwiki is being served by the general MediaWiki web server pool rather than being exclusively handled by mw1017.eqiad.wmnet. With the small load going to testwiki being spread over a much larger pool of servers, it will take longer for the full pool to have warm caches.
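A back-of-the-envelope model of that explanation, as a sketch only: it assumes requests are routed uniformly at random and independently across the pool, which is a simplification and not a description of the real load balancer. The function name `prob_request_hits_cold_server` is hypothetical; the pool size (200+) and the threshold (about 11 hits) are the rough figures quoted in the discussion.

```python
from math import comb


def prob_request_hits_cold_server(prior_requests, servers, warm_hits):
    """Probability that the next request lands on a server that has so far
    seen fewer than `warm_hits` requests, assuming uniform random routing.

    The number of earlier hits on any one server is then
    Binomial(prior_requests, 1/servers), so this is its CDF at warm_hits - 1.
    """
    p = 1.0 / servers
    n = prior_requests
    # Cap k at n so we never raise (1 - p) to a negative power.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(min(warm_hits, n + 1))
    )


# Example: chance of a cold hit after various amounts of total traffic,
# for a 200-server pool and an 11-hit warm-up threshold.
for total in (100, 1000, 4000, 10000):
    print(total, prob_request_hits_cold_server(total, 200, 11))
```

Under these assumptions the probability stays close to 1 until total traffic reaches several thousand requests, which fits a low-traffic wiki feeling slow for a long time after a new branch is deployed.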

Note: we want to update our deployment train doc to have the whole of group0 switched instead of just testwiki. https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys

That would warm up the caches faster while still keeping a limited scope (only group0).

Changing priority, no longer an issue. I basically freaked out :-}

This was from expected HHVM cache warming. Full IRC log below for those who weren't there:

23:18 <    hashar> so hhvm warming up its cache ?
23:18 <       ori> JIT threshold + few users + many app servers = high likehood of slow request
23:18 <       ori> yes
23:19 <    hashar> so that would only happen when we roll a new branch?
23:19 <       ori> there's what, a half dozen of us making requests to test and test2, plus a handful of random users
23:19 <       ori> the requests that we're making are distributed over 200+ app servers
23:19 <       ori> so there's a high chance that your request is hitting an app server that is translating the code in the wmf12 
                   branch for the first time
23:19 <     bd808> it does seem to be getting better and better
23:20 <    hashar> is a single hit enough to fully populate the jit / cache whatever?
23:20 <       ori> no, it's something like 11 IIRC
23:20 <     bd808> first hit would prime the apc cache equivalent. it takes 11+ to fully warm up the JIT
23:20 <    hashar> so by only deploying to testwiki  ,  that hasn't attracted enough traffic
23:20 <    hashar> and I freaked out
23:21 <       ori> Yes, I think so. I freaked out too, but I think this is what happened.
23:21 <    hashar> whereas had I deployed to mediawikiwiki  that would have populated much faster
23:21 <       ori> yep.
23:21 <     bd808> this is our first full scap wiht testwiki distributed to the cluster too correct?
23:21 <       ori> first full branch maybe; I think there have been other scaps
...
23:23 <       ori> at any rate, I think it is fine to roll out wmf12 to all group0 wikis
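The "getting better and better" effect from the log can be reproduced with a toy simulation. This is a sketch under the same simplifying assumption of uniform random routing; `simulate_slow_fraction` and its parameters are hypothetical names, and "slow" here just means the request reached a server that had seen fewer than `warm_hits` earlier requests.

```python
import random


def simulate_slow_fraction(servers, warm_hits, requests, window, seed=0):
    """Route `requests` uniformly at random over `servers` and return, for
    each consecutive block of `window` requests, the fraction that reached a
    server which had not yet received `warm_hits` earlier requests."""
    rng = random.Random(seed)
    hits = [0] * servers
    fractions = []
    slow_in_window = 0
    for i in range(1, requests + 1):
        s = rng.randrange(servers)
        if hits[s] < warm_hits:
            slow_in_window += 1  # server still warming up
        hits[s] += 1
        if i % window == 0:
            fractions.append(slow_in_window / window)
            slow_in_window = 0
    return fractions


# A large pool, with the rough figures from the log (200 servers, 11 hits).
print(simulate_slow_fraction(200, 11, 10000, 1000))
# A single server, as when one host served testwiki exclusively.
print(simulate_slow_fraction(1, 11, 100, 100))
```

In this model at most `servers * warm_hits` requests can ever be slow, so the same amount of traffic is mostly slow for a long stretch on a large pool, whereas a single server is warm after its first 11 requests.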

1.27.0-wmf.12 is now out on all group0 wikis and seems to be moving at a normal clip. Tentatively closing this task.
