First request after a MediaWiki sync times out on mwdebug
Closed, Duplicate · Public · PRODUCTION ERROR

Assigned To

None

Authored By

	Tgr
	Feb 6 2019, 12:51 AM

Description

Steps to reproduce:

1. Change something in the MediaWiki code on mwdeploy.
2. Run scap pull on one of the mwdebug hosts.
3. Use X-Wikimedia-Debug to send a request to that host.

Usually you get a white screen of death and a logstash error saying the request timed out after 60 seconds. The next request will still be somewhat slow but work, and requests after that will be fine.
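
For anyone wanting to time this rather than eyeball it, here is a minimal sketch that sends a few consecutive requests to one debug host and labels each by duration. It rests on several assumptions that are not stated in this task: that the header value takes the form `backend=<hostname>`, that a 5-second cut-off is a reasonable meaning of "somewhat slow", and that the URL and fully qualified hostname in the usage note are suitable. The function names are made up for illustration.

```python
import time
import urllib.request

SERVER_TIMEOUT_S = 60.0   # request limit reported in the logstash error
SLOW_THRESHOLD_S = 5.0    # arbitrary cut-off for "somewhat slow" (assumption)


def debug_headers(backend):
    """Header routing the request to one debug host (value format assumed)."""
    return {"X-Wikimedia-Debug": f"backend={backend}"}


def classify(elapsed_s):
    """Label one request duration against the thresholds above."""
    if elapsed_s >= SERVER_TIMEOUT_S:
        return "timeout"
    if elapsed_s >= SLOW_THRESHOLD_S:
        return "slow"
    return "ok"


def reproduce(url, backend, attempts=3):
    """Send consecutive requests to one backend; return (status, seconds, label)."""
    results = []
    for _ in range(attempts):
        req = urllib.request.Request(url, headers=debug_headers(backend))
        start = time.monotonic()
        try:
            # Socket timeout above the server-side limit, so the server's
            # own timeout is what we observe rather than a client abort.
            with urllib.request.urlopen(req, timeout=SERVER_TIMEOUT_S + 30) as resp:
                status = resp.status
        except Exception as exc:  # HTTP 5xx, socket timeout, DNS failure, ...
            status = type(exc).__name__
        elapsed = time.monotonic() - start
        results.append((status, elapsed, classify(elapsed)))
    return results
```

Run right after step 2, something like `reproduce("https://en.wikipedia.org/wiki/Special:BlankPage", "mwdebug1001.eqiad.wmnet")` (both arguments are illustrative) would be expected to show the pattern described above: a first entry labelled "timeout", then "slow", then "ok".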

This behavior has started recently (1-2 months ago?).

Related Objects

		Status	Subtype	Assigned	Task
		Duplicate	PRODUCTION ERROR	None	T215368 First request after a MediaWiki sync times out on mwdebug
		Resolved		akosiaris	T212955 Increase mwdebugXXXX hosts CPU

Event Timeline

Tgr created this task. Feb 6 2019, 12:51 AM

Restricted Application added a subscriber: Aklapper. Feb 6 2019, 12:51 AM

Tgr updated the task description. Feb 6 2019, 12:52 AM

See also: T203664

Tgr mentioned this in T203664: scap timeout checking index.php/api.php mwdebug1001 / mwdebug1002. Feb 6 2019, 12:55 AM

Jdforrester-WMF subscribed. Feb 6 2019, 1:31 AM

Krinkle added a subtask: T212955: Increase mwdebugXXXX hosts CPU. Feb 6 2019, 2:57 AM

There are currently two timeout issues related to deployments: the ones we see on mwdebug (this task), and the ones we see on all canaries and app servers (T204871).

These may be the same issue, but they may also be distinct. We know that the mwdebug servers in particular are slower in general because they are VMs (T203625). I've added T203625 as a subtask, so that we can first see whether it gets better after that.

akosiaris closed subtask T212955: Increase mwdebugXXXX hosts CPU as Resolved. Feb 7 2019, 9:48 AM

The hosts mwdebug1001, mwdebug1002, mwdebug2001 and mwdebug2002 now have four vCPUs allocated (was T212955). That should make them faster when recompiling the HHVM bytecode cache?

Krinkle moved this task from Untriaged to Older on the Wikimedia-production-error board. Feb 12 2019, 6:00 PM

Given that we can now see the issue only affects HHVM, and not PHP 7, on the mwdebug servers, I'm going to close this by merging it into T204871.

Short term: Use PHP 7 if you want to be quick on mwdebug and not wait 2 minutes to get past the HHVM timeouts after a deploy.

Medium term: T204871 will probably remain deprioritised and be declined once T176370 is completed.

Krinkle closed this task as a duplicate of T204871: Investigate the spikes of "web request took longer than 60 seconds and timed out" during deployments on HHVM. May 19 2019, 10:00 AM

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM
