
First request after a MediaWiki sync times out on mwdebug
Closed, Duplicate | Public | PRODUCTION ERROR

Description

Steps to reproduce:

  • change something in the MediaWiki code on mwdeploy
  • run scap pull on one of the mwdebug hosts
  • use X-Wikimedia-Debug to send a request to that host

Usually you get a white screen of death and a logstash error saying the request timed out after 60 seconds. The next request will still be somewhat slow but work, and requests after that will be fine.
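The reproduction above can be sketched as a small timing script that sends the same request to one debug host several times and records how long each attempt takes. This is a minimal sketch, not the tooling used in this task: the helper names `build_debug_request` and `time_requests` are hypothetical, and the `backend=<hostname>` value for `X-Wikimedia-Debug` is an assumption about the header format. The 70-second client timeout is chosen only so that it exceeds the 60-second server-side timeout described above.

```python
import time
import urllib.error
import urllib.request


def build_debug_request(url, backend):
    """Build a GET request routed to one debug host.

    Assumption: the X-Wikimedia-Debug header takes the form
    "backend=<hostname>".
    """
    headers = {"X-Wikimedia-Debug": f"backend={backend}"}
    return urllib.request.Request(url, headers=headers)


def time_requests(req, attempts=3, timeout=70, opener=urllib.request.urlopen):
    """Send the same request `attempts` times.

    Returns a list of (status, seconds) tuples, one per attempt, where
    status is the HTTP status code or a "failed: ..." string.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with opener(req, timeout=timeout) as resp:
                resp.read()
                status = resp.status
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status (e.g. 5xx).
            status = err.code
        except OSError as err:
            # Covers URLError, socket timeouts and connection failures.
            status = f"failed: {err}"
        results.append((status, time.monotonic() - start))
    return results
```

For example, after running `scap pull` on a host, calling `time_requests(build_debug_request(url, "mwdebug1001.eqiad.wmnet"))` with a wiki URL of your choice would be expected to show the pattern described above: a first attempt near the 60-second mark, a slower-than-normal second attempt, then normal timings. The fully qualified host name here is an assumption; substitute whatever name the debug host actually uses.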

This behavior started recently (1-2 months ago?).

Event Timeline

There are currently two timeout issues relating to deployments: the ones we see on mwdebug (this task), and the ones we see on all canaries and app servers (T204871).

It's possible that these are the same issue, but it is also possible that they are distinct. We know that the mwdebug servers in particular are slower in general due to being VMs (T203625). I've added T203625 as a subtask, so that we can first see if it gets better after that.

The hosts mwdebug1001, mwdebug1002, mwdebug2001, mwdebug2002 now have four vCPUs allocated (was T212955). Should make them faster when recompiling the HHVM bytecode cache?

Given we can now see that the issue only affects HHVM and not PHP 7 on the mwdebug servers, I'm going to close this by merging it into T204871.

Short term: Use PHP 7 if you want to be quick on mwdebug and not wait 2 minutes to get past the HHVM timeouts after a deploy.

Medium term: T204871 will probably remain deprioritised and be declined once T176370 is completed.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM