Beta Cluster api.php, index.php, load.php return 404 (caused failed browser tests)
Closed, Declined (Public)

Description

All the Echo and Flow tests started failing 5 hours ago. The ones I've looked at failed within a few seconds. The build console log shows an error in mw-api-siteinfo.py while parsing a JSON response, shown below.

I think the real problem is that http://en.wikipedia.beta.wmflabs.org/w/api.php is returning a 404. (If so, we ought to fix the script to report that instead of going on to report JSON decode errors!)

https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/36/console

03:37:02 + GEM_HOME=/mnt/jenkins-workspace/workspace/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/../gems
03:37:02 ++ /srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py http://en.wikipedia.beta.wmflabs.org/w/api.php git_branch
03:37:02 Traceback (most recent call last):
03:37:02 File "/srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py", line 90, in <module>
03:37:02 main()
03:37:02 File "/srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py", line 78, in main
03:37:02 siteinfo = json.loads(response.content)
03:37:02 File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
03:37:02 return _default_decoder.decode(s)
03:37:02 File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
03:37:02 raise ValueError(errmsg("Extra data", s, end, len(s)))
03:37:02 ValueError: Extra data: line 1 column 4 - line 1 column 18 (char 4 - 18)
03:37:02 + MEDIAWIKI_GIT_BRANCH=
03:37:02 Build step 'Execute shell' marked build as failure
03:37:02 Recording test results
03:37:02 IRC notifier plugin: Sending notification to: #wikimedia-qa

The CirrusSearch browser test, 3 minutes later, passed. But I just chose "Build now" and the test ran quickly.

BTW, "Sending notification to: #wikimedia-qa" didn't result in any output in the IRC channel.


Version: unspecified
Severity: major

Details

Reference
bz70648

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 3:48 AM)
bzimport set Reference to bz70648.
bzimport added a subscriber: Unknown Object (MLST).

Everything on http://en.wikipedia.beta.wmflabs.org/ returns a 404, including index.php and load.php.

jeremyb noticed /srv/mediawiki on the deployment machines is pretty empty.

Gerrit change 159431, "beta: switch to /srv/mediawiki", was merged today and says "I made /srv/mediawiki be a symlink to /srv/common-local." The latter has all the expected files in it, but the former *isn't* a symlink. So maybe a Puppet change didn't make it out, possibly related to bug 70597.
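
For anyone wanting to confirm the symlink state on a deployment machine, here is a minimal sketch of the check described above. The helper name `check_symlink` is hypothetical and not part of any existing tooling; only the two paths come from this report.

```python
import os


def check_symlink(link_path, expected_target):
    """Return None if link_path is a symlink resolving to expected_target,
    otherwise a short human-readable description of what is wrong.

    Hypothetical helper, written for illustration only.
    """
    # lexists() is true even for a dangling symlink, unlike exists().
    if not os.path.lexists(link_path):
        return "%s does not exist" % link_path
    if not os.path.islink(link_path):
        return "%s exists but is not a symlink" % link_path
    # Compare fully resolved paths so relative link targets also match.
    actual = os.path.realpath(link_path)
    expected = os.path.realpath(expected_target)
    if actual != expected:
        return "%s points to %s, expected %s" % (link_path, actual, expected)
    return None


if __name__ == "__main__":
    problem = check_symlink("/srv/mediawiki", "/srv/common-local")
    print(problem or "symlink looks correct")
```

Run on an affected machine, this should report that /srv/mediawiki exists but is not a symlink, assuming the state is as described in this comment.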

mw-api-siteinfo.py is in the repository integration/jenkins.git and should probably have better error handling with human friendly messages :D
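
A rough sketch of what friendlier error handling could look like follows. This is not the actual code of mw-api-siteinfo.py; the names `SiteinfoError` and `parse_siteinfo` are hypothetical, and the function takes the status code and body as plain arguments so that it does not assume any particular HTTP library. The idea is to check the HTTP status before attempting to decode JSON, so a 404 is reported as a 404. (The "Extra data" error at char 4 in the log is consistent with a body that begins with a bare number such as 404 followed by more text, though the log doesn't show the body itself.)

```python
import json


class SiteinfoError(Exception):
    """Raised when the API response cannot be used as siteinfo (hypothetical)."""


def parse_siteinfo(url, status_code, body):
    """Return the decoded JSON from an api.php response.

    Raises SiteinfoError with a readable message if the request did not
    succeed or the body is not valid JSON.
    """
    # Report HTTP failures first, instead of trying to parse an error page.
    if status_code != 200:
        raise SiteinfoError(
            "%s returned HTTP %d, expected 200" % (url, status_code))
    try:
        return json.loads(body)
    except ValueError as err:
        # json decode failures are ValueError (or a subclass of it).
        raise SiteinfoError(
            "%s returned a body that is not valid JSON: %s" % (url, err))
```

A caller in the script could catch `SiteinfoError`, print the message to stderr, and exit non-zero, which would put the 404 directly in the Jenkins console output rather than a traceback.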

I think beta labs is working now (thanks to the work of jeremyb and others), though I think this warrants a post-mortem incident report. I made bug 70695 for a clearer failure message.