Page MenuHomePhabricator

[Regression] Random MediaWiki QUnit failure in CI: Language "zh" failed to load
Closed, ResolvedPublic


Since a week or so (?) mediawiki-extensions-qunit is often failing with the following error:

Chromium 48.0.2564 (Ubuntu 0.0.0) mediawiki.jqueryMsg Match PHP parser FAILED
Test timed out

Language "zh" failed to load.
Expected: true
Actual: false

Right before it:

WARN: 'Pending AJAX request', Object{url: '/jenkins-mediawiki-extensions-qunit-28717/load.php?skin=fallback&lang=zh&debug=false&', type: 'GET', .. dataType: 'script', .. }

I suspect that either MediaWiki has gotten worse with regards to loading language classes, or CI has regressed in having sufficient resources.

Current timeout for individual QUnit tests is 30 seconds. The first thing the test does is make the above request, so that entire request is not responding within 30 seconds. That seems like a problem.
.. and others.

Event Timeline

Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description. (Show Details)
Krinkle added a subscriber: Krinkle.

How do you know it makes the lang=zh request first? In the log for #28714 I see:

  • 02:34:26: karma task starts
  • 02:35:13: Executed 0 of 1593
  • 02:35:19: Requesting ...lang=fr...
  • 02:35:22: proxying request ...lang=zh...
  • 02:35:24: Test timed out
  • 02:35:24: Language "zh" failed to load

This would be easier to debug if the apache access logs were in a less stupid format.

It's possible to get a rough idea of HTTP request execution time because the timestamp in the log line is the time at which the request started, but the log line is actually written when the request finishes.

For #28714, here is the lang=zh request:

localhost:80 - - [02/Feb/2016:02:35:22 +0000] "GET /jenkins-mediawiki-extensions-qunit-28714/load.php?skin=fallback&lang=zh&debug=false& HTTP/1.1" 200 6137 "http://localhost:9876/context.html" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36"

You see that it starts at 02:35:22 as reported by the karma log. The end time was certainly before 02:35:28 because a static file was requested at that time, and we can assume that it was served pretty quickly. It was probably before 02:35:24 which is the adjacent log entry, for lang=nl.

But there are slow things here. See the full log for 02:34-02:35 at P2555. A whole lot of slow requests were queued up at 02:34:37, and the last of them finished some time between 02:34:57 and 02:35:08, i.e. 20 or 30 seconds later (you see why I hate this log format). The slowest was:

localhost:80 - - [02/Feb/2016:02:34:37 +0000] "GET /jenkins-mediawiki-extensions-qunit-28714/load.php?debug=false&lang=en& HTTP/1.1" 200 2653278 "http://localhost:9876/context.html" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36"

Where does the message "Language "zh" failed to load." come from?

Where does the message "Language "zh" failed to load." come from?

ack 'Language .*? failed to load' --php --js
380:                                            assert.ok( false, 'Language "' + test.lang + '" failed to load.' );
902:                                            assert.ok( false, 'Language "' + test.lang + '" failed to load' );

The QUnit test is randomly flapping on 1.27.0-wmf.12 (some finding at T125655).

Seems to me the language caching takes too long and the test suite AJAX query ends up with a timeout.

Maybe some languages cache should be populated before the run, or the timeout raised.

For wmf.12 we can bump:

$ grep testTimeout tests/qunit/data/testrunner.js 
	QUnit.config.testTimeout = 30 * 1000;

Change 268365 had a related patch set uploaded (by JanZerebecki):
increase maximum timeout for full qunit test run

Let's merge that ^ timeout increase and see if that helps, sooner than later :)

My patch is a hack that may lead to worse code. The correct thing to do would be to refactor the test "Match PHP parser" in tests/qunit/suites/resources/mediawiki/mediawiki.jqueryMsg.test.js so that it does call stop before every item of the test data instead of once for all of them together. I.e. refactor it to break up the test into smaller parts.

Change 268427 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Call QUnit.stop() before each asynchronous task

Change 268365 merged by jenkins-bot:
qunit: Increase individual test timeout to 60 seconds

Change 268427 merged by jenkins-bot:
mediawiki.jqueryMsg.test: Call async() before each async test step

JanZerebecki claimed this task.