
Browser disconnects when running QUnit tests with local browsers
Closed, Resolved | Public | 40 Estimated Story Points

Description

Our migration from grunt-contrib-qunit (using PhantomJS, hardcoded task in Jenkins job) to Karma (using Chrome/Firefox and the local npm-test entry point) is going well for standalone JavaScript projects (OOjs, VisualEditor).

These have since been promoted to the main test pipeline, replacing their PhantomJS counterparts, because PhantomJS was blocking feature development and tests that use newer browser features. The switch also significantly sped up the tests, as Chromium 38+ is much faster than PhantomJS 1.9.x.

However, for MediaWiki core and extensions, the Karma job (currently non-voting, work in progress) fails most of the time because the Chrome instance disconnects halfway through the QUnit run.

So far it seems to consistently time out at test 142/248. This is mediawiki.jqueryMsg.test, which makes various HTTP requests to load.php. The logic behind it seems solid (it works locally for developers running tests in Chrome/Firefox, and still works fine in Jenkins using PhantomJS today).

I initially suspected the way Apache is serving files through MediaWiki PHP (as opposed to the Node.js static file server that the PhantomJS test used) may be the culprit, but then the PhantomJS server also defers to Apache and MediaWiki for most files already.

I suspect it's caused by the Karma job running on slaves in labs (which are inherently slower than the production slaves PhantomJS runs on). This slowdown may be causing it to hit a timeout somewhere, likely related to MediaWiki using I/O-bound cache files and the SQLite database.

Event Timeline

Krinkle claimed this task.
Krinkle raised the priority of this task from to High.
Krinkle updated the task description. (Show Details)
Krinkle added subscribers: greg, Jdforrester-WMF, Krinkle.

Comparing the log artefacts of a successful and failed build, I believe this is caused by T89180.

Pretty much all builds have db write failure errors in them, but these usually hit one of the many l10n_cache queries, which fall back gracefully. Writes for the language data module (which ResourceLoader indirectly uses, as called by that unit test), on the other hand, aren't recovered from very well.

Krinkle renamed this task from Investigate Chrome disconnect failures when running MediaWiki tests on labs slaves to Investigate browser disconnect failures when running MediaWiki tests on labs slaves. Feb 25 2015, 2:10 AM
Krinkle renamed this task from Investigate browser disconnect failures when running MediaWiki tests on labs slaves to Browser disconnects when running QUnit tests with local browsers (tracking). Feb 26 2015, 6:51 PM
Krinkle added a project: Tracking-Neverending.
Krinkle renamed this task from Browser disconnects when running QUnit tests with local browsers (tracking) to Browser disconnects when running QUnit tests with local browsers. Mar 20 2015, 8:23 AM
Krinkle removed a project: Tracking-Neverending.

The mediawiki-core-qunit-karma job has been passing (except sometimes when run concurrently, T90673). Its earlier failures were due to SQLite database locks (T89180; since switched to MySQL).

The mwext-VisualEditor-qunit-karma job, however, is still consistently failing. This may have a different cause we haven't discovered yet.

Change 198199 had a related patch set uploaded (by Krinkle):
build: Increase qunit browserNoActivityTimeout from 10s to 60s

https://gerrit.wikimedia.org/r/198199

mwext-VisualEditor-karma-qunit is now passing.

The last remaining problem was that VE's test suite does not yield. Regular usage of VE does involve yielding (and definitely does not synchronously occupy the main thread for 10 seconds), but in the test suite, smaller components of the software are run in quick succession across many different scenarios, and those don't yield much.

As a result, after 10 seconds, Karma assumes the socket to be dead because QUnit hasn't sent any events for a while. Increasing browserNoActivityTimeout fixed this.
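
For reference, a minimal sketch of what raising this timeout could look like in a standalone Karma config file. This is not the content of the actual patch: the file layout, the `karmaConfig` function name, and the `frameworks` and `browsers` values are assumptions for illustration, and the real change may have set the option elsewhere (for example in a Grunt task config). Only `browserNoActivityTimeout` and the 10s to 60s change come from this task.

```javascript
// karma.conf.js: hypothetical standalone config, not the actual patch.
function karmaConfig(config) {
	config.set({
		// Assumed for illustration; these need the karma-qunit and
		// karma-chrome-launcher plugins to be installed.
		frameworks: ['qunit'],
		browsers: ['Chrome'],

		// How long (in milliseconds) Karma waits without receiving any
		// message from the browser before it disconnects from it.
		// Raised from 10 seconds to 60 seconds.
		browserNoActivityTimeout: 60 * 1000
	});
}

module.exports = karmaConfig;
```

A higher value only gives a long synchronous stretch of tests more room before Karma gives up on the browser; it does not make the suite itself yield more often.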

Change 198199 merged by jenkins-bot:
build: Increase qunit browserNoActivityTimeout from 10s to 60s

https://gerrit.wikimedia.org/r/198199

Change 198259 had a related patch set uploaded (by Krinkle):
build: Increase qunit browserNoActivityTimeout from 10s to 60s

https://gerrit.wikimedia.org/r/198259

Change 198260 had a related patch set uploaded (by Krinkle):
build: Increase qunit browserNoActivityTimeout from 10s to 60s

https://gerrit.wikimedia.org/r/198260

Change 198259 merged by jenkins-bot:
build: Increase qunit browserNoActivityTimeout from 10s to 60s

https://gerrit.wikimedia.org/r/198259

Change 198260 merged by jenkins-bot:
build: Increase qunit browserNoActivityTimeout from 10s to 60s

https://gerrit.wikimedia.org/r/198260