Page MenuHomePhabricator

Flaky test "MWTransclusionNode getWikitext FAILED: Pending AJAX request"
Closed, ResolvedPublic8 Estimated Story Points

Description

This test fails every once in a while, and has been for several months.

Please fix it or disable the test until it can be fixed.

Example:
https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/17083/console

01:38:12 Chromium 45.0.2454 (Ubuntu 0.0.0) ve.dm.MWTransclusionNode getWikitext FAILED
01:38:12 	afterEach failed on getWikitext: Pending AJAX requests: 0 (active: 1)
01:38:12 	Error: Pending AJAX requests: 0 (active: 1)
01:38:12 	    at Object.teardown ..load.php
01:38:12 	    at Object.orgModule.teardown ..load.php
01:38:12 	    at Object.orgModule.teardown ..load.php
01:38:12 	    at Object.orgModule.teardown ..load.php
01:38:12 	    at runHook (node_modules/qunitjs/qunit/qunit.js:926:20)
01:38:12 	    at process (node_modules/qunitjs/qunit/qunit.js:583:24)
01:38:12 	    at begin (node_modules/qunitjs/qunit/qunit.js:628:2)
01:38:12 	    at node_modules/qunitjs/qunit/qunit.js:644:4
01:38:12

Event Timeline

Krinkle raised the priority of this task from to High.
Krinkle updated the task description. (Show Details)
Krinkle subscribed.
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Looked at this. It's rather confusing - there's no async behaviour in this functionality or its testing. Preceding tests are also synchronous.

Looked at this. It's rather confusing - there's no async behaviour in this functionality or its testing. Preceding tests are also synchronous.

Unfortunately the error doesn't help narrow down what the request is because the tracking for ajax requests in MediaWiki core's testrunner.js (using jQuery.ajaxSetup hook) encountered 0 requests during this test. The count 1 comes from jQuery.active. So that suggests a request bypassed detection somehow.

Running Special:JavaScriptTest/qunit/plain?filter=ve.dm (which is all ve.dm.* tests, not just this one) shows no requests in the Network panel at run-time (other than the request for the JavaScript code itself, and two 404 Not Found for /Foo triggered by parsing image html).

Most likely the request is triggered from a different VE test entirely (one that triggers async behaviour, and only waits for some but not all of its requests). And sometimes leaks into this one?

It leaking into this one is rather deterministic though. In that it doesn't always leak, but when it does, it leaks into this one.

Still happening once or twice every day when merging commits in MediaWiki core. Again just now at https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/23350/console for https://gerrit.wikimedia.org/r/258311.

If this can't be narrowed down to disable or otherwise resolved, I have no choice but to remove the VisualEditor extension from the communal mediawiki-extensions-qunit Jenkins job. We've done this in the past for other extensions as well and is only fair.

Change 259959 had a related patch set uploaded (by Legoktm):
Remove VisualEditor from extension-gate

https://gerrit.wikimedia.org/r/259959

There are other extensions inside the gate that depend upon VisualEditor, so removing it isn't going to be that straightforward. Is there a way to disable the individual test?

Change 260387 had a related patch set uploaded (by Bartosz Dziewoński):
ve.dm.MWTransclusionNode: Remove flaky test

https://gerrit.wikimedia.org/r/260387

We can not remove VE from the Qunit job since bunch of extensions rely on its presence. Can we narrow down which VE test is falling? If the QUnit test is flappy, the easiest action would be to remove it from VE and have a task to restore / fix it.

Also afterEach failed on getWikitext: Pending AJAX requests: 0 (active: 1) seems to indicate we track active requests going on. Would it be possible to dump it to ease diagnostic?

"MWTransclusionNode getWikitext" is not the flaky test. It is the victim of a yet-to-be-identified test that runs asynchronous methods but is not tracking them as such. The code ends up running quite deterministically as part of the "MWTransclusionNode getWikitext" test window. Presumably the test in question is before it and relatively close.

Change 260387 abandoned by Bartosz Dziewoński:
ve.dm.MWTransclusionNode: Disable flaky test

https://gerrit.wikimedia.org/r/260387

On the Jenkins master (gallium) I did a grep of the console logs for all qunit jobs looking for afterEach failed on getWikitext.

The only matches are in the shared job mediawiki-extensions-qunit which has the following extensions cloned in:

AbuseFilter
Babel
Cards
CheckUser
CirrusSearch
Cite
cldr
ConfirmEdit
Echo
Elastica
EventLogging
Flow
Gather
GlobalCssJs
GuidedTour
JsonConfig
MobileApp
MobileFrontend
MwEmbedSupport
ParserFunctions
PdfHandler
SandboxLink
SpamBlacklist
Thanks
TimedMediaHandler
UniversalLanguageSelector
VisualEditor
ZeroBanner
ZeroPortal

The following ones have a ResourceLoaderTestModules hook:

Cards
Echo
EventLogging
Flow
Gather
GuidedTour
MobileFrontend
Thanks
UniversalLanguageSelector
VisualEditor

So the culprit would be in one of them. If we had a way to have more verbose logs from the karma run (such as showing each tests and dumping the active request going on), that would surely help.

The Pending AJAX requests: 0 (active: 1) seems there is a race condition with jQuery.active reporting 1 but by the time we look at requests none are found and the assertion fail.

+ MobileFrontend , seems the pending request might be MobileFrontend loadWatchist() ..

Some of the few last requests for a failing build of mediawiki-extensions-qunit :

Start request GET /jenkins-mediawiki-extensions-qunit-23350 ...
/load.php?debug=false&lang=en&modules=ext.flow.editors.visualeditor&skin=fallback&version=912fbd0bc813
/load.php?skin=fallback&lang=en&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts
/load.php?skin=fallback&lang=fr&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts
/load.php?skin=fallback&lang=ar&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts
/load.php?skin=fallback&lang=jp&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts
/load.php?skin=fallback&lang=zh&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts
/load.php?skin=fallback&lang=nl&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts
/load.php?skin=fallback&lang=ml&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts
/load.php?skin=fallback&lang=hi&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts
/api.php?action=query&format=json&prop=info%7Crevisions%7Cpageprops&formatversion=2&rvprop=timestamp%7Cuser&generator=watchlistraw&gwrnamespace=0&gwrlimit=50&ppprop=displaytitle&continue=

That last one being:

ParameterValue
actionquery
formatjson
propinfo%7Crevisions%7Cpageprops
formatversion2
rvproptimestamp%7Cuser
generatorwatchlistraw
gwrnamespace0
gwrlimit50
pppropdisplaytitle
continue 

I crafted a patch that enables debug karma debug log https://gerrit.wikimedia.org/r/#/c/260388/2/Gruntfile.js,cm the consoleCapture does not seem to capture anything though.

An example is https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/24266/consoleFull , looking for the above request:

Chromium 47.0.2526 (Ubuntu 0.0.0): Executed 366 of 753 SUCCESS (0 secs / 8.36 secs) 
Chromium 47.0.2526 (Ubuntu 0.0.0): Executed 367 of 753 SUCCESS (0 secs / 8.362 secs)
Chromium 47.0.2526 (Ubuntu 0.0.0): Executed 368 of 753 SUCCESS (0 secs / 8.364 secs)
22 12 2015 11:37:40.601:DEBUG [middlware:source-files]: Requesting /jenkins-mediawiki-extensions-qunit-24266/api.php?action=query&format=json&prop=info%7Crevisions%7Cpageprops&formatversion=2&rvprop=timestamp%7Cuser&generator=watchlistraw&gwrnamespace=0&gwrlimit=50&ppprop=displaytitle&continue= /
22 12 2015 11:37:40.601:DEBUG [middlware:source-files]: Fetching /jenkins-mediawiki-extensions-qunit-24266/api.php
22 12 2015 11:37:40.601:DEBUG [proxy]: proxying request - /jenkins-mediawiki-extensions-qunit-24266/api.php?action=query&format=json&prop=info%7Crevisions%7Cpageprops&formatversion=2&rvprop=timestamp%7Cuser&generator=watchlistraw&gwrnamespace=0&gwrlimit=50&ppprop=displaytitle&continue= to localhost:9412
Chromium 47.0.2526 (Ubuntu 0.0.0): Executed 369 of 753 SUCCESS (0 secs / 8.395 secs)
Chromium 47.0.2526 (Ubuntu 0.0.0): Executed 370 of 753 SUCCESS (0 secs / 8.4 secs)
Chromium 47.0.2526 (Ubuntu 0.0.0): Executed 371 of 753 SUCCESS (0 secs / 8.409 secs)

Could it be that query which does not terminate properly?

It would come from MobileFrontend resources/mobile.watchlist/WatchListGateway.js which has:

    WatchListGateway.prototype = {
        /**
         * Load the list of items on the watchlist
         * @returns {jQuery.Deferred}
         */
        loadWatchlist: function () {
            var self = this,
                params = $.extend( {
                    prop: [ 'info', 'revisions' ].concat( mw.config.get( 'wgMFQueryPropModules' ) ),
                    format: 'json', 
                    formatversion: 2,
                    rvprop: 'timestamp|user',
                    generator: 'watchlistraw',
                    gwrnamespace: '0',
                    gwrlimit: this.limit
                }, mw.config.get( 'wgMFSearchAPIParams' ), this.continueParams );
                
...

The loadWatchlist() is invoked in several tests from tests/qunit/mobile.watchlist/test_WatchListGateway.js. No idea which one though.

Change 260620 had a related patch set uploaded (by Jdlrobson):
Avoid API requests on scroll events in watchlist test

https://gerrit.wikimedia.org/r/260620

Above patch fixes it @hashar thanks for the debugging this has been giving me headaches for months. Anyway we can improve error reporting for this sort of thing?

Would it make sense to make tests fail if they hit the api endpoint at all?

Change 260620 merged by jenkins-bot:
Avoid API requests on scroll events in watchlist test

https://gerrit.wikimedia.org/r/260620

Jdlrobson claimed this task.

Change 259959 abandoned by Hashar:
Remove VisualEditor from extension-gate

Reason:
T116258 has been fixed. It was an issue in MobileFrontend triggering Ajax requests on scrolling and got fixed by:

Avoid API requests on scroll events in watchlist test

https://gerrit.wikimedia.org/r/260620

https://gerrit.wikimedia.org/r/259959

Change 295742 had a related patch set uploaded (by Mattflaschen):
Avoid API requests on scroll events in watchlist test

https://gerrit.wikimedia.org/r/295742

Change 295742 merged by jenkins-bot:
Avoid API requests on scroll events in watchlist test

https://gerrit.wikimedia.org/r/295742

That last patch was for MobileFrontend REL1_26.