Page MenuHomePhabricator

Flaky selenium test "Failed to wait for mediawiki.base to be ready" (tests/selenium/specs/page.js)
Open, Needs TriagePublic

Description

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/1091730 on REL1_43 commits.

https://integration.wikimedia.org/ci/job/quibble-composer-mysql-php81-selenium/3697/console

» /tests/selenium/specs/page.js
[Chrome 120.0.0.0 linux #0-6] Page
[Chrome 120.0.0.0 linux #0-6]    ✓ should be previewable @daily
[Chrome 120.0.0.0 linux #0-6]    ✓ should be creatable
[Chrome 120.0.0.0 linux #0-6]    ✓ should be re-creatable
[Chrome 120.0.0.0 linux #0-6]    ✓ should be editable @daily
[Chrome 120.0.0.0 linux #0-6]    ✓ should have history @daily
[Chrome 120.0.0.0 linux #0-6]    ? should be deletable
[Chrome 120.0.0.0 linux #0-6]    ✖ "before each" hook for Page
[Chrome 120.0.0.0 linux #0-6]
[Chrome 120.0.0.0 linux #0-6] 5 passing (54.9s)
[Chrome 120.0.0.0 linux #0-6] 1 failing
[Chrome 120.0.0.0 linux #0-6]
[Chrome 120.0.0.0 linux #0-6] 1) Page "before each" hook for Page
[Chrome 120.0.0.0 linux #0-6] Failed to wait for mediawiki.base to be ready after 5000 ms.
[Chrome 120.0.0.0 linux #0-6] Error: Failed to wait for mediawiki.base to be ready after 5000 ms.
[Chrome 120.0.0.0 linux #0-6]     at async Object.waitForModuleState (/workspace/src/tests/selenium/wdio-mediawiki/Util.js:50:3)
[Chrome 120.0.0.0 linux #0-6]     at async Object.isTargetNotWikitext (/workspace/src/tests/selenium/wdio-mediawiki/Util.js:25:3)
[Chrome 120.0.0.0 linux #0-6]     at async Context.<anonymous> (/workspace/src/tests/selenium/specs/page.js:29:8)

Previously:
T366791: CollaborationKit CI is broken

Details

Related Changes in Gerrit:

Event Timeline

Some more examples in T388416#10648479 and that task's description. This seems to be happening a lot, and always for this page.js spec.

In the WMDE-TechWish team (currently working on Cite) we also run into this problem surprisingly often, something like one or two times per day. A recheck sometimes helps, sometimes the same failure happens again. Just to let you know. Thanks a lot for looking into this!

Noting what (I think) actually happening to save folks some digging:

  • Page.js is opening Special:BlankPage
  • Then it checks if the page BeforeEach-name-<random number>--Iñtërnâtiônàlizætiøn is Wikitext with the function IsTargetNotWikitext
  • It's timing out waiting for the browser to execute typeof mw !== 'undefined' && mw.loader.getState('mediawiki.base') === 'ready' (see waitForModuleState)

Why is it timing out? Would need some selenium expertise to say.

Why is it timing out? Would need some selenium expertise to say.

I suspect that it might not be selenium-specific, this time. Judging from the other failures in T388416, it seems that CI is just slow somehow. Maybe, waiting for the module already takes a bit less than 5 seconds under normal conditions, and if the call happens at the wrong time it might fail by the 5s timeout.

One thing I would find useful is to see how long it takes under normal circumstances. And then maybe get a profile for the selenium run using excimer, like I did in T381895.

Change #1129323 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] [DNM] Measure selenium performance

https://gerrit.wikimedia.org/r/1129323

Maybe, waiting for the module already takes a bit less than 5 seconds under normal conditions, and if the call happens at the wrong time it might fail by the 5s timeout.

This theory was quickly disproved by the patch above. It failed on the 4th selenium run, with the waitForModuleState call happening 10 times per run. Most of the times it only waited less than 50 milliseconds, then we have one at 106ms and one (towards the beginning) at 158ms. And then the failing one which necessarily takes 5 seconds. So, it's not general slowness.

Using r1129873, I managed to get a failed run with the selenium logging level set to "trace", and with timestamped MW logs: https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/4595/console. I copied the relevant selenium logs to P74284. The MW logs are overwhelming. There's no point reading anything from that file without knowing what to look for.

In the selenium logs, we have this bit:

20:02:54 [0-6] 2025-03-20T20:02:54.567Z DEBUG @wdio/utils:shim: Finished to run "afterTest" hook in 348ms
20:02:54 [0-6] 2025-03-20T20:02:54.585Z INFO devtools: COMMAND deleteAllCookies()

Unfortunately, selenium does not log when it starts running a test hook, but from the above, knowing that deleteAllCookies is the first call in the beforeEach hook, we can assume that it's when it started running. So we have:

20:02:54 [0-6] 2025-03-20T20:02:54.585Z INFO devtools: COMMAND deleteAllCookies()
20:02:54 [0-6] 2025-03-20T20:02:54.760Z INFO devtools: RESULT null
20:02:54 [0-6] 2025-03-20T20:02:54.768Z INFO devtools: COMMAND navigateTo("http://127.0.0.1:9413//index.php?uselang=en&title=Special%3ABlankPage")
20:02:55 [0-6] 2025-03-20T20:02:55.440Z INFO devtools: RESULT null
20:02:55 [0-6] 2025-03-20T20:02:55.496Z INFO devtools: COMMAND executeScript("return (() => document.readyState === 'complete').apply(null, arguments)", <object>)
20:02:55 [0-6] 2025-03-20T20:02:55.517Z INFO devtools: RESULT true
20:02:55 [0-6] 2025-03-20T20:02:55.523Z INFO devtools: COMMAND executeScript("return (( arg ) => typeof mw !== 'undefined' && mw.loader.getState( arg.name ) === arg.status).apply(null, arguments)", <object>)
20:02:55 [0-6] 2025-03-20T20:02:55.537Z INFO devtools: RESULT false

and then more failed attempts to see if mediawiki.base is available, until it timed out at 20:03:00. This looks normal: it really is trying to see if mediawiki.base is available, which it isn't. So, I restricted MW logs to everything between 20:02:54 and 20:03:00. The request to Special:BlankPage is easy to find, and there's nothing else running concurrently at this point. I pasted the relevant log lines (unfiltered) in P74285. I also pasted the same request from another test run (this time successful) in P74286. Comparing the two, they are identical, except for unique request-bound stuff like session IDs and profiling numbers. The only non-trivial differences are in the jobs that get processed after the response; but precisely because this happens after the response, it can't lead to differences in the initial page load. In particular, the PHP notice seems harmless, but I filed T389556 nonetheless.

So, by now we know that there are likely no issue with selenium itself and the initial server response. The next question would be what is happening on the client side, and particularly the load.php requests. But it looks like we aren't logging anything from them?!

Change #1129323 abandoned by Daimona Eaytoy:

[mediawiki/core@master] [DNM] Track down recurrent selenium timeouts

Reason:

Using I73c306846ba15ba4de6500f078e346f87e07ea33

https://gerrit.wikimedia.org/r/1129323

I'm not sure why there are no logs for load.php in CI. I'm guessing there's a setting somewhere that disables them, but I'm not finding it (wgDebugLogFile is set). However, I checked out the Apache access log, and these are the only lines starting from 20:02:54:

2025-03-20T20:02:54	326811	6192	GET	http://127.0.0.1:9413/index.php?uselang=en&title=Special%3ABlankPage	text/html	-	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36	en-US,en;q=0.9
2025-03-20T20:02:55	136287	88309	GET	http://127.0.0.1:9413/load.php?lang=en&modules=jquery%2Coojs-ui-core%7Coojs-ui-core.icons%2Cstyles%7Coojs-ui.styles.indicators&skin=vector-2022&version=wvr8a	text/javascript	http://127.0.0.1:9413//index.php?uselang=en&title=Special%3ABlankPage	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36	en-US,en;q=0.9

In particular, there is no request for the startup module. I'm not yet sure what to make of it.

I'm not sure why there are no logs for load.php in CI. I'm guessing there's a setting somewhere that disables them, but I'm not finding it (wgDebugLogFile is set).

Needs $wgDebugRawPage = true; to see load.php in the server logs. For browser cache hit there is nothing in the logs (not sure how selenium handles that)

I'm not sure why there are no logs for load.php in CI. I'm guessing there's a setting somewhere that disables them, but I'm not finding it (wgDebugLogFile is set).

Needs $wgDebugRawPage = true; to see load.php in the server logs.

Oh I see, thank you! I guess I was confused because I could see logs for load.php locally. I do not haveDebugRawPage enabled, but maybe I had commented out the minimumLevel check when I was doing the experiment. At any rate, I removed the minimumLevel check in my test patch, let's see!

For browser cache hit there is nothing in the logs (not sure how selenium handles that)

That was my first thought as well, and I also don't know what is or isn't cached in selenium. But we won't need to look into that, if we're able to get MW logs for the load entry point.

I could finally reproduce this in https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/5416/console with full debug logging, load.php included. Build kept forever, I still need to look into it though.

Also reproduced in https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/10550/console (kept forever). I took a brief look but can't quite tell what's wrong. One thing in common in both this and the previous run is that, right around the timeout, we have a load.php request that prints Request ended normally (from MediaWikiEntryPoint::restInPeace); but then, 5 seconds later (when the timeout occurs), it prints some Wikibase debug information (nothing special, this seems to be printed at service initialization / extension registration time for all requests). For example, full request logs from the previous test:

2025-03-24 18:56:49 b5f341aaba5e wikidb: 

Start request GET /load.php?lang=en&modules=jquery%2Coojs-ui-core%7Cjquery.makeCollapsible%7Cjquery.makeCollapsible.styles%7Cmediawiki.action.edit.collapsibleFooter&skin=vector-2022&version=3niel
IP: 127.0.0.1
HTTP HEADERS:
COOKIE: wikidb_session=jtmclm84fd7iu2qj9jhu1osaacqfre2m
ACCEPT-LANGUAGE: en-US,en;q=0.9
ACCEPT-ENCODING: gzip, deflate, br
REFERER: http://127.0.0.1:9413//index.php?action=submit&vehidebetadialog=1&hidewelcomedialog=1&title=BeforeEach-name-0.3392057346002999-I%C3%B1t%C3%ABrn%C3%A2ti%C3%B4n%C3%A0liz%C3%A6ti%C3%B8n
SEC-FETCH-DEST: script
SEC-FETCH-MODE: no-cors
SEC-FETCH-SITE: same-origin
ACCEPT: */*
SEC-CH-UA-PLATFORM: "Linux"
USER-AGENT: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
SEC-CH-UA-MOBILE: ?0
SEC-CH-UA: "Not_A Brand";v="8", "Chromium";v="120"
CONNECTION: keep-alive
HOST: 127.0.0.1:9413
(end headers)
2025-03-24 18:56:49 b5f341aaba5e wikidb: SessionManager using store Wikimedia\ObjectCache\MemcachedPhpBagOStuff
2025-03-24 18:56:49 b5f341aaba5e wikidb: LocalisationCache using store LCStoreStaticArray
2025-03-24 18:56:49 b5f341aaba5e wikidb: MainWANObjectCache using store Wikimedia\ObjectCache\MemcachedPhpBagOStuff
2025-03-24 18:56:49 b5f341aaba5e wikidb: MicroStash using store Wikimedia\ObjectCache\MemcachedPhpBagOStuff
2025-03-24 18:56:49 b5f341aaba5e wikidb: wgEventLoggingBaseUri has not been configured.
2025-03-24 18:56:49 b5f341aaba5e wikidb: MainObjectStash using store SqlBagOStuff
2025-03-24 18:56:49 b5f341aaba5e wikidb: MediaWiki\Parser\ParserFactory: using default preprocessor
2025-03-24 18:56:49 b5f341aaba5e wikidb: Wikimedia\Rdbms\DatabaseMySQL::open [0.106ms] localhost:/workspace/db/quibble-mysql-46d7q7l4/socket: SET group_concat_max_len = 262144, `sql_mode` = 'STRICT_ALL_TABLES,ONLY_FULL_GROUP_BY'
2025-03-24 18:56:49 b5f341aaba5e wikidb: Wikimedia\Rdbms\LoadBalancer::reallyOpenConnection: opened new connection for 0/wikidb
2025-03-24 18:56:49 b5f341aaba5e wikidb: MemCache: got WANCache:global:rdbms-server-readonly:localhost%3A/workspace/db/quibble-mysql-46d7q7l4/socket|#|v
2025-03-24 18:56:49 b5f341aaba5e wikidb: SqlBagOStuff::fetchBlobs [0.376ms] localhost:/workspace/db/quibble-mysql-46d7q7l4/socket: SELECT  keyname,value,exptime  FROM `objectcache`    WHERE keyname IN ('wikidb:ResourceLoaderModule-dependencies:jquery|vector-2022|en','wikidb:ResourceLoaderModule-dependencies:oojs-ui-core|vector-2022|en','wikidb:ResourceLoaderModule-dependencies:jquery.makeCollapsible|vector-2022|en','wikidb:ResourceLoaderModule-dependencies:jquery.makeCollapsible.styles|vector-2022|en','wikidb:ResourceLoaderModule-dependencies:mediawiki.action.edit.collapsibleFooter|vector-2022|en')  AND (exptime >= '20250324185649')  
2025-03-24 18:56:49 b5f341aaba5e wikidb: SqlBagOStuff debug: SqlBagOStuff::fetchBlobs: retrieved wikidb:ResourceLoaderModule-dependencies:jquery.makeCollapsible|vector-2022|en; expiry time is 20260324184525
2025-03-24 18:56:49 b5f341aaba5e wikidb: Wikimedia\Rdbms\LoadBalancer::reuseOrOpenConnectionForNewRef: reusing connection for 0/wikidb
2025-03-24 18:56:49 b5f341aaba5e wikidb: SqlBagOStuff debug: SqlBagOStuff::fetchBlobs: retrieved wikidb:ResourceLoaderModule-dependencies:mediawiki.action.edit.collapsibleFooter|vector-2022|en; expiry time is 20260324184525
2025-03-24 18:56:49 b5f341aaba5e wikidb: Wikimedia\Rdbms\LoadBalancer::reuseOrOpenConnectionForNewRef: reusing connection for 0/wikidb
2025-03-24 18:56:49 b5f341aaba5e wikidb: MemCache: got WANCache:wikidb:resourceloader-messageblob:oojs-ui-core:en:9c2b2d88309a7861e251946fe4875b1e|#|v
2025-03-24 18:56:49 b5f341aaba5e wikidb: MemCache: got WANCache:wikidb:resourceloader-messageblob:jquery.makeCollapsible:en:2e1aec29500269620cc8fe965cd4f966|#|v
2025-03-24 18:56:49 b5f341aaba5e wikidb: MemCache: got WANCache:wikidb:resourceloader-messageblob:jquery.makeCollapsible.styles:en:2e1aec29500269620cc8fe965cd4f966|#|v
2025-03-24 18:56:49 b5f341aaba5e wikidb: MemCache: got WANCache:global:resourceloader-messageblob|#|t
2025-03-24 18:56:49 b5f341aaba5e wikidb: MemCache: got WANCache:wikidb:resourceloader-messageblob:oojs-ui-core|#|t
2025-03-24 18:56:49 b5f341aaba5e wikidb: MemCache: got WANCache:wikidb:resourceloader-messageblob:jquery.makeCollapsible|#|t
2025-03-24 18:56:49 b5f341aaba5e wikidb: MemCache: got WANCache:wikidb:resourceloader-messageblob:jquery.makeCollapsible.styles|#|t
2025-03-24 18:56:49 b5f341aaba5e wikidb: LocalisationCache::loadCoreData: got localisation for en from source
2025-03-24 18:56:49 b5f341aaba5e wikidb: [81949e5f1cdd585b35e0d309] /load.php?lang=en&modules=jquery%2Coojs-ui-core%7Cjquery.makeCollapsible%7Cjquery.makeCollapsible.styles%7Cmediawiki.action.edit.collapsibleFooter&skin=vector-2022&version=3niel   PHP Notice: ob_end_flush(): Failed to send buffer of zlib output compression (0)
#0 [internal function]: MWExceptionHandler::handleError(int, string, string, int)
#1 /workspace/src/includes/MediaWikiEntryPoint.php(1087): ob_end_flush()
#2 /workspace/src/includes/MediaWikiEntryPoint.php(1231): MediaWiki\MediaWikiEntryPoint->flushOutputBuffer()
#3 /workspace/src/includes/MediaWikiEntryPoint.php(481): MediaWiki\MediaWikiEntryPoint->fastCgiFinishRequest()
#4 /workspace/src/includes/MediaWikiEntryPoint.php(454): MediaWiki\MediaWikiEntryPoint->doPostOutputShutdown()
#5 /workspace/src/includes/MediaWikiEntryPoint.php(211): MediaWiki\MediaWikiEntryPoint->postOutputShutdown()
#6 /workspace/src/load.php(42): MediaWiki\MediaWikiEntryPoint->run()
#7 {main}
2025-03-24 18:56:49 b5f341aaba5e wikidb: Output buffer flushed
2025-03-24 18:56:49 b5f341aaba5e wikidb: LBFactory shutdown completed
2025-03-24 18:56:49 b5f341aaba5e wikidb: Request ended normally
2025-03-24 18:56:54 b5f341aaba5e wikidb: Wikibase\Lib\SettingsArray::getSetting: setting entitySources was given as a closure, resolve it to array (
  'local' => 
  array (
    'entityNamespaces' => 
    array (
      'item' => 120,
      'property' => 122,
      'mediainfo' => '6/mediainfo',
    ),
    'repoDatabase' => false,
    'baseUri' => 'http://127.0.0.1:9413/entity/',
    'rdfNodeNamespacePrefix' => 'wd',
    'rdfPredicateNamespacePrefix' => '',
    'interwikiPrefix' => '',
  ),
)
2025-03-24 18:56:54 b5f341aaba5e wikidb: 

Start request GET /load.php?lang=en&modules=oojs-ui%2Coojs-ui-toolbars%2Coojs-ui-widgets%7Coojs-ui-toolbars.icons%7Coojs-ui-widgets.icons%7Coojs-ui.styles.icons-accessibility%2Cicons-editing-advanced%2Cicons-editing-core&skin=vector-2022&version=bok6x

I'm still not sure what to make of this though.