Page MenuHomePhabricator

ERROR @wdio/sync: Error: socket hang up
Closed, ResolvedPublic

Description

It looks like all of our selenium jobs for GrowthExperiments started failing yesterday with:

08:06:31 [0-0] 2021-11-26T07:06:51.639Z ERROR @wdio/sync: Error: socket hang up
08:06:51     at connResetException (internal/errors.js:628:14)
08:06:51     at Socket.socketOnEnd (_http_client.js:499:23)
08:06:51     at endReadableNT (internal/streams/readable.js:1317:12)
08:06:51     at processTicksAndRejections (internal/process/task_queues.js:82:21)
08:06:51     at Object.beforeSuite (/workspace/src/extensions/GrowthExperiments/tests/selenium/wdio.conf.js:23:11)
08:06:51 [0-0] Error in "Homepage.is enabled for new user"
08:06:52 net::ERR_CONNECTION_REFUSED at http://127.0.0.1:9412//index.php?title=Special%3AHomepage
08:06:52 Error: net::ERR_CONNECTION_REFUSED at http://127.0.0.1:9412//index.php?title=Special%3AHomepage
08:06:52     at navigate (/workspace/src/extensions/GrowthExperiments/node_modules/@wdio/runner/node_modules/puppeteer-core/lib/cjs/puppeteer/common/FrameManager.js:115:23)
08:06:52     at processTicksAndRejections (internal/process/task_queues.js:95:5)
08:06:52     at async FrameManager.navigateFrame (/workspace/src/extensions/GrowthExperiments/node_modules/@wdio/runner/node_modules/puppeteer-core/lib/cjs/puppeteer/common/FrameManager.js:90:21)
08:06:52     at async Frame.goto (/workspace/src/extensions/GrowthExperiments/node_modules/@wdio/runner/node_modules/puppeteer-core/lib/cjs/puppeteer/common/FrameManager.js:416:16)
08:06:52     at async Page.goto (/workspace/src/extensions/GrowthExperiments/node_modules/@wdio/runner/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Page.js:789:16)
08:06:52     at async DevToolsDriver.navigateTo (/workspace/src/extensions/GrowthExperiments/node_modules/@wdio/runner/node_modules/devtools/build/commands/navigateTo.js:18:5)
08:06:52     at async Browser.wrappedCommand (/workspace/src/extensions/GrowthExperiments/node_modules/@wdio/runner/node_modules/devtools/build/devtoolsdriver.js:75:26)
08:06:52     at HomepagePage.openTitle (/workspace/src/extensions/GrowthExperiments/node_modules/wdio-mediawiki/Page.js:22:11)
08:06:52     at HomepagePage.open (/workspace/src/extensions/GrowthExperiments/tests/selenium/pageobjects/homepage.page.js:14:9)
08:06:52     at Context.<anonymous> (/workspace/src/extensions/GrowthExperiments/tests/selenium/specs/homepage.js:10:16)

For example https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/124732/console

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

@zeljkofilipin any idea what might be going on here? I don't think we changed anything in our test setup recently.

kostajh triaged this task as High priority.Nov 26 2021, 8:19 AM
kostajh added a subscriber: hashar.

I have compared the last two builds of that job for the change https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/741866

Good: https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/124603/
Bad: https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/124732/

Doing a diff of the raw console text, I can not spot anything beside the resolved branch name of the lib/ve submodules in the VisualEditor extension (but they have the same commit):

good.txt: 5a97be3a02fcba0016c10c39e0174525d3c5c026 lib/ve (heads/master)
bad.txt: 5a97be3a02fcba0016c10c39e0174525d3c5c026 lib/ve (remotes/origin/wmf/1.38.0-wmf.10)

I don't see any recent made to the package-lock.json of mediawiki/core and GrowthExperiments.

The Docker images haven't been changed this week.

The underlying host running the Docker image had no package change beside wmf-certificates being upgraded but those are not exposed in the container or Docker.

So it is an entire mystery unfortunately :-\

One thing of interest is that the quibble-vendor-mysql-php72-selenium-docker job which runs GrowthExperiments tests passes, while wmf-quibble-selenium-php72-docker fails.

07:05:57 Execution of 2 spec files started at 2021-11-26T07:05:57.049Z
07:05:57 
07:05:57 [0-0] RUNNING in chrome - /tests/selenium/specs/homepage.js
07:06:31 [0-0] 2021-11-26T07:06:51.639Z ERROR @wdio/sync: Error: socket hang up

36 seconds delay, the php built-in server does not respond on the first try which really could be due to anything. Maybe the first request triggers some heavy action that takes longer than 30secs. If I got it right the Selenium test for GrowthExperiments triggers the very first request made to the PHP built-in server.

If something takes too long, we get debug log in a mw-debug-www.log file. We do not have any timing in mw-debug-www.log.gz unfortunately but we can still dig into it: https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/124732/artifact/log/mw-debug-www.log.gz

Start request POST /api.php?format=json
IP: 127.0.0.1
HTTP HEADERS:
USER-AGENT: mwbot/2.0.0
HOST: 127.0.0.1:9412
CONTENT-TYPE: application/x-www-form-urlencoded
ACCEPT: application/json
CONTENT-LENGTH: 60
CONNECTION: close
(end headers)

There is no other Start request lines in the debug file which has 121 428 lines. So I am assuming the entirety of the debug log is for that first initial request. Looking at the DBQuery log bucket:

grep DBQuery mw-debug-www.log|cut -d\ -f2|sort -n|uniq -c|sort -nr
12137 MessageCache::loadFromDB(en)-small
12137 MessageCache::loadFromDB(en)-big
12137 MediaWiki\Storage\SqlBlobStore::fetchBlobs
12137 MediaWiki\Revision\RevisionStore::getSlotRowsForBatch
12136 MediaWiki\User\ActorStore::getUserIdentityByName
    5 Wikimedia\Rdbms\DatabaseMysqlBase::serverIsReadOnly
    2 MediaWiki\Storage\NameTableStore::loadTable
    1 Wikimedia\Rdbms\DatabaseMysqlBase::open
    1 Wikimedia\Rdbms\Database::beginIfImplied

The job uses MySQL as a backend. Looks like the first request spend way too much time loading messages. Maybe due to MediaWiki\User\ActorStore::getUserIdentityByName?

Gotta compare with the debug log with the previous successful build. Then if it is a timeout, the issue might have been introduced earlier than the last successful build :-\

For the good build https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/124603/artifact/log/mw-debug-www.log.gz the first request entire debug log is:

Start request POST /api.php?format=json
IP: 127.0.0.1
HTTP HEADERS:
USER-AGENT: mwbot/2.0.0
HOST: 127.0.0.1:9412
CONTENT-TYPE: application/x-www-form-urlencoded
ACCEPT: application/json
CONTENT-LENGTH: 60
CONNECTION: close
(end headers)
[Wikibase] Wikibase\Lib\SettingsArray::getSetting: setting entitySources was given as a closure, resolve it to array (
  'local' => 
  array (
    'entityNamespaces' => 
    array (
      'item' => 120,
      'property' => 122,
      'mediainfo' => '6/mediainfo',
    ),
    'repoDatabase' => false,
    'baseUri' => 'http://127.0.0.1:9412/entity/',
    'rdfNodeNamespacePrefix' => 'wd',
    'rdfPredicateNamespacePrefix' => '',
    'interwikiPrefix' => '',
  ),
)
[session] SessionManager using store APCUBagOStuff
[localisation] LocalisationCache using store LCStoreStaticArray
[DBReplication] ChronologyProtector using store APCUBagOStuff
[session] SessionBackend "2ad45iaiec2q1v6430e3jlbarprvu1sp" is unsaved, marking dirty in constructor
[session] SessionBackend "2ad45iaiec2q1v6430e3jlbarprvu1sp" save: dataDirty=1 metaDirty=1 forcePersist=0
[cookie] already deleted setcookie: "wikidb_session", "", "1606303850", "/", "", "", "1", ""
[cookie] already deleted setcookie: "wikidbUserID", "", "1606303850", "/", "", "", "1", ""
[cookie] already deleted setcookie: "wikidbToken", "", "1606303850", "/", "", "", "1", ""
[cookie] already deleted setcookie: "forceHTTPS", "", "1606303850", "/", "", "", "1", ""
[EventLogging] wgEventLoggingBaseUri has not been configured.
[MessageCache] MessageCache using store APCUBagOStuff
[SQLBagOStuff] MainObjectStash using store ReplicatedBagOStuff
[DBQuery] Wikimedia\Rdbms\DatabaseMysqlBase::open [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SET group_concat_max_len = 262144, sql_mode = 'TRADITIONAL'
[DBReplication] Wikimedia\Rdbms\LBFactory::getChronologyProtector: request info {
    "IPAddress": "127.0.0.1",
    "UserAgent": "mwbot\/2.0.0",
    "ChronologyProtection": false,
    "ChronologyPositionIndex": 0,
    "ChronologyClientId": false
}
[DBReplication] Wikimedia\Rdbms\ChronologyProtector::lazyStartup: client ID is 987d54801edb71c7557b3d614389ff9d; key is global:Wikimedia\Rdbms\ChronologyProtector:987d54801edb71c7557b3d614389ff9d:v2
[DBReplication] Wikimedia\Rdbms\ChronologyProtector::applySessionReplicationPosition: DEFAULT (localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket) has no position
[DBConnection] Wikimedia\Rdbms\LoadBalancer::lazyLoadReplicationPositions: executed chronology callback.
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: opened new connection for local/0
[DBQuery] Wikimedia\Rdbms\DatabaseMysqlBase::serverIsReadOnly [0.001s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT @@GLOBAL.read_only AS Value
[DBQuery] Wikimedia\Rdbms\Database::beginIfImplied (MessageCache::loadFromDB(en)-big) [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: BEGIN
[DBQuery] MessageCache::loadFromDB(en)-big [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT  page_title,page_latest  FROM `page`    WHERE page_is_redirect = 0 AND page_namespace = 8 AND (page_title NOT LIKE '%/%' ESCAPE '`' ) AND (page_len > 10000)  
[DBQuery] MessageCache::loadFromDB(en)-small [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,temp_rev_user.revactor_actor AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len  FROM `page` JOIN `revision` ON ((page_id = rev_page)) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) JOIN `revision_actor_temp` `temp_rev_user` ON ((temp_rev_user.revactor_rev = rev_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor))   WHERE page_is_redirect = 0 AND page_namespace = 8 AND (page_title NOT LIKE '%/%' ESCAPE '`' ) AND (page_len <= 10000) AND (page_latest = rev_id)  
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[DBQuery] MediaWiki\Storage\NameTableStore::loadTable [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT  role_id AS `id`,role_name AS `name`  FROM `slot_roles`     ORDER BY id 
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[DBQuery] MediaWiki\Revision\RevisionStore::getSlotRowsForBatch [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT  slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `slots` JOIN `content` ON ((slot_content_id = content_id))   WHERE slot_revision_id = 15 AND slot_role_id = 1  
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[DBQuery] MediaWiki\Storage\SqlBlobStore::fetchBlobs [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT  old_id,old_text,old_flags  FROM `text`    WHERE old_id = 15  
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[DBQuery] MediaWiki\Storage\NameTableStore::loadTable [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT  model_id AS `id`,model_name AS `name`  FROM `content_models`     ORDER BY id 
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[MessageCache] MessageCache::load: Loading en... local cache is empty, global cache is empty, loading from database
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[DBQuery] Wikimedia\Rdbms\DatabaseMysqlBase::serverIsReadOnly [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT @@GLOBAL.read_only AS Value
[session] SessionBackend "2ad45iaiec2q1v6430e3jlbarprvu1sp" force-persist due to persist()
[session] SessionBackend "2ad45iaiec2q1v6430e3jlbarprvu1sp" save: dataDirty=0 metaDirty=1 forcePersist=1
[session] Persisting session for unknown reason
[cookie] setcookie: "wikidb_session", "2ad45iaiec2q1v6430e3jlbarprvu1sp", "0", "/", "", "", "1", ""
[cookie] already deleted setcookie: "wikidbUserID", "", "1606303851", "/", "", "", "1", ""
[cookie] already deleted setcookie: "wikidbToken", "", "1606303851", "/", "", "", "1", ""
[cookie] already deleted setcookie: "forceHTTPS", "", "1606303851", "/", "", "", "1", ""
[session] SessionBackend "2ad45iaiec2q1v6430e3jlbarprvu1sp" Taking over PHP session
[session] SessionBackend "2ad45iaiec2q1v6430e3jlbarprvu1sp" save: dataDirty=0 metaDirty=1 forcePersist=1
[cookie] already set setcookie: "wikidb_session", "2ad45iaiec2q1v6430e3jlbarprvu1sp", "0", "/", "", "", "1", ""
[cookie] already deleted setcookie: "wikidbUserID", "", "1606303851", "/", "", "", "1", ""
[cookie] already deleted setcookie: "wikidbToken", "", "1606303851", "/", "", "", "1", ""
[cookie] already deleted setcookie: "forceHTTPS", "", "1606303851", "/", "", "", "1", ""
[session] SessionBackend "2ad45iaiec2q1v6430e3jlbarprvu1sp" data dirty due to dirty(): ApiMain->executeAction/ApiLogin->execute/MediaWiki\Session\Session->getToken/MediaWiki\Session\Session->set/MediaWiki\Session\SessionBackend->dirty
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[DBQuery] MediaWiki\Revision\RevisionStore::fetchRevisionRowFromConds [0.001s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT  rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,temp_rev_user.revactor_actor AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name  FROM `revision` JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) JOIN `revision_actor_temp` `temp_rev_user` ON ((temp_rev_user.revactor_rev = rev_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor)) JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((actor_rev_user.actor_user != 0) AND (user_id = actor_rev_user.actor_user))   WHERE page_namespace = 8 AND page_title = 'Gadgets-definition' AND (rev_id=page_latest)  LIMIT 1  
[authevents] Login attempt
[DBQuery] MediaWiki::preOutputCommit [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: COMMIT
MediaWiki::preOutputCommit: primary transaction round committed
MediaWiki::preOutputCommit: pre-send deferred updates completed
[session] SessionBackend "2ad45iaiec2q1v6430e3jlbarprvu1sp" save: dataDirty=1 metaDirty=0 forcePersist=0
MediaWiki::preOutputCommit: session changes committed
[DBReplication] Wikimedia\Rdbms\ChronologyProtector::persistSessionReplicationPositions: no primary positions/timestamps to save
[DBReplication] Wikimedia\Rdbms\LBFactory::shutdown: finished ChronologyProtector shutdown
[DBReplication] LBFactory shutdown completed
User::getBlockedStatus: checking blocked status for 127.0.0.1
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[DBConnection] Wikimedia\Rdbms\LoadBalancer::getLocalConnection: reused a connection for local/0
[DBQuery] Wikimedia\Rdbms\Database::beginIfImplied (MediaWiki\Block\DatabaseBlock::newLoad) [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: BEGIN
[DBQuery] MediaWiki\Block\DatabaseBlock::newLoad [0.001s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: SELECT  ipb_id,ipb_address,ipb_timestamp,ipb_auto,ipb_anon_only,ipb_create_account,ipb_enable_autoblock,ipb_expiry,ipb_deleted,ipb_block_email,ipb_allow_usertalk,ipb_parent_block_id,ipb_sitewide,ipb_by_actor,ipblocks_actor.actor_user AS `ipb_by`,ipblocks_actor.actor_name AS `ipb_by_text`,comment_ipb_reason.comment_text AS `ipb_reason_text`,comment_ipb_reason.comment_data AS `ipb_reason_data`,comment_ipb_reason.comment_id AS `ipb_reason_cid`  FROM `ipblocks` JOIN `actor` `ipblocks_actor` ON ((actor_id=ipb_by_actor)) JOIN `comment` `comment_ipb_reason` ON ((comment_ipb_reason.comment_id = ipb_reason_id))   WHERE ipb_address = '127.0.0.1' OR ((ipb_range_start  LIKE '7F00%' ESCAPE '`' ) AND (ipb_range_start <= '7F000001') AND (ipb_range_end >= '7F000001'))  
[DBQuery] MediaWiki::restInPeace [0s] localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket: COMMIT
[DBReplication] LBFactory shutdown completed
Request ended normally
[session] Saving all sessions on shutdown
[DBConnection] Wikimedia\Rdbms\LoadBalancer::__destruct: closing connection to database 'localhost:/workspace/db/quibble-mysql-wf0h3gbi/socket'.

Sounds like we hit a regression of some sort possibly introduced inside MediaWiki core.

The different of commits between the successful and bad builds:

$ colordiff -U0 <(grep commit good.txt|sort) <(grep commit bad.txt|sort)
--- /dev/fd/63	2021-11-26 11:13:06.433791564 +0100
+++ /dev/fd/62	2021-11-26 11:13:06.433791564 +0100
@@ -10 +10 @@
-INFO:zuul.Cloner.mediawiki/extensions/CodeEditor:Prepared mediawiki/extensions/CodeEditor repo with branch master at commit b10b8862a887089db4331229f0ee5188e8f11b32
+INFO:zuul.Cloner.mediawiki/extensions/CodeEditor:Prepared mediawiki/extensions/CodeEditor repo with branch master at commit ad1930d2555afcbaa5e9ba75fead28bb6859210b
@@ -12 +12 @@
-INFO:zuul.Cloner.mediawiki/extensions/ContentTranslation:Prepared mediawiki/extensions/ContentTranslation repo with branch master at commit 7a082144cf74fb543e4acdd52aa1ecf3d40dd31a
+INFO:zuul.Cloner.mediawiki/extensions/ContentTranslation:Prepared mediawiki/extensions/ContentTranslation repo with branch master at commit e66c9dcb7fe6f1a329f5493dff3af31b0fe80ddf
@@ -41 +41 @@
-INFO:zuul.Cloner.mediawiki/extensions/ProofreadPage:Prepared mediawiki/extensions/ProofreadPage repo with branch master at commit 903dd34aa673650396f3a999f00573a62dfd52b4
+INFO:zuul.Cloner.mediawiki/extensions/ProofreadPage:Prepared mediawiki/extensions/ProofreadPage repo with branch master at commit df2eed75ef8aa46609a874843cd5bfbe3d9747c7
@@ -50 +50 @@
-INFO:zuul.Cloner.mediawiki/extensions/UniversalLanguageSelector:Prepared mediawiki/extensions/UniversalLanguageSelector repo with branch master at commit 7805386fcbb6f4d3755e60c81a7d6e8ebf5b42a5
+INFO:zuul.Cloner.mediawiki/extensions/UniversalLanguageSelector:Prepared mediawiki/extensions/UniversalLanguageSelector repo with branch master at commit 80444e8be61a27c575481f5ef4c954f8642a16bb
@@ -55 +55 @@
-INFO:zuul.Cloner.mediawiki/extensions/WikiEditor:Prepared mediawiki/extensions/WikiEditor repo with branch master at commit f3c2a8a35ecdf23d26c1267e57dd168dd55e42af
+INFO:zuul.Cloner.mediawiki/extensions/WikiEditor:Prepared mediawiki/extensions/WikiEditor repo with branch master at commit 7fbd917c152ada0852a166ed97541472b874a963
@@ -60 +60 @@
-INFO:zuul.Cloner:Prepared mediawiki/core repo with branch master at commit e517052a137679f3cfffd3292d93ebc2acaa1e94
+INFO:zuul.Cloner:Prepared mediawiki/core repo with branch master at commit 73a25838b461a952914190cb48eaaefee11d7659

For mediawiki/core:

git log --oneline e517052a137679f3cfffd3292d93ebc2acaa1e94..73a25838b461a952914190cb48eaaefee11d7659
73a25838b46 Remove last User dependencies from File
992c592f7d5 Merge "Introduce setting PHP_INI env configs via a settings source"
dce6673430d Introduce setting PHP_INI env configs via a settings source

No idea how that could potentially be related. Guess we can look at the debug log of changes made to mediawiki/core and see whether one commit started magically triggering that spam of db queries. Or that might be due to hook code coming from one of the extensions.

The first debug line that differs is the bad debug log has a call to MediaWiki\User\ActorStore::getUserIdentityByName which would somehow trigger the load of every single message? https://gerrit.wikimedia.org/r/c/mediawiki/core/+/736105 might be a good candidate.

What puzzles me is that immediately after running update.php, Quibble invokes maintenance/rebuildLocalisationCache.php --lang=en. The good build has:

[localisation] LocalisationCache using store LCStoreStaticArray
[MessageCache] MessageCache using store APCUBagOStuff
[MessageCache] MessageCache::load: Loading en... local cache is empty, global cache is empty, loading from database

I am pretty sure we had those stored in the database so we could get them cached by running rebuildLocalisationCache. But that is a different issue probably.

That is all I have for today.

I am seeing errors in the beta logs of the form:

2021-11-26 11:42:20 [YaDIDYZogQBJCA7hy0XS4QAAAAU] deployment-mediawiki11 ukwiki 1.38.0-alpha exception ERROR: [YaDIDYZogQBJCA7hy0XS4QAAAAU] /   LogicException: Process cache for 'uk' should be set by now. {"exception_url":"/","reqId":"YaDIDYZogQBJCA7hy0XS4QAAAAU","caught_by":"entrypoint"} 
[Exception LogicException] (/srv/mediawiki/php-master/includes/cache/MessageCache.php:401) Process cache for 'uk' should be set by now.
  #0 /srv/mediawiki/php-master/includes/cache/MessageCache.php(1103): MessageCache->load(string)
  #1 /srv/mediawiki/php-master/includes/cache/MessageCache.php(1032): MessageCache->getMsgFromNamespace(string, string)
  #2 /srv/mediawiki/php-master/includes/cache/MessageCache.php(1002): MessageCache->getMessageForLang(Language, string, boolean, array)
  #3 /srv/mediawiki/php-master/includes/cache/MessageCache.php(944): MessageCache->getMessageFromFallbackChain(Language, string, boolean)
  #4 /srv/mediawiki/php-master/includes/language/Message.php(1455): MessageCache->get(string, boolean, Language)
  #5 /srv/mediawiki/php-master/includes/language/Message.php(945): Message->fetchMessage()
  #6 /srv/mediawiki/php-master/includes/language/Message.php(1035): Message->format(string)
  #7 /srv/mediawiki/php-master/includes/Title.php(732): Message->text()
  #8 /srv/mediawiki/php-master/includes/MediaWiki.php(141): Title::newMainPage()
  #9 /srv/mediawiki/php-master/includes/MediaWiki.php(161): MediaWiki->parseTitle()
  #10 /srv/mediawiki/php-master/includes/MediaWiki.php(870): MediaWiki->getTitle()
  #11 /srv/mediawiki/php-master/includes/MediaWiki.php(554): MediaWiki->main()
  #12 /srv/mediawiki/php-master/index.php(53): MediaWiki->run()
  #13 /srv/mediawiki/php-master/index.php(46): wfIndexMain()
  #14 /srv/mediawiki/w/index.php(3): require(string)
  #15 {main}

and:

2021-11-26 11:42:53 [YaDIMYZogQBJCA7hy0XTIgAAABQ] deployment-mediawiki11 enwikiversity 1.38.0-alpha exception ERROR: [YaDIMYZogQBJCA7hy0XTIgAAABQ] /wiki/Special:CentralAutoLogin/start?type=1x1&from=commonswiki   PHP Fatal Error from line 47 of /srv/mediawiki/php-master/includes/libs/rdbms/database/resultwrapper/MysqliResultWrapper.php: Allowed memory size of 698351616 bytes exhausted (tried to allocate 20480 bytes) {"exception_url":"/wiki/Special:CentralAutoLogin/start?type=1x1&from=commonswiki","reqId":"YaDIMYZogQBJCA7hy0XTIgAAABQ","caught_by":"mwe_handler"} 
[Exception ErrorException] (/srv/mediawiki/php-master/includes/libs/rdbms/database/resultwrapper/MysqliResultWrapper.php:47) PHP Fatal Error: Allowed memory size of 698351616 bytes exhausted (tried to allocate 20480 bytes)
  #0 [internal function]: MWExceptionHandler::handleFatalError()
  #1 {main}

Might they be related, or should I raise a separate bug(s)?

Change 742138 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] Rebuild localisation cache before Selenium tests begin

https://gerrit.wikimedia.org/r/742138

I am seeing errors in the beta logs of the form:

2021-11-26 11:42:20 [YaDIDYZogQBJCA7hy0XS4QAAAAU] deployment-mediawiki11 ukwiki 1.38.0-alpha exception ERROR: [YaDIDYZogQBJCA7hy0XS4QAAAAU] /   LogicException: Process cache for 'uk' should be set by now. {"exception_url":"/","reqId":"YaDIDYZogQBJCA7hy0XS4QAAAAU","caught_by":"entrypoint"} 
[Exception LogicException] (/srv/mediawiki/php-master/includes/cache/MessageCache.php:401) Process cache for 'uk' should be set by now.
  #0 /srv/mediawiki/php-master/includes/cache/MessageCache.php(1103): MessageCache->load(string)
  #1 /srv/mediawiki/php-master/includes/cache/MessageCache.php(1032): MessageCache->getMsgFromNamespace(string, string)
  #2 /srv/mediawiki/php-master/includes/cache/MessageCache.php(1002): MessageCache->getMessageForLang(Language, string, boolean, array)
  #3 /srv/mediawiki/php-master/includes/cache/MessageCache.php(944): MessageCache->getMessageFromFallbackChain(Language, string, boolean)
  #4 /srv/mediawiki/php-master/includes/language/Message.php(1455): MessageCache->get(string, boolean, Language)
  #5 /srv/mediawiki/php-master/includes/language/Message.php(945): Message->fetchMessage()
  #6 /srv/mediawiki/php-master/includes/language/Message.php(1035): Message->format(string)
  #7 /srv/mediawiki/php-master/includes/Title.php(732): Message->text()
  #8 /srv/mediawiki/php-master/includes/MediaWiki.php(141): Title::newMainPage()
  #9 /srv/mediawiki/php-master/includes/MediaWiki.php(161): MediaWiki->parseTitle()
  #10 /srv/mediawiki/php-master/includes/MediaWiki.php(870): MediaWiki->getTitle()
  #11 /srv/mediawiki/php-master/includes/MediaWiki.php(554): MediaWiki->main()
  #12 /srv/mediawiki/php-master/index.php(53): MediaWiki->run()
  #13 /srv/mediawiki/php-master/index.php(46): wfIndexMain()
  #14 /srv/mediawiki/w/index.php(3): require(string)
  #15 {main}

and:

2021-11-26 11:42:53 [YaDIMYZogQBJCA7hy0XTIgAAABQ] deployment-mediawiki11 enwikiversity 1.38.0-alpha exception ERROR: [YaDIMYZogQBJCA7hy0XTIgAAABQ] /wiki/Special:CentralAutoLogin/start?type=1x1&from=commonswiki   PHP Fatal Error from line 47 of /srv/mediawiki/php-master/includes/libs/rdbms/database/resultwrapper/MysqliResultWrapper.php: Allowed memory size of 698351616 bytes exhausted (tried to allocate 20480 bytes) {"exception_url":"/wiki/Special:CentralAutoLogin/start?type=1x1&from=commonswiki","reqId":"YaDIMYZogQBJCA7hy0XTIgAAABQ","caught_by":"mwe_handler"} 
[Exception ErrorException] (/srv/mediawiki/php-master/includes/libs/rdbms/database/resultwrapper/MysqliResultWrapper.php:47) PHP Fatal Error: Allowed memory size of 698351616 bytes exhausted (tried to allocate 20480 bytes)
  #0 [internal function]: MWExceptionHandler::handleFatalError()
  #1 {main}

Might they be related, or should I raise a separate bug(s)?

@dom_walden that seems like a separate issue

The first debug line that differs is the bad debug log has a call to MediaWiki\User\ActorStore::getUserIdentityByName which would somehow trigger the load of every single message? https://gerrit.wikimedia.org/r/c/mediawiki/core/+/736105 might be a good candidate.

What puzzles me is that immediately after running update.php, Quibble invokes maintenance/rebuildLocalisationCache.php --lang=en. The good build has:

[localisation] LocalisationCache using store LCStoreStaticArray
[MessageCache] MessageCache using store APCUBagOStuff
[MessageCache] MessageCache::load: Loading en... local cache is empty, global cache is empty, loading from database

I am pretty sure we had those stored in the database so we could get them cached by running rebuildLocalisationCache. But that is a different issue probably.

That is all I have for today.

I don't think the localisation cache is related.

The patch I've proposed does two things:

  • (a) stops injecting content into LocalSettings.php
  • (b) does not use maintenance/importDump
  • (c) as a side effect of (a), potentially does not make an HTTP request to image-suggestion-api.wmcloud.org

I'm not sure which of those three exactly are at fault, but I will look into that more next week.

I don't know if this is related, but the comments on the patch that broke my local dev environment point to this ticket here.

The backtrace I get looks like MessageCache and RevisionStore are calling each other in an endless loop.

Original exception: [8f5bf9e7e6dc4746f4a1bbb9] /index.php/Main_Page Error: Maximum function nesting level of '256' reached, aborting!
Backtrace:
from /var/www/mediawiki/core/includes/libs/objectcache/BagOStuff.php(842)
#0 /var/www/mediawiki/core/includes/libs/objectcache/BagOStuff.php(842): microtime()
#1 /var/www/mediawiki/core/includes/libs/objectcache/HashBagOStuff.php(150): BagOStuff->getCurrentTime()
#2 /var/www/mediawiki/core/includes/libs/objectcache/HashBagOStuff.php(73): HashBagOStuff->expire()
#3 /var/www/mediawiki/core/includes/libs/objectcache/MediumSpecificBagOStuff.php(122): HashBagOStuff->doGet()
#4 /var/www/mediawiki/core/includes/libs/objectcache/BagOStuff.php(199): MediumSpecificBagOStuff->get()
#5 /var/www/mediawiki/core/includes/libs/rdbms/loadbalancer/LoadBalancer.php(2261): BagOStuff->getWithSetCallback()
#6 /var/www/mediawiki/core/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1024): Wikimedia\Rdbms\LoadBalancer->isPrimaryConnectionReadOnly()
#7 /var/www/mediawiki/core/includes/libs/rdbms/loadbalancer/LoadBalancer.php(963): Wikimedia\Rdbms\LoadBalancer->getServerConnection()
#8 /var/www/mediawiki/core/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1128): Wikimedia\Rdbms\LoadBalancer->getConnection()
#9 /var/www/mediawiki/core/includes/objectcache/SqlBagOStuff.php(1663): Wikimedia\Rdbms\LoadBalancer->getMaintenanceConnectionRef()
#10 /var/www/mediawiki/core/includes/objectcache/SqlBagOStuff.php(437): SqlBagOStuff->getConnectionViaLoadBalancer()
#11 /var/www/mediawiki/core/includes/objectcache/SqlBagOStuff.php(523): SqlBagOStuff->getConnection()
#12 /var/www/mediawiki/core/includes/objectcache/SqlBagOStuff.php(220): SqlBagOStuff->fetchBlobs()
#13 /var/www/mediawiki/core/includes/libs/objectcache/MediumSpecificBagOStuff.php(122): SqlBagOStuff->doGet()
#14 /var/www/mediawiki/core/includes/cache/MessageCache.php(335): MediumSpecificBagOStuff->get()
#15 /var/www/mediawiki/core/includes/cache/MessageCache.php(1103): MessageCache->load()
#16 /var/www/mediawiki/core/includes/cache/MessageCache.php(1031): MessageCache->getMsgFromNamespace()
#17 /var/www/mediawiki/core/includes/cache/MessageCache.php(1002): MessageCache->getMessageForLang()
#18 /var/www/mediawiki/core/includes/cache/MessageCache.php(944): MessageCache->getMessageFromFallbackChain()
#19 /var/www/mediawiki/core/includes/language/Message.php(1455): MessageCache->get()
#20 /var/www/mediawiki/core/includes/language/Message.php(945): Message->fetchMessage()
#21 /var/www/mediawiki/core/includes/language/Message.php(934): Message->format()
#22 /var/www/mediawiki/core/includes/Message/TextFormatter.php(51): Message->toString()
#23 /var/www/mediawiki/core/includes/user/UserNameUtils.php(193): MediaWiki\Message\TextFormatter->format()
#24 /var/www/mediawiki/core/includes/user/ActorStore.php(661): MediaWiki\User\UserNameUtils->isUsable()
#25 /var/www/mediawiki/core/includes/user/ActorStore.php(432): MediaWiki\User\ActorStore->validateActorForInsertion()
#26 /var/www/mediawiki/core/includes/user/ActorStore.php(736): MediaWiki\User\ActorStore->acquireActorId()
#27 /var/www/mediawiki/core/includes/Revision/RevisionStore.php(1694): MediaWiki\User\ActorStore->getUnknownActor()
#28 /var/www/mediawiki/core/includes/Revision/RevisionStore.php(2028): MediaWiki\Revision\RevisionStore->newRevisionFromRowAndSlots()
#29 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
#30 /var/www/mediawiki/core/includes/Revision/RevisionStore.php(1997): array_map()
#31 /var/www/mediawiki/core/includes/cache/MessageCache.php(565): MediaWiki\Revision\RevisionStore->newRevisionsFromBatch()
#32 /var/www/mediawiki/core/includes/cache/MessageCache.php(440): MessageCache->loadFromDB()
#33 /var/www/mediawiki/core/includes/cache/MessageCache.php(363): MessageCache->loadFromDBWithLock()
#34 /var/www/mediawiki/core/includes/cache/MessageCache.php(1103): MessageCache->load()
#35 /var/www/mediawiki/core/includes/cache/MessageCache.php(1031): MessageCache->getMsgFromNamespace()
#36 /var/www/mediawiki/core/includes/cache/MessageCache.php(1002): MessageCache->getMessageForLang()
#37 /var/www/mediawiki/core/includes/cache/MessageCache.php(944): MessageCache->getMessageFromFallbackChain()
#38 /var/www/mediawiki/core/includes/language/Message.php(1455): MessageCache->get()
#39 /var/www/mediawiki/core/includes/language/Message.php(945): Message->fetchMessage()
#40 /var/www/mediawiki/core/includes/language/Message.php(934): Message->format()
#41 /var/www/mediawiki/core/includes/Message/TextFormatter.php(51): Message->toString()
#42 /var/www/mediawiki/core/includes/user/UserNameUtils.php(193): MediaWiki\Message\TextFormatter->format()
#43 /var/www/mediawiki/core/includes/user/ActorStore.php(661): MediaWiki\User\UserNameUtils->isUsable()
#44 /var/www/mediawiki/core/includes/user/ActorStore.php(432): MediaWiki\User\ActorStore->validateActorForInsertion()
#45 /var/www/mediawiki/core/includes/user/ActorStore.php(736): MediaWiki\User\ActorStore->acquireActorId()
#46 /var/www/mediawiki/core/includes/Revision/RevisionStore.php(1694): MediaWiki\User\ActorStore->getUnknownActor()
#47 /var/www/mediawiki/core/includes/Revision/RevisionStore.php(2028): MediaWiki\Revision\RevisionStore->newRevisionFromRowAndSlots()
#48 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
#49 /var/www/mediawiki/core/includes/Revision/RevisionStore.php(1997): array_map()
#50 /var/www/mediawiki/core/includes/cache/MessageCache.php(565): MediaWiki\Revision\RevisionStore->newRevisionsFromBatch()
#51 /var/www/mediawiki/core/includes/cache/MessageCache.php(440): MessageCache->loadFromDB()
#52 /var/www/mediawiki/core/includes/cache/MessageCache.php(363): MessageCache->loadFromDBWithLock()
#53 /var/www/mediawiki/core/includes/cache/MessageCache.php(1103): MessageCache->load()
#54 /var/www/mediawiki/core/includes/cache/MessageCache.php(1031): MessageCache->getMsgFromNamespace()
#55 /var/www/mediawiki/core/includes/cache/MessageCache.php(1002): MessageCache->getMessageForLang()
#56 /var/www/mediawiki/core/includes/cache/MessageCache.php(944): MessageCache->getMessageFromFallbackChain()
#57 /var/www/mediawiki/core/includes/language/Message.php(1455): MessageCache->get()
#58 /var/www/mediawiki/core/includes/language/Message.php(945): Message->fetchMessage()
#59 /var/www/mediawiki/core/includes/language/Message.php(934): Message->format()
#60 /var/www/mediawiki/core/includes/Message/TextFormatter.php(51): Message->toString()
#61 /var/www/mediawiki/core/includes/user/UserNameUtils.php(193): MediaWiki\Message\TextFormatter->format()
#62 /var/www/mediawiki/core/includes/user/ActorStore.php(661): MediaWiki\User\UserNameUtils->isUsable()
#63 /var/www/mediawiki/core/includes/user/ActorStore.php(432): MediaWiki\User\ActorStore->validateActorForInsertion()
[…]
#254 /var/www/mediawiki/core/index.php(53): MediaWiki->run()
#255 /var/www/mediawiki/core/index.php(46): wfIndexMain()
#256 {main}

As said on the patch I suggest to undo this change for the moment and try again later.

Original exception: [8f5bf9e7e6dc4746f4a1bbb9] /index.php/Main_Page Error: Maximum function nesting level of '256' reached, aborting!

That looks really related to this T296508 which does show an apparent infinite loop of message cache retrieval.

May be related, the beta cluster update.php apparently infinite loop which I have filed as T296539:

Thread 1 "php" received signal SIGSEGV, Segmentation fault.
0x000055555578b04c in ?? ()
(gdb) bt
#0  0x000055555578b04c in ?? ()
#1  0x000055555578c5bc in ?? ()
#2  0x000055555578c68b in ap_php_slprintf ()
#3  0x00005555556405aa in ?? ()
#4  0x0000555555641d1a in ?? ()
#5  0x00007fffe17107f5 in tideways_xhprof_execute_internal ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#6  0x000055555589f54d in execute_ex ()
#7  0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so
#8  0x00005555558a0084 in execute_ex ()
#9  0x00007fffe1710239 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20170718/tideways_xhprof.so

Which looks like an infinite recursion as well.

I have marked it a train blocker for 1.38.0-wmf.11 (T293952).

Yup, reverting the patch mentioned by @thiemowmde will certainly fix this. I'll re-do the patch slightly differently later on, reverting is the right move now.

Change 742098 had a related patch set uploaded (by Majavah; author: Majavah):

[mediawiki/core@master] Revert \"Remove last User dependencies from File\"

https://gerrit.wikimedia.org/r/742098

Change 742138 abandoned by Kosta Harlan:

[mediawiki/extensions/GrowthExperiments@master] selenium: Move login actions to test suite, remove local settings code

Reason:

See I8af37665eeb284b85157a72459d43261ec4829ed

https://gerrit.wikimedia.org/r/742138

Change 742098 merged by jenkins-bot:

[mediawiki/core@master] Revert \"Remove last User dependencies from File\"

https://gerrit.wikimedia.org/r/742098

kostajh claimed this task.

Yup, reverting the patch mentioned by @thiemowmde will certainly fix this. I'll re-do the patch slightly differently later on, reverting is the right move now.

Thanks @Pchelolo, and to @dom_walden, @thiemowmde, @hashar and @zeljkofilipin for debugging.