
Fatal db error "Could not select database 'centralauth'" (sometimes also 'metawiki')
Closed, Resolved, Public

Description

Error

reedy@mwlog1001:/srv/mw-log$ grep -c "Could not select database 'centralauth'" exception.log 
18
reedy@mwlog1001:/srv/mw-log$ grep -c "Could not select database 'centralauth'" archive/exception.log-20181219 
176
reedy@mwlog1001:/srv/mw-log$ zgrep -c "Could not select database 'centralauth'" archive/exception.log-2018121*.gz 
archive/exception.log-20181210.gz:16
archive/exception.log-20181211.gz:6
archive/exception.log-20181212.gz:8
archive/exception.log-20181213.gz:14
archive/exception.log-20181214.gz:16
archive/exception.log-20181215.gz:234
archive/exception.log-20181216.gz:38
archive/exception.log-20181217.gz:22
archive/exception.log-20181218.gz:174
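Summing the per-day `zgrep -c` output makes the daily trend easier to read. The sketch below is a minimal Python example. It assumes the command output is pasted in as text in the `path:count` form shown above.

```python
import re

# Output of the zgrep -c command above, one "path:count" line per archive.
ZGREP_OUTPUT = """\
archive/exception.log-20181210.gz:16
archive/exception.log-20181211.gz:6
archive/exception.log-20181212.gz:8
archive/exception.log-20181213.gz:14
archive/exception.log-20181214.gz:16
archive/exception.log-20181215.gz:234
archive/exception.log-20181216.gz:38
archive/exception.log-20181217.gz:22
archive/exception.log-20181218.gz:174
"""

LINE_RE = re.compile(r"exception\.log-(\d{8})\.gz:(\d+)$")


def parse_counts(text):
    """Map each archive date (YYYYMMDD) to its number of matching lines."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_counts(ZGREP_OUTPUT)
total = sum(counts.values())
peak_day = max(counts, key=counts.get)
print(f"{len(counts)} days, {total} matches, peak {peak_day} ({counts[peak_day]})")
```

For the nine archives above, this prints `9 days, 528 matches, peak 20181215 (234)`.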
2018-12-19 09:38:59 [f2062cb4a63e8d37d0641606] mw1308 hewiki 1.33.0-wmf.8 exception ERROR: [f2062cb4a63e8d37d0641606] /rpc/RunSingleJob.php   Wikimedia\Rdbms\DBExpectedError from line 196 of /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/database/DatabaseMysqli.php: Could not select database 'centralauth'. {"exception_id":"f2062cb4a63e8d37d0641606","exception_url":"/rpc/RunSingleJob.php","caught_by":"mwe_handler"} 
[Exception Wikimedia\Rdbms\DBExpectedError] (/srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/database/DatabaseMysqli.php:196) Could not select database 'centralauth'.
  #0 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/database/Database.php(2312): Wikimedia\Rdbms\DatabaseMysqli->doSelectDomain(Wikimedia\Rdbms\DatabaseDomain)
  #1 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1015): Wikimedia\Rdbms\Database->selectDomain(Wikimedia\Rdbms\DatabaseDomain)
  #2 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/loadbalancer/LoadBalancer.php(871): Wikimedia\Rdbms\LoadBalancer->openForeignConnection(integer, string, integer)
  #3 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/loadbalancer/LoadBalancer.php(750): Wikimedia\Rdbms\LoadBalancer->openConnection(integer, string, integer)
  #4 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/loadbalancer/LoadBalancer.php(835): Wikimedia\Rdbms\LoadBalancer->getConnection(integer, array, string, integer)
  #5 /srv/mediawiki/php-1.33.0-wmf.8/extensions/CentralAuth/includes/CentralAuthUtils.php(70): Wikimedia\Rdbms\LoadBalancer->getConnectionRef(integer, string, string)
  #6 /srv/mediawiki/php-1.33.0-wmf.8/extensions/CentralAuth/includes/CentralAuthUser.php(504): CentralAuthUtils::getCentralSlaveDB()
  #7 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/objectcache/WANObjectCache.php(1128): Closure$CentralAuthUser::loadFromCache(boolean, integer, array, NULL)
  #8 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/objectcache/WANObjectCache.php(1277): Closure$WANObjectCache::getWithSetCallback(boolean, integer, array, NULL)
  #9 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/objectcache/WANObjectCache.php(1134): WANObjectCache->doGetWithSetCallback(string, integer, Closure$WANObjectCache::getWithSetCallback;316, array, NULL)
  #10 /srv/mediawiki/php-1.33.0-wmf.8/extensions/CentralAuth/includes/CentralAuthUser.php(519): WANObjectCache->getWithSetCallback(string, integer, Closure$CentralAuthUser::loadFromCache;901, array)
  #11 /srv/mediawiki/php-1.33.0-wmf.8/extensions/CentralAuth/includes/CentralAuthUser.php(374): CentralAuthUser->loadFromCache()
  #12 /srv/mediawiki/php-1.33.0-wmf.8/extensions/CentralAuth/includes/CentralAuthUser.php(549): CentralAuthUser->loadState()
  #13 /srv/mediawiki/php-1.33.0-wmf.8/extensions/CentralAuth/includes/CentralAuthIdLookup.php(84): CentralAuthUser->getId()
  #14 /srv/mediawiki/php-1.33.0-wmf.8/extensions/GlobalUserPage/includes/GlobalUserPage.php(171): CentralAuthIdLookup->isAttached(User)
  #15 /srv/mediawiki/php-1.33.0-wmf.8/extensions/GlobalUserPage/includes/Hooks.php(57): MediaWiki\GlobalUserPage\GlobalUserPage::shouldDisplayGlobalPage(Title)
  #16 /srv/mediawiki/php-1.33.0-wmf.8/includes/Hooks.php(174): MediaWiki\GlobalUserPage\Hooks::onTitleIsAlwaysKnown(Title, NULL)
  #17 /srv/mediawiki/php-1.33.0-wmf.8/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
  #18 /srv/mediawiki/php-1.33.0-wmf.8/includes/Title.php(4759): Hooks::run(string, array)
  #19 /srv/mediawiki/php-1.33.0-wmf.8/includes/parser/Parser.php(2472): Title->isAlwaysKnown()
  #20 /srv/mediawiki/php-1.33.0-wmf.8/includes/parser/Parser.php(2171): Parser->replaceInternalLinks2(string)
  #21 /srv/mediawiki/php-1.33.0-wmf.8/includes/parser/Parser.php(1379): Parser->replaceInternalLinks(string)
  #22 /srv/mediawiki/php-1.33.0-wmf.8/includes/parser/Parser.php(482): Parser->internalParse(string)
  #23 /srv/mediawiki/php-1.33.0-wmf.8/includes/content/WikitextContent.php(369): Parser->parse(string, Title, ParserOptions, boolean, boolean, integer)
  #24 /srv/mediawiki/php-1.33.0-wmf.8/includes/content/AbstractContent.php(517): WikitextContent->fillParserOutput(Title, integer, ParserOptions, boolean, ParserOutput)
  #25 /srv/mediawiki/php-1.33.0-wmf.8/includes/Revision/RenderedRevision.php(266): AbstractContent->getParserOutput(Title, integer, ParserOptions, boolean)
  #26 /srv/mediawiki/php-1.33.0-wmf.8/includes/Revision/RenderedRevision.php(234): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(WikitextContent, boolean)
  #27 /srv/mediawiki/php-1.33.0-wmf.8/includes/Revision/RevisionRenderer.php(193): MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string)
  #28 /srv/mediawiki/php-1.33.0-wmf.8/includes/Revision/RevisionRenderer.php(142): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, array)
  #29 [internal function]: Closure$MediaWiki\Revision\RevisionRenderer::getRenderedRevision#2(MediaWiki\Revision\RenderedRevision, array)
  #30 /srv/mediawiki/php-1.33.0-wmf.8/includes/Revision/RenderedRevision.php(197): call_user_func(Closure$MediaWiki\Revision\RevisionRenderer::getRenderedRevision#2;879, MediaWiki\Revision\RenderedRevision, array)
  #31 /srv/mediawiki/php-1.33.0-wmf.8/includes/jobqueue/jobs/CategoryMembershipChangeJob.php(273): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
  #32 /srv/mediawiki/php-1.33.0-wmf.8/includes/jobqueue/jobs/CategoryMembershipChangeJob.php(249): CategoryMembershipChangeJob->getCategoriesAtRev(WikiPage, Revision, string)
  #33 /srv/mediawiki/php-1.33.0-wmf.8/includes/jobqueue/jobs/CategoryMembershipChangeJob.php(206): CategoryMembershipChangeJob->getExplicitCategoriesChanges(WikiPage, Revision, Revision)
  #34 /srv/mediawiki/php-1.33.0-wmf.8/includes/jobqueue/jobs/CategoryMembershipChangeJob.php(172): CategoryMembershipChangeJob->notifyUpdatesForRevision(Wikimedia\Rdbms\LBFactoryMulti, WikiPage, Revision)
  #35 /srv/mediawiki/php-1.33.0-wmf.8/extensions/EventBus/includes/JobExecutor.php(65): CategoryMembershipChangeJob->run()
  #36 /srv/mediawiki/rpc/RunSingleJob.php(77): JobExecutor->execute(array)
  #37 {main}

Impact

Known to affect a number of different scenarios; basically, whenever a local wiki uses a connection to metawiki or centralauth:

  • Regular page views (SkinTemplate => GlobalUserPage::onTitleIsAlwaysKnown) per T212284#5125669
  • Commons file uploads (SkinTemplate => GlobalUserPage::onTitleIsAlwaysKnown) per T220693.
  • Job runners (CategoryMembershipChangeJob and RefreshLinks; via Parser=>GlobalUserPageHooks::onTitleIsAlwaysKnown=>CentralAuth) per above and T212284#5125669.
  • Regular page views [post-send] (SessionManager::shutdown => … => CentralAuthUser::isAttached) per T212284#4910831

Event Timeline

Reedy created this task. Dec 19 2018, 10:12 AM
Restricted Application added a subscriber: Aklapper. Dec 19 2018, 10:12 AM
Zoranzoki21 added a subscriber: Zoranzoki21. Edited Dec 19 2018, 8:24 PM

It can't select the database because there is no centralauth database:

MariaDB [hewiki_p]> SHOW TABLES;
+-------------------------+
| Tables_in_hewiki_p      |
+-------------------------+
| abuse_filter            |
| abuse_filter_action     |
| abuse_filter_log        |
| actor                   |
| archive                 |
| archive_compat          |
| archive_userindex       |
| babel                   |
| category                |
| categorylinks           |
| change_tag              |
| change_tag_def          |
| comment                 |
| comment_mat             |
| content                 |
| content_models          |
| ep_articles             |
| ep_cas                  |
| ep_courses              |
| ep_events               |
| ep_instructors          |
| ep_oas                  |
| ep_orgs                 |
| ep_revisions            |
| ep_students             |
| ep_users_per_course     |
| externallinks           |
| filearchive             |
| filearchive_compat      |
| filearchive_userindex   |
| geo_tags                |
| global_block_whitelist  |
| image                   |
| image_compat            |
| imagelinks              |
| interwiki               |
| ip_changes              |
| ipblocks                |
| ipblocks_compat         |
| ipblocks_ipindex        |
| iwlinks                 |
| l10n_cache              |
| langlinks               |
| linter                  |
| logging                 |
| logging_compat          |
| logging_logindex        |
| logging_userindex       |
| math                    |
| module_deps             |
| msg_resource_links      |
| oldimage                |
| oldimage_compat         |
| oldimage_userindex      |
| ores_classification     |
| ores_model              |
| page                    |
| page_compat             |
| page_props              |
| page_restrictions       |
| pagelinks               |
| pif_edits               |
| protected_titles        |
| protected_titles_compat |
| recentchanges           |
| recentchanges_compat    |
| recentchanges_userindex |
| redirect                |
| revision                |
| revision_compat         |
| revision_userindex      |
| site_identifiers        |
| site_stats              |
| sites                   |
| slot_roles              |
| slots                   |
| tag_summary             |
| templatelinks           |
| transcode               |
| updatelog               |
| user                    |
| user_former_groups      |
| user_groups             |
| user_properties         |
| user_properties_anon    |
| valid_tag               |
| wbc_entity_usage        |
| wikilove_log            |
+-------------------------+
88 rows in set (0.00 sec)

The same applies to the metawiki database:

MariaDB [metawiki_p]> SHOW TABLES;
+---------------------------+
| Tables_in_metawiki_p      |
+---------------------------+
| abuse_filter              |
| abuse_filter_action       |
| abuse_filter_log          |
| actor                     |
| archive                   |
| archive_compat            |
| archive_userindex         |
| babel                     |
| category                  |
| categorylinks             |
| change_tag                |
| change_tag_def            |
| cn_assignments            |
| cn_known_devices          |
| cn_known_mobile_carriers  |
| cn_notice_countries       |
| cn_notice_languages       |
| cn_notice_log             |
| cn_notice_mixin_params    |
| cn_notice_mixins          |
| cn_notice_mobile_carriers |
| cn_notice_projects        |
| cn_notices                |
| cn_template_devices       |
| cn_template_log           |
| cn_template_mixins        |
| cn_templates              |
| comment                   |
| comment_mat               |
| content                   |
| content_models            |
| externallinks             |
| filearchive               |
| filearchive_compat        |
| filearchive_userindex     |
| flaggedimages             |
| flaggedpage_config        |
| flaggedpage_pending       |
| flaggedpages              |
| flaggedrevs               |
| flaggedrevs_promote       |
| flaggedrevs_statistics    |
| flaggedrevs_tracking      |
| flaggedtemplates          |
| geo_tags                  |
| global_block_whitelist    |
| image                     |
| image_compat              |
| imagelinks                |
| interwiki                 |
| ip_changes                |
| ipblocks                  |
| ipblocks_compat           |
| ipblocks_ipindex          |
| iwlinks                   |
| l10n_cache                |
| langlinks                 |
| linter                    |
| logging                   |
| logging_compat            |
| logging_logindex          |
| logging_userindex         |
| math                      |
| module_deps               |
| msg_resource_links        |
| oldimage                  |
| oldimage_compat           |
| oldimage_userindex        |
| page                      |
| page_compat               |
| page_props                |
| page_restrictions         |
| pagelinks                 |
| pif_edits                 |
| protected_titles          |
| protected_titles_compat   |
| recentchanges             |
| recentchanges_compat      |
| recentchanges_userindex   |
| redirect                  |
| revision                  |
| revision_compat           |
| revision_userindex        |
| site_identifiers          |
| site_stats                |
| sites                     |
| slot_roles                |
| slots                     |
| tag_summary               |
| templatelinks             |
| transcode                 |
| updatelog                 |
| user                      |
| user_former_groups        |
| user_groups               |
| user_properties           |
| user_properties_anon      |
| valid_tag                 |
| wbc_entity_usage          |
+---------------------------+
99 rows in set (0.00 sec)

MariaDB [metawiki_p]>

> It can't select the database because there is no centralauth database:
>
> MariaDB [hewiki_p]> SHOW TABLES; ..

This error is about MediaWiki failing to select a database on a server, not about tables. A database server hosts one or more databases (e.g. enwiki, hewiki, centralauth), and each database contains one or more tables (e.g. page, revision, echo, ...).
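The following toy Python model illustrates the distinction. Selecting a database, as in the `USE` step, only checks whether the server hosts a database by that name. The tables that `SHOW TABLES` lists inside some other database play no part. The server layouts and the `Connection` class are hypothetical stand-ins, not MediaWiki or MariaDB code.

```python
class UnknownDatabase(Exception):
    pass


class Connection:
    """Toy stand-in for a client connection to one database server."""

    def __init__(self, server):
        # server: mapping of database name -> set of table names on that server
        self.server = server
        self.current_db = None

    def select_db(self, name):
        # Like "USE name": only the database name is checked, not any tables.
        if name not in self.server:
            raise UnknownDatabase(f"Could not select database '{name}'")
        self.current_db = name

    def show_tables(self):
        # Like "SHOW TABLES": lists tables of the currently selected database only.
        return sorted(self.server[self.current_db])


# Hypothetical server layouts: one hosts centralauth, the other does not.
with_ca = {"hewiki": {"page", "revision"}, "centralauth": {"globaluser"}}
without_ca = {"hewiki": {"page", "revision"}}

conn = Connection(with_ca)
conn.select_db("centralauth")  # succeeds

other = Connection(without_ca)
try:
    other.select_db("centralauth")
except UnknownDatabase as exc:
    print(exc)  # Could not select database 'centralauth'
```

Later comments in this task suggest that the production errors were more likely caused by dropped connections than by a genuinely missing database.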

Krinkle renamed this task from /rpc/RunSingleJob.php triggering "Could not select database 'centralauth'" to Fatal db error "Could not select database 'centralauth'" (sometimes also 'metawiki'). Jan 26 2019, 7:48 AM
Krinkle added subscribers: KaMan, Aklapper, Lea_Lacroix_WMDE.

Still seen. Recent sample:

  • Request ID: XEupaApAAEUAABoAPtEAAABK
  • Request URL: http://nl.wikipedia.org/w/index.php?title=Overleg%3ALijsten_van_geboren_en_overleden_personen%2FTe_bewerken_personen&type=revision&diff=52797176&oldid=52797063
Wikimedia\Rdbms\DBExpectedError: Could not select database 'centralauth'.


#0 /srv/mediawiki/php-1.33.0-wmf.14/includes/libs/rdbms/database/Database.php(2312): Wikimedia\Rdbms\DatabaseMysqli->doSelectDomain(Wikimedia\Rdbms\DatabaseDomain)
#1 /srv/mediawiki/php-1.33.0-wmf.14/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1015): Wikimedia\Rdbms\Database->selectDomain(Wikimedia\Rdbms\DatabaseDomain)
#2 /srv/mediawiki/php-1.33.0-wmf.14/includes/libs/rdbms/loadbalancer/LoadBalancer.php(871): Wikimedia\Rdbms\LoadBalancer->openForeignConnection(integer, string, integer)
#3 /srv/mediawiki/php-1.33.0-wmf.14/includes/libs/rdbms/loadbalancer/LoadBalancer.php(750): Wikimedia\Rdbms\LoadBalancer->openConnection(integer, string, integer)
#4 /srv/mediawiki/php-1.33.0-wmf.14/includes/libs/rdbms/loadbalancer/LoadBalancer.php(835): Wikimedia\Rdbms\LoadBalancer->getConnection(integer, array, string, integer)
#5 /srv/mediawiki/php-1.33.0-wmf.14/extensions/CentralAuth/includes/CentralAuthUtils.php(70): Wikimedia\Rdbms\LoadBalancer->getConnectionRef(integer, string, string)
#6 /srv/mediawiki/php-1.33.0-wmf.14/extensions/CentralAuth/includes/CentralAuthUser.php(225): CentralAuthUtils::getCentralSlaveDB()
#7 /srv/mediawiki/php-1.33.0-wmf.14/extensions/CentralAuth/includes/CentralAuthUser.php(2241): CentralAuthUser->getSafeReadDB()
#8 /srv/mediawiki/php-1.33.0-wmf.14/extensions/CentralAuth/includes/CentralAuthUser.php(538): CentralAuthUser->loadAttached()
#9 /srv/mediawiki/php-1.33.0-wmf.14/extensions/CentralAuth/includes/CentralAuthUser.php(521): CentralAuthUser->loadFromCacheObject(array)
#10 /srv/mediawiki/php-1.33.0-wmf.14/extensions/CentralAuth/includes/CentralAuthUser.php(374): CentralAuthUser->loadFromCache()
#11 /srv/mediawiki/php-1.33.0-wmf.14/extensions/CentralAuth/includes/CentralAuthUser.php(597): CentralAuthUser->loadState()
#12 /srv/mediawiki/php-1.33.0-wmf.14/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php(262): CentralAuthUser->isAttached()
#13 /srv/mediawiki/php-1.33.0-wmf.14/includes/session/CookieSessionProvider.php(199): CentralAuthSessionProvider->sessionDataToExport(User)
#14 /srv/mediawiki/php-1.33.0-wmf.14/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php(282): MediaWiki\Session\CookieSessionProvider->persistSession(MediaWiki\Session\SessionBackend, WebRequest)
#15 /srv/mediawiki/php-1.33.0-wmf.14/includes/session/SessionBackend.php(687): CentralAuthSessionProvider->persistSession(MediaWiki\Session\SessionBackend, WebRequest)
#16 /srv/mediawiki/php-1.33.0-wmf.14/includes/session/SessionBackend.php(607): MediaWiki\Session\SessionBackend->save()
#17 /srv/mediawiki/php-1.33.0-wmf.14/includes/session/SessionBackend.php(581): MediaWiki\Session\SessionBackend->autosave()
#18 /srv/mediawiki/php-1.33.0-wmf.14/includes/session/SessionBackend.php(293): MediaWiki\Session\SessionBackend->renew()
#19 /srv/mediawiki/php-1.33.0-wmf.14/includes/session/Session.php(127): MediaWiki\Session\SessionBackend->persist()
#20 /srv/mediawiki/php-1.33.0-wmf.14/includes/session/PHPSessionHandler.php(330): MediaWiki\Session\Session->persist()
#21 [internal function]: MediaWiki\Session\PHPSessionHandler->write(string, string)
#22 /srv/mediawiki/php-1.33.0-wmf.14/includes/session/SessionManager.php(470): session_write_close()
#23 [internal function]: MediaWiki\Session\SessionManager->shutdown()
#24 {main}
Krinkle moved this task from Untriaged to General library on the Wikimedia-Rdbms board.

(This appears not to be JobQueue-related after all; page views are affected as well, per the above.)

Change 498282 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] [WIP] rdbms: use a direct "USE" query for doSelectDomain() for mysql

https://gerrit.wikimedia.org/r/498282

kchapman assigned this task to aaron. Mar 25 2019, 8:59 PM
kchapman moved this task from Inbox to Doing on the Performance-Team board.

Change 498282 merged by jenkins-bot:
[mediawiki/core@master] rdbms: use a direct "USE" query for doSelectDomain() for mysql

https://gerrit.wikimedia.org/r/498282

The above patch changed the error message. The errors can now be found by searching for DatabaseMysqlBase::doSelectDomain or for ("USE centralauth" OR "USE metawiki").

Results from the last 7 days for +type:mediawiki -level:("INFO" OR "DEBUG") +message:("USE centralauth" OR "USE metawiki"):

  • 212 error events.
  • By channel:
    • 118 (DBQuery),
    • 84 (exception),
    • 9 (error),
    • 1 (JobExecutor).
  • By domain:
    • en.wikipedia.org: 86
    • www.wikidata.org: 48
    • jobrunner.discovery.wmnet: 25
    • commons.wikimedia.org: 9
    • pt.wikipedia.org: 9
  • By selected dbname: Roughly even between centralauth and metawiki.
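The Logstash query above can be approximated offline over exported log records, for example to reproduce the per-channel and per-dbname breakdown. This is a minimal Python sketch with some assumptions:

- The record field names (`type`, `level`, `channel`, `message`) are assumed to match the query's field names.
- The regex only approximates Logstash's full-text matching.
- Some messages quote the database name in backticks (see the exception-channel sample below), so the pattern accepts both forms.

```python
import re
from collections import Counter

# "USE centralauth" or "USE metawiki", with or without backticks around the name.
USE_RE = re.compile(r"\bUSE\s+`?(centralauth|metawiki)`?")


def is_relevant(record):
    """Approximates: +type:mediawiki -level:("INFO" OR "DEBUG") +message:(...)."""
    return (
        record.get("type") == "mediawiki"
        and record.get("level") not in ("INFO", "DEBUG")
        and USE_RE.search(record.get("message", "")) is not None
    )


def summarize(records):
    """Return (total, counts by channel, counts by selected dbname)."""
    hits = [r for r in records if is_relevant(r)]
    by_channel = Counter(r["channel"] for r in hits)
    by_db = Counter(USE_RE.search(r["message"]).group(1) for r in hits)
    return len(hits), by_channel, by_db


# Made-up sample records, for illustration only.
sample = [
    {"type": "mediawiki", "level": "ERROR", "channel": "DBQuery",
     "message": "Query: USE `metawiki` Error: 2006 MySQL server has gone away"},
    {"type": "mediawiki", "level": "ERROR", "channel": "exception",
     "message": "Query: USE centralauth Error: 2006 MySQL server has gone away"},
    {"type": "mediawiki", "level": "INFO", "channel": "DBQuery",
     "message": "Query: USE centralauth"},
]
print(summarize(sample))
```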

Some samples:

Message from the JobExecutor channel:
Exception executing job: refreshLinks … DBQueryError: A connection error occurred during a query. 

  Query: USE centralauth
  Function: Wikimedia\Rdbms\DatabaseMysqlBase::doSelectDomain
  Error: 2006 MySQL server has gone away

#3 /srv/mediawiki/php-1.33.0-wmf.25/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1033): Wikimedia\Rdbms\Database->selectDomain(Wikimedia\Rdbms\DatabaseDomain)
#4 /srv/mediawiki/php-1.33.0-wmf.25/includes/libs/rdbms/loadbalancer/LoadBalancer.php(889): Wikimedia\Rdbms\LoadBalancer->openForeignConnection(integer, string, integer)
#5 /srv/mediawiki/php-1.33.0-wmf.25/includes/libs/rdbms/loadbalancer/LoadBalancer.php(755): Wikimedia\Rdbms\LoadBalancer->openConnection(integer, string, integer)
#6 /srv/mediawiki/php-1.33.0-wmf.25/includes/libs/rdbms/loadbalancer/LoadBalancer.php(841): Wikimedia\Rdbms\LoadBalancer->getConnection(integer, array, string, integer)
#7 /srv/mediawiki/php-1.33.0-wmf.25/extensions/CentralAuth/includes/CentralAuthUtils.php(72): Wikimedia\Rdbms\LoadBalancer->getConnectionRef(integer, string, string)
#8 /srv/mediawiki/php-1.33.0-wmf.25/extensions/CentralAuth/includes/CentralAuthUser.php(543): CentralAuthUtils::getCentralReplicaDB()
…
Message from the exception channel (mostly from /w/index.php or /w/api.php):
DBQueryError … A connection error occurred during a query. 
  Query: USE `metawiki`
  Function: Wikimedia\Rdbms\DatabaseMysqlBase::doSelectDomain
  Error: 2006 MySQL server has gone away (10.64.0.91)

#3 /srv/mediawiki/php-1.34.0-wmf.1/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1033): Wikimedia\Rdbms\Database->selectDomain(Wikimedia\Rdbms\DatabaseDomain)
#4 /srv/mediawiki/php-1.34.0-wmf.1/includes/libs/rdbms/loadbalancer/LoadBalancer.php(889): Wikimedia\Rdbms\LoadBalancer->openForeignConnection(integer, string, integer)
#5 /srv/mediawiki/php-1.34.0-wmf.1/includes/libs/rdbms/loadbalancer/LoadBalancer.php(755): Wikimedia\Rdbms\LoadBalancer->openConnection(integer, string, integer)
#6 /srv/mediawiki/php-1.34.0-wmf.1/includes/libs/rdbms/loadbalancer/LoadBalancer.php(841): Wikimedia\Rdbms\LoadBalancer->getConnection(integer, array, string, integer)
#7 /srv/mediawiki/php-1.34.0-wmf.1/extensions/GlobalUserPage/includes/GlobalUserPage.php(200): Wikimedia\Rdbms\LoadBalancer->getConnectionRef(integer, array, string)
#8 /srv/mediawiki/php-1.34.0-wmf.1/extensions/GlobalUserPage/includes/GlobalUserPage.php(175): MediaWiki\GlobalUserPage\GlobalUserPage::getCentralTouched(User)
#9 /srv/mediawiki/php-1.34.0-wmf.1/extensions/GlobalUserPage/includes/Hooks.php(57): MediaWiki\GlobalUserPage\GlobalUserPage::shouldDisplayGlobalPage(Title)
#10 /srv/mediawiki/php-1.34.0-wmf.1/includes/Hooks.php(174): MediaWiki\GlobalUserPage\Hooks::onTitleIsAlwaysKnown(Title, NULL)
…
Krinkle triaged this task as Normal priority. Apr 19 2019, 5:04 PM
kchapman added a subscriber: kchapman.

Moving to the CPT Inbox to determine when this could be worked on.

daniel added a subscriber: daniel. May 2 2019, 2:33 PM

"Error: 2006 MySQL server has gone away (10.64.0.91)" indicates that the TCP connection died. This points to a network issue, or a (load-related?) timeout causing a TCP connection attempt to fail. But according to https://dev.mysql.com/doc/refman/8.0/en/gone-away.html, there are many other possible causes, some of them rather surprising.

Pinging DBAs to have a look.
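When sorting these log entries, it helps to separate lost-connection errors from other query failures. In the MySQL client error codes, 2006 is `CR_SERVER_GONE_ERROR` ("MySQL server has gone away") and 2013 is `CR_SERVER_LOST` ("Lost connection to MySQL server during query"). The minimal Python sketch below parses an `Error:` line in the format of the samples above. The regex and dictionary keys are assumptions for illustration.

```python
import re

# MySQL client error codes that mean the connection itself is gone.
CR_SERVER_GONE_ERROR = 2006  # "MySQL server has gone away"
CR_SERVER_LOST = 2013        # "Lost connection to MySQL server during query"
CONNECTION_LOST_CODES = {CR_SERVER_GONE_ERROR, CR_SERVER_LOST}

# "Error: <code> <message>" with an optional trailing "(<server IP>)".
ERROR_RE = re.compile(r"Error:\s*(\d+)\s+(.*?)(?:\s+\(([\d.]+)\))?\s*$")


def parse_error_line(line):
    """Return errno, message, host and whether it is a lost-connection error."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    errno = int(match.group(1))
    return {
        "errno": errno,
        "message": match.group(2),
        "host": match.group(3),  # None when the log line has no server IP
        "connection_lost": errno in CONNECTION_LOST_CODES,
    }


print(parse_error_line("  Error: 2006 MySQL server has gone away (10.64.0.91)"))
```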

daniel added a subscriber: Anomie. May 2 2019, 2:34 PM

Hm, maybe @Anomie has an idea what might be causing this.

Anomie added a comment. May 2 2019, 4:33 PM

The new errors in T212284#5125669 seem to be related to the database connection having been closed or dropped (e.g. it's reusing a connection that was dropped due to a timeout). Since it uses ->doQuery() to change the database, it doesn't get the automatic reconnection logic that ->query() has.

The cause of the original "Could not select database" messages was probably the same thing. A second possibility is that the code somehow ended up using a Database handle that isn't actually connected to a server matching the requested "domain".

Side rant: The whole "domain" thing seems overcomplicated to me. That probably stems from the fact that MySQL's "database" concept and PostgreSQL's "database" concept are not the same thing despite having the same name. PG's "database" is more like MySQL's "server", while MySQL's "database" is more like PG's "schema". If we were to rework things around that equivalence we'd probably end up with something that makes a fair bit more sense.

> The new errors in T212284#5125669 seem to be related to the database connection having been closed or dropped (e.g. it's reusing a connection that was dropped due to a timeout).

If a previous query on that later-reused connection timed out, would that not have caused an exception and/or been successfully retried? If so, presumably the code needs to react by removing the connection from the pool. Maybe something is catching the exception (so it isn't fatal), but the connection still ends up being reused.
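The "remove it from the pool" idea could look roughly like this. The sketch below is a minimal Python example with a hypothetical `ConnectionPool`, not MediaWiki's `LoadBalancer`. A handle goes back to the pool only if its last use did not fail with a connection error, so a dropped connection is never handed to the next caller. The pool re-raises rather than retrying; retrying is a separate decision.

```python
class ConnectionLost(Exception):
    """Stand-in for MySQL error 2006 ("MySQL server has gone away")."""


class ConnectionPool:
    """Hypothetical pool keeping at most one idle handle per server."""

    def __init__(self, connect):
        self._connect = connect  # callable: server name -> new handle
        self._idle = {}

    def run(self, server, sql):
        handle = self._idle.pop(server, None)
        if handle is None:
            handle = self._connect(server)
        try:
            result = handle.query(sql)
        except ConnectionLost:
            # Do not put a broken handle back; the next caller gets a fresh one.
            handle.close()
            raise
        self._idle[server] = handle
        return result


class FakeHandle:
    """Hypothetical handle; 'broken' simulates a session the server dropped."""

    def __init__(self, broken=False):
        self.broken = broken
        self.closed = False

    def query(self, sql):
        if self.broken:
            raise ConnectionLost("2006 MySQL server has gone away")
        return f"ok: {sql}"

    def close(self):
        self.closed = True


created = []


def connect(server):
    # The first handle simulates a connection the server already dropped.
    handle = FakeHandle(broken=not created)
    created.append(handle)
    return handle


pool = ConnectionPool(connect)
try:
    pool.run("db1", "SELECT 1")
except ConnectionLost:
    pass
print(pool.run("db1", "SELECT 1"))  # served by a fresh handle
```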

If the last query was fine, but the connection has silently been dropped since then (due to not being used for a while), then it seems like the selection logic either needs to use query() or cherry-pick its reconnect logic on top of its use of doQuery().

Anomie added a comment. May 8 2019, 3:15 PM

> If the last query was fine, but the connection has silently been dropped since then (due to not being used for a while), then it seems like the selection logic either needs to use query() or cherry-pick its reconnect logic on top of its use of doQuery().

^ That's what I was trying to say, yes.
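The options discussed above (send the database switch through the query path that already reconnects, or share that reconnect logic) can be sketched as follows. This is a simplified Python illustration with hypothetical classes, not the code from the patches below.

- `do_query()` has no retry, like the original doSelectDomain() path.
- `execute_query()` reconnects once on a lost connection and re-issues the statement.

```python
class ConnectionLost(Exception):
    """Stand-in for MySQL error 2006 ("MySQL server has gone away")."""


class RawConnection:
    """Hypothetical low-level handle; 'dropped' simulates a timed-out session."""

    def __init__(self, dropped=False):
        self.dropped = dropped
        self.current_db = None

    def send(self, sql):
        if self.dropped:
            raise ConnectionLost("2006 MySQL server has gone away")
        if sql.upper().startswith("USE "):
            self.current_db = sql[4:].strip().strip("`")
        return True


class Database:
    def __init__(self, connect):
        self._connect = connect  # callable returning a new RawConnection
        self.conn = connect()

    def do_query(self, sql):
        # Low-level path: no reconnect, so a dropped session fails outright.
        return self.conn.send(sql)

    def execute_query(self, sql):
        # Shared path: on a lost connection, reconnect once and retry.
        try:
            return self.conn.send(sql)
        except ConnectionLost:
            self.conn = self._connect()
            return self.conn.send(sql)

    def select_domain(self, dbname):
        self.execute_query(f"USE `{dbname}`")


made = []


def connect():
    # The first connection simulates one the server has already dropped.
    conn = RawConnection(dropped=not made)
    made.append(conn)
    return conn


db = Database(connect)
db.select_domain("centralauth")
print(db.conn.current_db, len(made))  # centralauth 2
```

A real implementation also has to decide when a silent retry is unsafe. One example is a dropped session that had an open transaction or other session state, which a new connection would not carry over.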

Gilles removed aaron as the assignee of this task. May 20 2019, 8:02 PM
Gilles added a subscriber: Gilles.

Is CPT going to look into this? It seems more in scope for the CPT team than for the Performance team.

Gilles moved this task from Inbox to Radar on the Performance-Team board. May 20 2019, 8:03 PM
Gilles edited projects, added Performance-Team (Radar); removed Performance-Team.
daniel added a subscriber: tstarling.

Flagging for re-triage in the CPT team. I personally don't know where to start with investigating this, but perhaps @Anomie or @tstarling would know. The performance team has a lot of log wrangling expertise, so perhaps we could team up for the investigation.

Change 512043 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] rdbms: add Database::queryInternal method for sharing reconnect logic

https://gerrit.wikimedia.org/r/512043

Krinkle updated the task description. May 29 2019, 11:57 PM

Change 512043 merged by jenkins-bot:
[mediawiki/core@master] rdbms: add Database::executeQuery() method for internal use

https://gerrit.wikimedia.org/r/512043

Change 516720 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/core@master] Database: Recognize USE queries as non-write queries

https://gerrit.wikimedia.org/r/516720

Potentially resolved now that we have reconnects on USE queries (per https://gerrit.wikimedia.org/r/512043).

Change 516720 merged by jenkins-bot:
[mediawiki/core@master] Database: Recognize USE queries as non-write queries

https://gerrit.wikimedia.org/r/516720
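Change 516720 makes the query layer treat USE as a non-write query. Below is a rough Python sketch of verb-based classification. The verb list is hypothetical and is not MediaWiki's actual pattern. The task does not say why the classification matters here. One plausible reason, which is an assumption, is that queries classified as writes get extra handling (such as read-only checks) that a database switch should not trigger. Unknown statements default to "write", which is the conservative choice.

```python
import re

# Hypothetical list of statement verbs treated as non-writes.
NON_WRITE_VERBS = ("SELECT", "SHOW", "EXPLAIN", "USE")
NON_WRITE_RE = re.compile(r"^\s*(?:" + "|".join(NON_WRITE_VERBS) + r")\b", re.IGNORECASE)


def is_write_query(sql):
    """Anything that does not start with a known non-write verb counts as a write."""
    return NON_WRITE_RE.match(sql) is None


for sql in ("USE `centralauth`", "SELECT 1", "UPDATE page SET page_touched = 1"):
    print(f"{sql!r}: write={is_write_query(sql)}")
```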

> Potentially resolved now that we have reconnects on USE queries (per https://gerrit.wikimedia.org/r/512043).

To research:

2019-06-17T14:23:36 10.64.48.153 Wikimedia\Rdbms\DatabaseMysqlBase::doSelectDomain MySQL server has gone away (10.64.48.153) USE `centralauth`

> Potentially resolved now that we have reconnects on USE queries (per https://gerrit.wikimedia.org/r/512043).

So, is it?

(Note I'm using the CPT "EM Sign Off" column as more of a "check if this is still valid" column)

aaron closed this task as Resolved. Jul 23 2019, 4:42 PM
aaron claimed this task.

The logs for doSelectDomain() look quiet for the last 7 days.

mmodell changed the subtype of this task from "Task" to "Production Error". Wed, Aug 28, 11:08 PM