SqlBlobStore no longer caching blobs (DBConnectionError Too many connections)
Closed, Resolved · Public · PRODUCTION ERROR

Description

Error
  • mwversion: 1.37.0-wmf.1
  • reqId: 93b8ccbd-f566-4569-94f4-7aa3266003cb
normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBConnectionError: Cannot access the database: Too many connections (10.64.32.102) (10.64.32.102)
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1510)
#0 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/loadbalancer/LoadBalancer.php(997): Wikimedia\Rdbms\LoadBalancer->reportConnectionError()
#1 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/loadbalancer/LoadBalancer.php(962): Wikimedia\Rdbms\LoadBalancer->getServerConnection(integer, string, integer)
#2 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1101): Wikimedia\Rdbms\LoadBalancer->getConnection(integer, array, string, integer)
#3 /srv/mediawiki/php-1.37.0-wmf.1/includes/externalstore/ExternalStoreDB.php(168): Wikimedia\Rdbms\LoadBalancer->getConnectionRef(integer, array, string, integer)
#4 /srv/mediawiki/php-1.37.0-wmf.1/includes/externalstore/ExternalStoreDB.php(312): ExternalStoreDB->getReplica(string)
#5 /srv/mediawiki/php-1.37.0-wmf.1/includes/externalstore/ExternalStoreDB.php(66): ExternalStoreDB->fetchBlob(string, string, boolean)
#6 /srv/mediawiki/php-1.37.0-wmf.1/includes/externalstore/ExternalStoreAccess.php(52): ExternalStoreDB->fetchFromURL(string)
#7 /srv/mediawiki/php-1.37.0-wmf.1/includes/Storage/SqlBlobStore.php(509): ExternalStoreAccess->fetchFromURL(string, array)
#8 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/objectcache/wancache/WANObjectCache.php(1714): MediaWiki\Storage\SqlBlobStore->MediaWiki\Storage\{closure}(boolean, integer, array, NULL, array)
#9 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/objectcache/wancache/WANObjectCache.php(1542): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array, array)
#10 /srv/mediawiki/php-1.37.0-wmf.1/includes/Storage/SqlBlobStore.php(513): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
#11 /srv/mediawiki/php-1.37.0-wmf.1/includes/Storage/SqlBlobStore.php(430): MediaWiki\Storage\SqlBlobStore->expandBlob(string, array, string)
#12 /srv/mediawiki/php-1.37.0-wmf.1/includes/Storage/SqlBlobStore.php(286): MediaWiki\Storage\SqlBlobStore->fetchBlobs(array, integer)
#13 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/objectcache/wancache/WANObjectCache.php(1714): MediaWiki\Storage\SqlBlobStore->MediaWiki\Storage\{closure}(boolean, integer, array, NULL, array)
#14 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/objectcache/wancache/WANObjectCache.php(1542): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array, array)
#15 /srv/mediawiki/php-1.37.0-wmf.1/includes/Storage/SqlBlobStore.php(291): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
#16 /srv/mediawiki/php-1.37.0-wmf.1/includes/Revision/RevisionStore.php(1191): MediaWiki\Storage\SqlBlobStore->getBlob(string, integer)
#17 /srv/mediawiki/php-1.37.0-wmf.1/includes/Revision/RevisionStore.php(1463): MediaWiki\Revision\RevisionStore->loadSlotContent(MediaWiki\Revision\SlotRecord, NULL, NULL, NULL, integer)
#18 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}(MediaWiki\Revision\SlotRecord)
#19 /srv/mediawiki/php-1.37.0-wmf.1/includes/Revision/SlotRecord.php(324): call_user_func(Closure, MediaWiki\Revision\SlotRecord)
#20 /srv/mediawiki/php-1.37.0-wmf.1/includes/Revision/RevisionRecord.php(164): MediaWiki\Revision\SlotRecord->getContent()
#21 /srv/mediawiki/php-1.37.0-wmf.1/includes/parser/Parser.php(3697): MediaWiki\Revision\RevisionRecord->getContent(string)
#22 /srv/mediawiki/php-1.37.0-wmf.1/includes/parser/Parser.php(3547): Parser->statelessFetchTemplate(Title, Parser)
#23 /srv/mediawiki/php-1.37.0-wmf.1/includes/parser/Parser.php(3415): Parser->fetchTemplateAndTitle(Title)
#24 /srv/mediawiki/php-1.37.0-wmf.1/includes/parser/Parser.php(3157): Parser->getTemplateDom(Title)
#25 /srv/mediawiki/php-1.37.0-wmf.1/includes/parser/PPFrame_Hash.php(263): Parser->braceSubstitution(array, PPFrame_Hash)
#26 /srv/mediawiki/php-1.37.0-wmf.1/includes/parser/Parser.php(2879): PPFrame_Hash->expand(PPNode_Hash_Tree, integer)
#27 /srv/mediawiki/php-1.37.0-wmf.1/includes/parser/Parser.php(1549): Parser->replaceVariables(string)
#28 /srv/mediawiki/php-1.37.0-wmf.1/includes/parser/Parser.php(639): Parser->internalParse(string)
#29 /srv/mediawiki/php-1.37.0-wmf.1/includes/content/WikitextContent.php(375): Parser->parse(string, Title, ParserOptions, boolean, boolean, integer)
#30 /srv/mediawiki/php-1.37.0-wmf.1/includes/content/AbstractContent.php(591): WikitextContent->fillParserOutput(Title, integer, ParserOptions, boolean, ParserOutput)
#31 /srv/mediawiki/php-1.37.0-wmf.1/includes/Revision/RenderedRevision.php(266): AbstractContent->getParserOutput(Title, integer, ParserOptions, boolean)
#32 /srv/mediawiki/php-1.37.0-wmf.1/includes/Revision/RenderedRevision.php(235): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(WikitextContent, boolean)
#33 /srv/mediawiki/php-1.37.0-wmf.1/includes/Revision/RevisionRenderer.php(217): MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string, array)
#34 /srv/mediawiki/php-1.37.0-wmf.1/includes/Revision/RevisionRenderer.php(154): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, array)
#35 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(MediaWiki\Revision\RenderedRevision, array)
#36 /srv/mediawiki/php-1.37.0-wmf.1/includes/Revision/RenderedRevision.php(197): call_user_func(Closure, MediaWiki\Revision\RenderedRevision, array)
#37 /srv/mediawiki/php-1.37.0-wmf.1/includes/poolcounter/PoolWorkArticleView.php(137): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#38 /srv/mediawiki/php-1.37.0-wmf.1/includes/poolcounter/PoolCounterWork.php(162): PoolWorkArticleView->doWork()
#39 /srv/mediawiki/php-1.37.0-wmf.1/includes/page/ParserOutputAccess.php(281): PoolCounterWork->execute()
#40 /srv/mediawiki/php-1.37.0-wmf.1/includes/page/Article.php(749): MediaWiki\Page\ParserOutputAccess->getParserOutput(WikiPage, ParserOptions, MediaWiki\Revision\RevisionStoreCacheRecord, integer)
#41 /srv/mediawiki/php-1.37.0-wmf.1/includes/page/Article.php(561): Article->generateContentOutput(User, ParserOptions, integer, OutputPage, array)
#42 /srv/mediawiki/php-1.37.0-wmf.1/includes/actions/ViewAction.php(74): Article->view()
#43 /srv/mediawiki/php-1.37.0-wmf.1/includes/MediaWiki.php(535): ViewAction->show()
#44 /srv/mediawiki/php-1.37.0-wmf.1/includes/MediaWiki.php(319): MediaWiki->performAction(Article, Title)
#45 /srv/mediawiki/php-1.37.0-wmf.1/includes/MediaWiki.php(916): MediaWiki->performRequest()
#46 /srv/mediawiki/php-1.37.0-wmf.1/includes/MediaWiki.php(550): MediaWiki->main()
#47 /srv/mediawiki/php-1.37.0-wmf.1/index.php(53): MediaWiki->run()
#48 /srv/mediawiki/php-1.37.0-wmf.1/index.php(46): wfIndexMain()
#49 /srv/mediawiki/w/index.php(3): require(string)
#50 {main}
Impact

Low. Not a train blocker at this level, but it looks quite worrying. Occurs on a variety of wikis, including Wikidata.

Notes

Not sure which part of MW is responsible, so my tags may be a little random. Please fix if so. Sorry.


https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-04-29_db_and_memc_load

Event Timeline

Restricted Application added a subscriber: Aklapper.
Urbanecm subscribed.

Tagging DBA, as they might be able to offer some guidance on finding the issue here.

jcrespo triaged this task as Unbreak Now! priority. Apr 29 2021, 3:45 PM
jcrespo subscribed.

This should be a blocker: es traffic has grown almost 100x since 14 April, which correlates strongly with the 19h deploy:

es_issue.png (1×2 px, 285 KB)

Given we only make requests to external storage when parsercache has a miss, it seemed sensible to look for corresponding patterns in parsercache.

I see we introduced a new category of misses on the same date, "miss_absent_metadata" (see https://grafana-rw.wikimedia.org/d/000000106/parser-cache?viewPanel=7&orgId=1&from=now-30d&to=now), which seems related.

That's probably a red herring: it's a change Daniel Kinzler made in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/677346, and it looks like a bugfix for the counter.

A better candidate for the cause is probably https://gerrit.wikimedia.org/r/c/mediawiki/core/+/677299. @Pchelolo is looking into it.

This one is a partial revert of a previously added optimization that was not needed, and it fixes PoolCounter: before it, PoolCounter couldn't fetch the parsed content after waiting for a lock. But we can try reverting it if there are no other guesses.

There is definitely something going very wrong with memcached:

https://grafana.wikimedia.org/d/000000316/memcache?viewPanel=60&orgId=1&from=now-30d&to=now

shows misses increasing across the board
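
For anyone following along, here is a minimal sketch of why rising cache misses translate directly into backend load. It is not MediaWiki's WANObjectCache, only a generic get-with-set-callback wrapper in Python; `SketchCache`, `writes_enabled`, and `count_backend_fetches` are hypothetical names made up for illustration. It assumes the failure mode is that values are regenerated but never stored, which may or may not match what actually happened here.

```python
from typing import Callable, Dict


class SketchCache:
    """Hypothetical in-memory cache; writes_enabled=False models writes being skipped."""

    def __init__(self, writes_enabled: bool = True) -> None:
        self.writes_enabled = writes_enabled
        self._data: Dict[str, str] = {}
        self.misses = 0

    def get_with_set_callback(self, key: str, regenerate: Callable[[], str]) -> str:
        # Cache hit: no backend work at all.
        if key in self._data:
            return self._data[key]
        # Cache miss: regenerate from the backend, then try to store the result.
        self.misses += 1
        value = regenerate()
        if self.writes_enabled:
            self._data[key] = value
        return value


def count_backend_fetches(cache: SketchCache, reads: int) -> int:
    """Read the same key `reads` times and count how often the backend is hit."""
    fetches = 0

    def fetch_blob() -> str:
        # Stand-in for an expensive lookup that needs a database connection.
        nonlocal fetches
        fetches += 1
        return "blob text"

    for _ in range(reads):
        cache.get_with_set_callback("blob:example", fetch_blob)
    return fetches


healthy = count_backend_fetches(SketchCache(writes_enabled=True), reads=100)
broken = count_backend_fetches(SketchCache(writes_enabled=False), reads=100)
print(healthy, broken)  # prints "1 100"
```

With writes working, 100 reads of one key cost a single backend fetch; with writes skipped, every read falls through, so each request would need its own connection to the backing store.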

Krinkle renamed this task from Cannot access the database: Too many connections to SqlBlobStore no longer caching blobs (DBConnectionError Too many connections). Apr 29 2021, 5:38 PM

Change 683692 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] objectcache: set ATTR_DURABILITY in MemcachedBagOStuff

https://gerrit.wikimedia.org/r/683692

Change 683629 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@wmf/1.37.0-wmf.3] objectcache: set ATTR_DURABILITY in MemcachedBagOStuff

https://gerrit.wikimedia.org/r/683629

Change 683630 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@wmf/1.37.0-wmf.1] objectcache: set ATTR_DURABILITY in MemcachedBagOStuff

https://gerrit.wikimedia.org/r/683630

Change 683629 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.3] objectcache: set ATTR_DURABILITY in MemcachedBagOStuff

https://gerrit.wikimedia.org/r/683629

Change 683692 merged by jenkins-bot:

[mediawiki/core@master] objectcache: set ATTR_DURABILITY in MemcachedBagOStuff

https://gerrit.wikimedia.org/r/683692

Mentioned in SAL (#wikimedia-operations) [2021-04-29T18:10:57Z] <krinkle@deploy1002> Synchronized php-1.37.0-wmf.3/includes/libs/objectcache/MemcachedBagOStuff.php: I926797a9d494a31, T281480 (duration: 01m 09s)

Change 683630 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.1] objectcache: set ATTR_DURABILITY in MemcachedBagOStuff

https://gerrit.wikimedia.org/r/683630

Mentioned in SAL (#wikimedia-operations) [2021-04-29T18:38:23Z] <krinkle@deploy1002> Synchronized php-1.37.0-wmf.1/includes/libs/objectcache/MemcachedBagOStuff.php: I926797a9d494a31, T281480 (duration: 01m 08s)

Krinkle assigned this task to aaron.
Krinkle moved this task from General to libs/objectcache on the MediaWiki-libs-BagOStuff board.
Krinkle edited projects, added Performance-Team; removed Platform Engineering.