The code jumps through lots of hoops to avoid double parsing (e.g. by locking the page), but it actually parses the page twice inside the job:
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-2022.02.09?id=sWJW3H4BoAyk87sq2wWx
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T255502 Goal: Save Timing median back under 1 second
Resolved | | Krinkle | T277788 Save Timing improvements (2021-2022)
Resolved | | Ladsgroup | T292300 Eliminate unnecessary duplicate parses (2021-2022)
Resolved | | matmarex | T301309 Refreshlinks job is parsing pages twice
Event Timeline
Change 761487 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/core@master] DerivedPageDataUpdater: Set ParserOutput when it's passed to it
Change 761487 merged by jenkins-bot:
[mediawiki/core@master] DerivedPageDataUpdater: Set ParserOutput when it's passed to it
Change 761413 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/core@wmf/1.38.0-wmf.20] DerivedPageDataUpdater: Set ParserOutput when it's passed to it
Change 761414 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/core@wmf/1.38.0-wmf.21] DerivedPageDataUpdater: Set ParserOutput when it's passed to it
Change 761413 merged by jenkins-bot:
[mediawiki/core@wmf/1.38.0-wmf.20] DerivedPageDataUpdater: Set ParserOutput when it's passed to it
Change 761414 merged by jenkins-bot:
[mediawiki/core@wmf/1.38.0-wmf.21] DerivedPageDataUpdater: Set ParserOutput when it's passed to it
Mentioned in SAL (#wikimedia-operations) [2022-02-10T15:32:38Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.21/includes/Storage/DerivedPageDataUpdater.php: Backport: [[gerrit:761414|DerivedPageDataUpdater: Set ParserOutput when it's passed to it (T301309)]] (duration: 00m 53s)
Mentioned in SAL (#wikimedia-operations) [2022-02-10T15:39:24Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.20/includes/Storage/DerivedPageDataUpdater.php: Backport: [[gerrit:761413|DerivedPageDataUpdater: Set ParserOutput when it's passed to it (T301309)]] (duration: 00m 50s)
This either hasn't worked or has regressed, because we're seeing duplicate parses inside RefreshLinksJob today. Example: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2023.08.10?id=F4s54YkBRv1FAtdn4GRO
#0 /srv/mediawiki/php-1.41.0-wmf.20/includes/content/ContentHandler.php(1790): MediaWiki\Parser\ParserObserver->notifyParse(Object(MediaWiki\Title\Title), 740306188, Object(ParserOptions), Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(ParserOutput))
#1 /srv/mediawiki/php-1.41.0-wmf.20/includes/content/Renderer/ContentRenderer.php(47): ContentHandler->getParserOutput(Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(MediaWiki\Content\Renderer\ContentParseParams))
#2 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(260): MediaWiki\Content\Renderer\ContentRenderer->getParserOutput(Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(MediaWiki\Title\Title), 740306188, Object(ParserOptions), false)
#3 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(232): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(Object(Wikibase\MediaInfo\Content\MediaInfoContent), false)
#4 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RevisionRenderer.php(242): MediaWiki\Revision\RenderedRevision->getSlotParserOutput('mediainfo', Array)
#5 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RevisionRenderer.php(164): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(Object(MediaWiki\Revision\RenderedRevision), Array)
#6 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(Object(MediaWiki\Revision\RenderedRevision), Array)
#7 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(199): call_user_func(Object(Closure), Object(MediaWiki\Revision\RenderedRevision), Array)
#8 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(330): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput(Array)
#9 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(230): RefreshLinksJob->getParserOutput(Object(MediaWiki\Revision\RevisionRenderer), Object(ParserCache), Object(WikiFilePage), Object(BufferingStatsdDataFactory))
#10 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(162): RefreshLinksJob->runForTitle(Object(MediaWiki\Title\Title))
#11 /srv/mediawiki/php-1.41.0-wmf.20/extensions/EventBus/includes/JobExecutor.php(78): RefreshLinksJob->run()
#12 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(Array)
#13 {main}
#0 /srv/mediawiki/php-1.41.0-wmf.20/includes/content/ContentHandler.php(1790): MediaWiki\Parser\ParserObserver->notifyParse(Object(MediaWiki\Title\Title), 740306188, Object(ParserOptions), Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(ParserOutput))
#1 /srv/mediawiki/php-1.41.0-wmf.20/includes/content/Renderer/ContentRenderer.php(47): ContentHandler->getParserOutput(Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(MediaWiki\Content\Renderer\ContentParseParams))
#2 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(260): MediaWiki\Content\Renderer\ContentRenderer->getParserOutput(Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(MediaWiki\Title\Title), 740306188, Object(ParserOptions), true)
#3 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(232): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(Object(Wikibase\MediaInfo\Content\MediaInfoContent), true)
#4 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RevisionRenderer.php(242): MediaWiki\Revision\RenderedRevision->getSlotParserOutput('mediainfo', Array)
#5 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RevisionRenderer.php(164): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(Object(MediaWiki\Revision\RenderedRevision), Array)
#6 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(Object(MediaWiki\Revision\RenderedRevision), Array)
#7 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(199): call_user_func(Object(Closure), Object(MediaWiki\Revision\RenderedRevision), Array)
#8 /srv/mediawiki/php-1.41.0-wmf.20/includes/Storage/DerivedPageDataUpdater.php(1472): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#9 /srv/mediawiki/php-1.41.0-wmf.20/includes/Storage/DerivedPageDataUpdater.php(1504): MediaWiki\Storage\DerivedPageDataUpdater->getCanonicalParserOutput()
#10 /srv/mediawiki/php-1.41.0-wmf.20/includes/deferred/RefreshSecondaryDataUpdate.php(85): MediaWiki\Storage\DerivedPageDataUpdater->getSecondaryDataUpdates(false)
#11 /srv/mediawiki/php-1.41.0-wmf.20/includes/deferred/DeferredUpdatesManager.php(506): RefreshSecondaryDataUpdate->doUpdate()
#12 /srv/mediawiki/php-1.41.0-wmf.20/includes/deferred/DeferredUpdates.php(277): MediaWiki\Deferred\DeferredUpdatesManager->attemptUpdate(Object(RefreshSecondaryDataUpdate), Object(Wikimedia\Rdbms\LBFactoryMulti))
#13 /srv/mediawiki/php-1.41.0-wmf.20/includes/Storage/DerivedPageDataUpdater.php(1836): DeferredUpdates::attemptUpdate(Object(RefreshSecondaryDataUpdate), Object(Wikimedia\Rdbms\LBFactoryMulti))
#14 /srv/mediawiki/php-1.41.0-wmf.20/includes/page/WikiPage.php(2100): MediaWiki\Storage\DerivedPageDataUpdater->doSecondaryDataUpdates(Array)
#15 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(244): WikiPage->doSecondaryDataUpdates(Array)
#16 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(162): RefreshLinksJob->runForTitle(Object(MediaWiki\Title\Title))
#17 /srv/mediawiki/php-1.41.0-wmf.20/extensions/EventBus/includes/JobExecutor.php(78): RefreshLinksJob->run()
#18 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(Array)
#19 {main}
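It looks like the first parse (from RefreshLinksJob->getParserOutput) is done with 'generate-html' => false, but the second (from DerivedPageDataUpdater->getCanonicalParserOutput) is with true, so the first output can't be reused and we have to parse again.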
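To make the suspected mechanism concrete, here is a toy model in Python (not MediaWiki code). It assumes, based only on the trailing false/true argument in the two traces, that a cached output produced without HTML cannot satisfy a later request that needs HTML. `ToyRenderedRevision`, `Output` and `run_job` are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Output:
    """Result of one toy parse; text is None when HTML was skipped."""
    text: Optional[str]


@dataclass
class ToyRenderedRevision:
    """Hypothetical stand-in for an object that caches its parse result."""
    source: str
    parse_count: int = 0
    _cached: Optional[Output] = field(default=None, repr=False)

    def get_output(self, generate_html: bool = True) -> Output:
        cached = self._cached
        # Reuse the cached output unless the caller needs HTML and the
        # cached output was produced without it (assumed behaviour).
        if cached is not None and (cached.text is not None or not generate_html):
            return cached
        self.parse_count += 1  # an actual (expensive) parse happens here
        text = f"<p>{self.source}</p>" if generate_html else None
        self._cached = Output(text=text)
        return self._cached


def run_job(first_call_generates_html: bool) -> int:
    """Simulate the job: its own parse, then a later consumer needing HTML."""
    rev = ToyRenderedRevision("hello")
    rev.get_output(generate_html=first_call_generates_html)
    rev.get_output(generate_html=True)
    return rev.parse_count


if __name__ == "__main__":
    print("first call without HTML:", run_job(False), "parses")
    print("first call with HTML:   ", run_job(True), "parses")
```

In this model, skipping HTML on the first call saves nothing when a later consumer in the same job needs it; asking for HTML up front leaves a single parse.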
Change 947959 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):
[mediawiki/core@master] RefreshLinksJob: Generate HTML when parsing if DerivedPageDataUpdater will need it later
Change 947959 merged by jenkins-bot:
[mediawiki/core@master] RefreshLinksJob: Generate HTML when parsing if DerivedPageDataUpdater will need it later
Seems to have worked: https://logstash.wikimedia.org/goto/e68e6d825f4870401ebffdde0311d781