
Refreshlinks job is parsing pages twice
Closed, Resolved (Public)

Description

The code goes to considerable lengths to avoid double parsing (for example, by locking the page), but it still parses the page twice inside the job:
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-2022.02.09?id=sWJW3H4BoAyk87sq2wWx

Event Timeline

Change 761487 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] DerivedPageDataUpdater: Set ParserOutput when it's passed to it

https://gerrit.wikimedia.org/r/761487

Change 761487 merged by jenkins-bot:

[mediawiki/core@master] DerivedPageDataUpdater: Set ParserOutput when it's passed to it

https://gerrit.wikimedia.org/r/761487

Change 761413 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.20] DerivedPageDataUpdater: Set ParserOutput when it's passed to it

https://gerrit.wikimedia.org/r/761413

Change 761414 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.21] DerivedPageDataUpdater: Set ParserOutput when it's passed to it

https://gerrit.wikimedia.org/r/761414

Change 761413 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.20] DerivedPageDataUpdater: Set ParserOutput when it's passed to it

https://gerrit.wikimedia.org/r/761413

Change 761414 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.21] DerivedPageDataUpdater: Set ParserOutput when it's passed to it

https://gerrit.wikimedia.org/r/761414

Mentioned in SAL (#wikimedia-operations) [2022-02-10T15:32:38Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.21/includes/Storage/DerivedPageDataUpdater.php: Backport: [[gerrit:761414|DerivedPageDataUpdater: Set ParserOutput when it's passed to it (T301309)]] (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2022-02-10T15:39:24Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.20/includes/Storage/DerivedPageDataUpdater.php: Backport: [[gerrit:761413|DerivedPageDataUpdater: Set ParserOutput when it's passed to it (T301309)]] (duration: 00m 50s)

Ladsgroup moved this task from Triage to Done on the DBA board.

This either hasn't worked or has regressed, because we're seeing duplicate parses inside RefreshLinksJob today. Example: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2023.08.10?id=F4s54YkBRv1FAtdn4GRO

#0 /srv/mediawiki/php-1.41.0-wmf.20/includes/content/ContentHandler.php(1790): MediaWiki\Parser\ParserObserver->notifyParse(Object(MediaWiki\Title\Title), 740306188, Object(ParserOptions), Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(ParserOutput))
#1 /srv/mediawiki/php-1.41.0-wmf.20/includes/content/Renderer/ContentRenderer.php(47): ContentHandler->getParserOutput(Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(MediaWiki\Content\Renderer\ContentParseParams))
#2 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(260): MediaWiki\Content\Renderer\ContentRenderer->getParserOutput(Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(MediaWiki\Title\Title), 740306188, Object(ParserOptions), false)
#3 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(232): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(Object(Wikibase\MediaInfo\Content\MediaInfoContent), false)
#4 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RevisionRenderer.php(242): MediaWiki\Revision\RenderedRevision->getSlotParserOutput('mediainfo', Array)
#5 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RevisionRenderer.php(164): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(Object(MediaWiki\Revision\RenderedRevision), Array)
#6 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(Object(MediaWiki\Revision\RenderedRevision), Array)
#7 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(199): call_user_func(Object(Closure), Object(MediaWiki\Revision\RenderedRevision), Array)
#8 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(330): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput(Array)
#9 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(230): RefreshLinksJob->getParserOutput(Object(MediaWiki\Revision\RevisionRenderer), Object(ParserCache), Object(WikiFilePage), Object(BufferingStatsdDataFactory))
#10 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(162): RefreshLinksJob->runForTitle(Object(MediaWiki\Title\Title))
#11 /srv/mediawiki/php-1.41.0-wmf.20/extensions/EventBus/includes/JobExecutor.php(78): RefreshLinksJob->run()
#12 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(Array)
#13 {main}

#0 /srv/mediawiki/php-1.41.0-wmf.20/includes/content/ContentHandler.php(1790): MediaWiki\Parser\ParserObserver->notifyParse(Object(MediaWiki\Title\Title), 740306188, Object(ParserOptions), Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(ParserOutput))
#1 /srv/mediawiki/php-1.41.0-wmf.20/includes/content/Renderer/ContentRenderer.php(47): ContentHandler->getParserOutput(Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(MediaWiki\Content\Renderer\ContentParseParams))
#2 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(260): MediaWiki\Content\Renderer\ContentRenderer->getParserOutput(Object(Wikibase\MediaInfo\Content\MediaInfoContent), Object(MediaWiki\Title\Title), 740306188, Object(ParserOptions), true)
#3 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(232): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(Object(Wikibase\MediaInfo\Content\MediaInfoContent), true)
#4 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RevisionRenderer.php(242): MediaWiki\Revision\RenderedRevision->getSlotParserOutput('mediainfo', Array)
#5 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RevisionRenderer.php(164): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(Object(MediaWiki\Revision\RenderedRevision), Array)
#6 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(Object(MediaWiki\Revision\RenderedRevision), Array)
#7 /srv/mediawiki/php-1.41.0-wmf.20/includes/Revision/RenderedRevision.php(199): call_user_func(Object(Closure), Object(MediaWiki\Revision\RenderedRevision), Array)
#8 /srv/mediawiki/php-1.41.0-wmf.20/includes/Storage/DerivedPageDataUpdater.php(1472): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#9 /srv/mediawiki/php-1.41.0-wmf.20/includes/Storage/DerivedPageDataUpdater.php(1504): MediaWiki\Storage\DerivedPageDataUpdater->getCanonicalParserOutput()
#10 /srv/mediawiki/php-1.41.0-wmf.20/includes/deferred/RefreshSecondaryDataUpdate.php(85): MediaWiki\Storage\DerivedPageDataUpdater->getSecondaryDataUpdates(false)
#11 /srv/mediawiki/php-1.41.0-wmf.20/includes/deferred/DeferredUpdatesManager.php(506): RefreshSecondaryDataUpdate->doUpdate()
#12 /srv/mediawiki/php-1.41.0-wmf.20/includes/deferred/DeferredUpdates.php(277): MediaWiki\Deferred\DeferredUpdatesManager->attemptUpdate(Object(RefreshSecondaryDataUpdate), Object(Wikimedia\Rdbms\LBFactoryMulti))
#13 /srv/mediawiki/php-1.41.0-wmf.20/includes/Storage/DerivedPageDataUpdater.php(1836): DeferredUpdates::attemptUpdate(Object(RefreshSecondaryDataUpdate), Object(Wikimedia\Rdbms\LBFactoryMulti))
#14 /srv/mediawiki/php-1.41.0-wmf.20/includes/page/WikiPage.php(2100): MediaWiki\Storage\DerivedPageDataUpdater->doSecondaryDataUpdates(Array)
#15 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(244): WikiPage->doSecondaryDataUpdates(Array)
#16 /srv/mediawiki/php-1.41.0-wmf.20/includes/jobqueue/jobs/RefreshLinksJob.php(162): RefreshLinksJob->runForTitle(Object(MediaWiki\Title\Title))
#17 /srv/mediawiki/php-1.41.0-wmf.20/extensions/EventBus/includes/JobExecutor.php(78): RefreshLinksJob->run()
#18 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(Array)
#19 {main}

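It looks like the first parse (from RefreshLinksJob->getParserOutput) is done with 'generate-html' => false, but the second (from DerivedPageDataUpdater->getCanonicalParserOutput) is done with true. The output of the first parse has no HTML, so it can't be reused and we have to parse again.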
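To illustrate why the two calls above end up parsing twice, here is a toy model in Python. It is not MediaWiki code: the class `RenderedRevisionModel`, its `get_output` method and the `generate_html` flag are hypothetical stand-ins, and it assumes the cached output is only reused when it already contains HTML or the caller doesn't need HTML.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Output:
    links: List[str]
    html: Optional[str]  # None when the parse skipped HTML generation


@dataclass
class RenderedRevisionModel:
    """Hypothetical stand-in for a rendered revision with a one-slot cache."""
    text: str
    parse_count: int = 0
    _cached: Optional[Output] = None

    def _parse(self, generate_html: bool) -> Output:
        # Count every real parse so that duplicates are visible.
        self.parse_count += 1
        links = [w[2:-2] for w in self.text.split()
                 if w.startswith("[[") and w.endswith("]]")]
        html = f"<p>{self.text}</p>" if generate_html else None
        return Output(links, html)

    def get_output(self, generate_html: bool = True) -> Output:
        cached = self._cached
        # Reuse the cache only if it satisfies this request: an output
        # without HTML cannot serve a caller that asks for HTML.
        if cached is not None and (cached.html is not None or not generate_html):
            return cached
        self._cached = self._parse(generate_html)
        return self._cached


# Pattern seen in the traces: no HTML first, then HTML.
rev = RenderedRevisionModel("See [[Foo]] and [[Bar]]")
rev.get_output(generate_html=False)
rev.get_output(generate_html=True)
print(rev.parse_count)  # 2

# Asking for HTML up front lets the second call reuse the cache.
rev2 = RenderedRevisionModel("See [[Foo]] and [[Bar]]")
rev2.get_output(generate_html=True)
rev2.get_output(generate_html=True)
print(rev2.parse_count)  # 1
```

Under this model, the cheapest way to avoid the duplicate is for the first caller to request HTML whenever a later caller is known to need it, at the cost of generating HTML that the first caller itself doesn't use.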

matmarex claimed this task.

Change 947959 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] RefreshLinksJob: Generate HTML when parsing if DerivedPageDataUpdater will need it later

https://gerrit.wikimedia.org/r/947959

Change 947959 merged by jenkins-bot:

[mediawiki/core@master] RefreshLinksJob: Generate HTML when parsing if DerivedPageDataUpdater will need it later

https://gerrit.wikimedia.org/r/947959