InvalidArgumentException: WikiPage constructed on a Title that cannot exist as a page: commons:File:The whole map of Sanmon-Santō Sakamoto Sōezu (1st volume).png
Open, High | Public | PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   InvalidArgumentException: WikiPage constructed on a Title that cannot exist as a page: commons:File:The whole map of Sanmon-Santō Sakamoto Sōezu (1st volume).png
exception.trace
from /srv/mediawiki/php-1.38.0-wmf.20/includes/page/WikiPage.php(178)
#0 /srv/mediawiki/php-1.38.0-wmf.20/includes/page/WikiFilePage.php(46): WikiPage->__construct(Title)
#1 /srv/mediawiki/php-1.38.0-wmf.20/extensions/CommonsMetadata/src/DataCollector.php(286): WikiFilePage->__construct(Title)
#2 /srv/mediawiki/php-1.38.0-wmf.20/extensions/CommonsMetadata/src/DataCollector.php(98): CommonsMetadata\DataCollector->getCategories(ForeignDBFile, array)
#3 /srv/mediawiki/php-1.38.0-wmf.20/extensions/CommonsMetadata/src/HookHandler.php(75): CommonsMetadata\DataCollector->collect(array, ForeignDBFile)
#4 /srv/mediawiki/php-1.38.0-wmf.20/includes/HookContainer/HookContainer.php(338): CommonsMetadata\HookHandler::onGetExtendedMetadata(array, ForeignDBFile, DerivativeContext, boolean, integer)
#5 /srv/mediawiki/php-1.38.0-wmf.20/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#6 /srv/mediawiki/php-1.38.0-wmf.20/includes/HookContainer/HookRunner.php(1810): MediaWiki\HookContainer\HookContainer->run(string, array)
#7 /srv/mediawiki/php-1.38.0-wmf.20/includes/media/FormatMetadata.php(1821): MediaWiki\HookContainer\HookRunner->onGetExtendedMetadata(array, ForeignDBFile, DerivativeContext, boolean, integer)
#8 /srv/mediawiki/php-1.38.0-wmf.20/includes/media/FormatMetadata.php(1738): FormatMetadata->getExtendedMetadataFromHook(ForeignDBFile, array, integer)
#9 /srv/mediawiki/php-1.38.0-wmf.20/extensions/PageImages/includes/Hooks/ParserFileProcessingHookHandlers.php(372): FormatMetadata->fetchExtendedMetadata(ForeignDBFile)
#10 /srv/mediawiki/php-1.38.0-wmf.20/extensions/PageImages/includes/Hooks/ParserFileProcessingHookHandlers.php(353): PageImages\Hooks\ParserFileProcessingHookHandlers->fetchFileMetadata(ForeignDBFile)
#11 /srv/mediawiki/php-1.38.0-wmf.20/extensions/PageImages/includes/Hooks/ParserFileProcessingHookHandlers.php(205): PageImages\Hooks\ParserFileProcessingHookHandlers->isImageFree(string)
#12 /srv/mediawiki/php-1.38.0-wmf.20/extensions/PageImages/includes/Hooks/ParserFileProcessingHookHandlers.php(151): PageImages\Hooks\ParserFileProcessingHookHandlers->findBestImages(array)
#13 /srv/mediawiki/php-1.38.0-wmf.20/extensions/PageImages/includes/Hooks/ParserFileProcessingHookHandlers.php(66): PageImages\Hooks\ParserFileProcessingHookHandlers->doParserAfterTidy(Parser, string)
#14 /srv/mediawiki/php-1.38.0-wmf.20/includes/HookContainer/HookContainer.php(338): PageImages\Hooks\ParserFileProcessingHookHandlers::onParserAfterTidy(Parser, string)
#15 /srv/mediawiki/php-1.38.0-wmf.20/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#16 /srv/mediawiki/php-1.38.0-wmf.20/includes/HookContainer/HookRunner.php(2827): MediaWiki\HookContainer\HookContainer->run(string, array)
#17 /srv/mediawiki/php-1.38.0-wmf.20/includes/parser/Parser.php(1701): MediaWiki\HookContainer\HookRunner->onParserAfterTidy(Parser, string)
#18 /srv/mediawiki/php-1.38.0-wmf.20/includes/parser/Parser.php(693): Parser->internalParseHalfParsed(string, boolean, boolean)
#19 /srv/mediawiki/php-1.38.0-wmf.20/includes/content/WikitextContentHandler.php(294): Parser->parse(string, Title, ParserOptions, boolean, boolean, integer)
#20 /srv/mediawiki/php-1.38.0-wmf.20/includes/content/ContentHandler.php(1723): WikitextContentHandler->fillParserOutput(WikitextContent, MediaWiki\Content\Renderer\ContentParseParams, ParserOutput)
#21 /srv/mediawiki/php-1.38.0-wmf.20/includes/content/Renderer/ContentRenderer.php(47): ContentHandler->getParserOutput(WikitextContent, MediaWiki\Content\Renderer\ContentParseParams)
#22 /srv/mediawiki/php-1.38.0-wmf.20/includes/api/ApiQueryRevisionsBase.php(631): MediaWiki\Content\Renderer\ContentRenderer->getParserOutput(WikitextContent, Title, integer, ParserOptions)
#23 /srv/mediawiki/php-1.38.0-wmf.20/includes/api/ApiQueryRevisionsBase.php(459): ApiQueryRevisionsBase->extractDeprecatedContent(WikitextContent, MediaWiki\Revision\RevisionStoreRecord)
#24 /srv/mediawiki/php-1.38.0-wmf.20/includes/api/ApiQueryRevisionsBase.php(371): ApiQueryRevisionsBase->extractAllSlotInfo(MediaWiki\Revision\RevisionStoreRecord, integer)
#25 /srv/mediawiki/php-1.38.0-wmf.20/includes/api/ApiQueryRevisions.php(440): ApiQueryRevisionsBase->extractRevisionInfo(MediaWiki\Revision\RevisionStoreRecord, stdClass)
#26 /srv/mediawiki/php-1.38.0-wmf.20/includes/api/ApiQueryRevisionsBase.php(120): ApiQueryRevisions->run()
#27 /srv/mediawiki/php-1.38.0-wmf.20/includes/api/ApiQuery.php(629): ApiQueryRevisionsBase->execute()
#28 /srv/mediawiki/php-1.38.0-wmf.20/includes/api/ApiMain.php(1890): ApiQuery->execute()
#29 /srv/mediawiki/php-1.38.0-wmf.20/includes/api/ApiMain.php(868): ApiMain->executeAction()
#30 /srv/mediawiki/php-1.38.0-wmf.20/includes/api/ApiMain.php(839): ApiMain->executeActionWithErrorHandling()
#31 /srv/mediawiki/php-1.38.0-wmf.20/api.php(90): ApiMain->execute()
#32 /srv/mediawiki/php-1.38.0-wmf.20/api.php(45): wfApiMain()
#33 /srv/mediawiki/w/api.php(3): require(string)
#34 {main}
Impact
Notes

Same as T299665?

Details

Request URL
https://en.wikipedia.org/w/api.php?prop=*&rvprop=*&rvlimit=*&rvparse=*&titles=*&format=*&action=query

Event Timeline

Krinkle triaged this task as High priority. Feb 13 2022, 2:35 PM
Krinkle moved this task from Untriaged to Feb 2022 on the Wikimedia-production-error board.

Looks like DataCollector is trying to get the categories of an interwiki link, which doesn't make sense. It needs to check Title::canExist() first, or otherwise ensure that it's actually dealing with a reference to a page rather than some other kind of link.
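
To illustrate the kind of guard meant here, a simplified Python model (not MediaWiki's PHP) might look like the following. `TitleModel`, `can_exist` and `get_categories` are hypothetical stand-ins for the Title object, its existence check and the category lookup in DataCollector. The only rule modelled is the one from this task: an interwiki title cannot exist as a local page.

```python
from dataclasses import dataclass
from typing import Callable, List

NS_FILE = 6  # MediaWiki's numeric ID for the File namespace


@dataclass(frozen=True)
class TitleModel:
    """Hypothetical stand-in for a MediaWiki Title."""
    interwiki: str  # "" for a local title, e.g. "commons" for an interwiki one
    namespace: int
    text: str

    def can_exist(self) -> bool:
        # Simplified assumption: only the interwiki condition is modelled.
        return self.interwiki == ""


def get_categories(title: TitleModel,
                   lookup: Callable[[TitleModel], List[str]]) -> List[str]:
    """Return categories for a local page, or [] for a title that cannot be a page."""
    if not title.can_exist():
        # Bail out before anything page-like is constructed from the title.
        return []
    return lookup(title)
```

With this shape, an interwiki file title yields an empty category list instead of an exception, while local titles go through the lookup unchanged.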

After digging a bit deeper, I think the root cause is that it's not quite clear whether interwiki file references are a thing we support. It's easy to defend against this issue in CommonsMetadata, but I'm wondering if File::normalizeTitle shouldn't just return null/throw when given a string like "commons:File:foo"? In our environment, [[commons:File:Test.jpg]] is just a link, not a file inclusion. But can we hard-code that assumption?

How does an interwiki link trigger file handling, anyway? I thought namespace parsing doesn't happen for interwiki titles.

So apparently https://en.wikipedia.org/wiki/Sanmon-Sant%C5%8D_Sakamoto_S%C5%8Dezu (which is quite broken now because of this issue) has this snippet:

<gallery widths="500px" heights="353px" perrow="1" mode="nolines">
commons:The whole map of Sanmon-Santō Sakamoto Sōezu (1st volume).png|{{nihongo|||Sanmon-Santō Sakamoto Sōezu}} 1st volume (Yokawa area in [[Enryaku-ji|Enryakuji Temple]] on [[Mount_Hiei|Mount Hiei]] and Sakamoto area)
</gallery>

The Parser does $title = Title::newFromText( $matches[1], NS_FILE ); on gallery lines. Because NS_FILE is passed as the default namespace and interwiki titles just use the default namespace, it ends up with a file-namespace interwiki title. The gallery logic has a namespace sanity check, but the title is in the file namespace and so passes it. The gallery code then uses this title to construct a File and passes it to Parser::modifyImageHtml(). From there, the ParserModifyImageHTML hook in PageImages sends it to PageImages\Hooks\ParserFileProcessingHookHandlers::isImageFree(), which invokes CommonsMetadata to check copyright status. CommonsMetadata then breaks because File::getOriginalTitle() unexpectedly returns an interwiki title.
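
The title-parsing behaviour described above can be reproduced with a toy model. This is a Python sketch under stated assumptions, not the real Title parser: `parse_title`, `ParsedTitle`, `INTERWIKI_PREFIXES` and `NAMESPACE_NAMES` are hypothetical names. The one rule modelled is that an interwiki title keeps whatever default namespace the caller passed in.

```python
from dataclasses import dataclass

NS_MAIN = 0
NS_FILE = 6
INTERWIKI_PREFIXES = {"commons"}     # assumed interwiki table for the sketch
NAMESPACE_NAMES = {"file": NS_FILE}  # assumed local namespace names


@dataclass(frozen=True)
class ParsedTitle:
    interwiki: str
    namespace: int
    text: str


def parse_title(raw: str, default_ns: int = NS_MAIN) -> ParsedTitle:
    """Toy model of parsing a title string with a default namespace."""
    prefix, sep, rest = raw.partition(":")
    key = prefix.strip().lower()
    if sep and key in INTERWIKI_PREFIXES:
        # Modelled behaviour: no namespace parsing happens for the remainder,
        # so the interwiki title simply inherits the caller's default namespace.
        return ParsedTitle(key, default_ns, rest.strip())
    if sep and key in NAMESPACE_NAMES:
        return ParsedTitle("", NAMESPACE_NAMES[key], rest.strip())
    return ParsedTitle("", default_ns, raw.strip())


def passes_namespace_check(title: ParsedTitle) -> bool:
    """Model of a check that looks only at the namespace."""
    return title.namespace == NS_FILE
```

In this model, `parse_title("commons:Example map.png", NS_FILE)` produces a title with interwiki `commons` and namespace `NS_FILE`, and `passes_namespace_check` accepts it, which is the situation the gallery code ran into.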

So the immediate and minimal fix would be to extend the namespace check at TraditionalImageGallery.php#90 with an interwikiness-check, but that doesn't feel very satisfying.
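
As a rough illustration of that minimal fix, here is a Python sketch of a gallery-line check that also rejects interwiki titles. It is a simplified model rather than the code in TraditionalImageGallery.php: `GalleryTitle`, `is_usable_gallery_title` and `render_gallery_line` are hypothetical names, and the plain-text fallback for rejected lines is an assumption made for the sketch.

```python
from typing import NamedTuple

NS_FILE = 6  # MediaWiki's numeric ID for the File namespace


class GalleryTitle(NamedTuple):
    """Hypothetical stand-in for the title parsed from one gallery line."""
    interwiki: str  # "" for a local title
    namespace: int
    text: str


def is_usable_gallery_title(title: GalleryTitle) -> bool:
    """Namespace check extended with an interwiki check."""
    in_file_namespace = title.namespace == NS_FILE
    is_local = title.interwiki == ""
    return in_file_namespace and is_local


def render_gallery_line(line: str, title: GalleryTitle) -> str:
    """Treat unusable titles as broken lines instead of building a File."""
    if not is_usable_gallery_title(title):
        # Assumed fallback for the sketch: emit the raw line as plain text.
        return line
    return f"[image: {title.text}]"
```

The point of the extra condition is that a title can be in the file namespace and still not refer to a local file, so the namespace alone is not enough to decide whether file handling should start.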

Change 762571 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] Treat interwiki titles in <gallery> as broken

https://gerrit.wikimedia.org/r/762571

Change 762820 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] WIP: Force interwiki links to use NS_MAIN.

https://gerrit.wikimedia.org/r/762820

@Tgr thank you for figuring out what was going on here! I merged your patch, and made another one that should make sure that this kind of thing can't happen again elsewhere.

Change 762571 merged by jenkins-bot:

[mediawiki/core@master] Treat interwiki titles in <gallery> as broken

https://gerrit.wikimedia.org/r/762571

As I understand it, this change avoids a fatal error. That seems worth adding a regression test for, since the difference is not merely how the UI renders but whether the request renders something (neutral plain text in wikitext) or fatals in an unhelpful way, due to a recently added constraint that the code didn't previously need to fulfil.