
Some old revisions on Chinese Wikipedia are truncated, resulting in invalid UTF-8 and "RuntimeException: PCRE failure" when viewing them
Closed, Resolved | Public | PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   RuntimeException: PCRE failure
Frame Location Call
from /srv/mediawiki/php-1.44.0-wmf.18/includes/parser/Parser.php(2172)
#0 /srv/mediawiki/php-1.44.0-wmf.18/includes/parser/Parser.php(1632) MediaWiki\Parser\Parser->handleExternalLinks(string)
#1 /srv/mediawiki/php-1.44.0-wmf.18/includes/parser/Parser.php(701) MediaWiki\Parser\Parser->internalParse(string)
#2 /srv/mediawiki/php-1.44.0-wmf.18/includes/content/WikitextContentHandler.php(384) MediaWiki\Parser\Parser->parse(string, MediaWiki\Title\Title, MediaWiki\Parser\ParserOptions, bool, bool, int)
#3 /srv/mediawiki/php-1.44.0-wmf.18/includes/content/ContentHandler.php(1697) MediaWiki\Content\WikitextContentHandler->fillParserOutput(MediaWiki\Content\WikitextContent, MediaWiki\Content\Renderer\ContentParseParams, MediaWiki\Parser\ParserOutput)
#4 /srv/mediawiki/php-1.44.0-wmf.18/includes/content/Renderer/ContentRenderer.php(75) MediaWiki\Content\ContentHandler->getParserOutput(MediaWiki\Content\WikitextContent, MediaWiki\Content\Renderer\ContentParseParams)
#5 /srv/mediawiki/php-1.44.0-wmf.18/includes/Revision/RenderedRevision.php(261) MediaWiki\Content\Renderer\ContentRenderer->getParserOutput(MediaWiki\Content\WikitextContent, MediaWiki\Page\PageIdentityValue, MediaWiki\Revision\RevisionStoreRecord, MediaWiki\Parser\ParserOptions, array)
#6 /srv/mediawiki/php-1.44.0-wmf.18/includes/Revision/RenderedRevision.php(233) MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(MediaWiki\Content\WikitextContent, array)
#7 /srv/mediawiki/php-1.44.0-wmf.18/includes/Revision/RevisionRenderer.php(236) MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string, array)
#8 /srv/mediawiki/php-1.44.0-wmf.18/includes/Revision/RevisionRenderer.php(169) MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, MediaWiki\Parser\ParserOptions, array)
#9 /srv/mediawiki/php-1.44.0-wmf.18/includes/Revision/RenderedRevision.php(196) MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(MediaWiki\Revision\RenderedRevision, array)
#10 /srv/mediawiki/php-1.44.0-wmf.18/includes/page/ParserOutputAccess.php(462) MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#11 /srv/mediawiki/php-1.44.0-wmf.18/includes/page/ParserOutputAccess.php(373) MediaWiki\Page\ParserOutputAccess->renderRevision(WikiPage, MediaWiki\Parser\ParserOptions, MediaWiki\Revision\RevisionStoreRecord, int, null)
#12 /srv/mediawiki/php-1.44.0-wmf.18/includes/diff/DifferenceEngine.php(1255) MediaWiki\Page\ParserOutputAccess->getParserOutput(WikiPage, MediaWiki\Parser\ParserOptions, MediaWiki\Revision\RevisionStoreRecord, int)
#13 /srv/mediawiki/php-1.44.0-wmf.18/includes/diff/DifferenceEngine.php(1040) DifferenceEngine->renderNewRevision()
#14 /srv/mediawiki/php-1.44.0-wmf.18/includes/page/Article.php(1068) DifferenceEngine->showDiffPage(bool)
#15 /srv/mediawiki/php-1.44.0-wmf.18/includes/page/Article.php(483) Article->showDiffPage()
#16 /srv/mediawiki/php-1.44.0-wmf.18/includes/actions/ViewAction.php(78) Article->view()
#17 /srv/mediawiki/php-1.44.0-wmf.18/includes/actions/ActionEntryPoint.php(732) ViewAction->show()
#18 /srv/mediawiki/php-1.44.0-wmf.18/includes/actions/ActionEntryPoint.php(509) MediaWiki\Actions\ActionEntryPoint->performAction(Article, MediaWiki\Title\Title)
#19 /srv/mediawiki/php-1.44.0-wmf.18/includes/actions/ActionEntryPoint.php(145) MediaWiki\Actions\ActionEntryPoint->performRequest()
#20 /srv/mediawiki/php-1.44.0-wmf.18/includes/MediaWikiEntryPoint.php(202) MediaWiki\Actions\ActionEntryPoint->execute()
#21 /srv/mediawiki/php-1.44.0-wmf.18/index.php(58) MediaWiki\MediaWikiEntryPoint->run()
#22 /srv/mediawiki/w/index.php(3) require(string)
#23 {main}
Impact
Notes

Details

Request URL
https://zh.wikipedia.org/w/index.php?diff=*&oldid=*&title=*

Event Timeline

This error message usually means corrupt revision text is stored in the DB somewhere. Prior art at T351953, T387188.

It would be nice to know what revision that is so we can figure out why and how it became corrupt.

Pppery renamed this task from RuntimeException: PCRE failure to RuntimeException: PCRE failure viewing old revision on Chinese Wikipedia. Mar 1 2025, 12:16 AM

/w/index.php?diff=prev&oldid=103405&title=%E7%A7%98%E9%B2%81

That's weird. According to the page history (https://zh.wikipedia.org/w/index.php?title=%E7%A7%98%E9%B2%81&action=history&dir=prev), the content of that revision, along with several others nearby (the ones with byte counts around 760), is truncated in the middle of a multi-byte character rather than at a character boundary, which leaves invalid UTF-8. (You can use action=raw to see the content.)
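To illustrate why a cut that lands inside a character breaks things, here is a minimal Python sketch. It is not MediaWiki code, and the helper name `first_invalid_utf8_offset` is hypothetical. It takes the percent-encoded page title from the URL above purely as sample bytes, drops the last byte, and reports where strict UTF-8 decoding would first fail.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset where strict UTF-8 decoding first fails,
    or None if the whole byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Sample bytes only: the page title from the failing URL, two CJK
# characters that each take three bytes in UTF-8.
encoded = unquote_to_bytes("%E7%A7%98%E9%B2%81")

# Simulate a truncation that is not on a character boundary by
# dropping the final byte of the second character.
truncated = encoded[:-1]

print(first_invalid_utf8_offset(encoded))    # None: the full title is valid
print(first_invalid_utf8_offset(truncated))  # 3: the second character is incomplete
```

Everything before the reported offset is still valid text, so a check like this could in principle be run over the output of action=raw to see how much of a revision survived; whether that is useful for the affected revisions here is an assumption.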

This doesn't match the pattern of any of the other bugs causing corrupted revision text that I recall. The quick fix would be to use findBadBlobs.php to mark them as bad ...

Pppery renamed this task from RuntimeException: PCRE failure viewing old revision on Chinese Wikipedia to Some old revisions on Chinese Wikipedia are truncated, resulting in invalid UTF-8 and "RuntimeException: PCRE failure" when viewing them. Mar 2 2025, 9:25 PM

These revisions were marked as known-bad.