Page MenuHomePhabricator

TypeError: Argument 1 passed to HtmlFormatter\HtmlFormatter::onHtmlReady() must be of the type string, null given, called in /srv/mediawiki/php-1.41.0-wmf.24/vendor/wikimedia/html-formatter/src/HtmlFormatter.php on line 314
Open, Needs TriagePublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   TypeError: Argument 1 passed to HtmlFormatter\HtmlFormatter::onHtmlReady() must be of the type string, null given, called in /srv/mediawiki/php-1.41.0-wmf.24/vendor/wikimedia/html-formatter/src/HtmlFormatter.php on line 314
exception.trace
from /srv/mediawiki/php-1.41.0-wmf.24/vendor/wikimedia/html-formatter/src/HtmlFormatter.php(90)
#0 /srv/mediawiki/php-1.41.0-wmf.24/vendor/wikimedia/html-formatter/src/HtmlFormatter.php(314): HtmlFormatter\HtmlFormatter->onHtmlReady(NULL)
#1 /srv/mediawiki/php-1.41.0-wmf.24/includes/content/WikiTextStructure.php(184): HtmlFormatter\HtmlFormatter->getText(DOMElement)
#2 /srv/mediawiki/php-1.41.0-wmf.24/includes/content/WikiTextStructure.php(226): WikiTextStructure->extractWikitextParts()
#3 /srv/mediawiki/php-1.41.0-wmf.24/includes/content/WikitextContentHandler.php(223): WikiTextStructure->getOpeningText()
#4 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(92): WikitextContentHandler->getDataForSearchIndex(WikiPage, ParserOutput, CirrusSearch\CirrusSearch, MediaWiki\Revision\RevisionStoreRecord)
#5 /srv/mediawiki/php-1.41.0-wmf.24/includes/libs/objectcache/wancache/WANObjectCache.php(1726): CirrusSearch\BuildDocument\ParserOutputPageProperties->CirrusSearch\BuildDocument\{closure}(boolean, integer, array, NULL, array)
#6 /srv/mediawiki/php-1.41.0-wmf.24/includes/libs/objectcache/wancache/WANObjectCache.php(1556): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array, array)
#7 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(95): WANObjectCache->getWithSetCallback(string, integer, Closure)
#8 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(51): CirrusSearch\BuildDocument\ParserOutputPageProperties->finalizeReal(Elastica\Document, WikiPage, CirrusSearch\CirrusSearch, MediaWiki\Revision\RevisionStoreRecord)
#9 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/BuildDocument/BuildDocument.php(196): CirrusSearch\BuildDocument\ParserOutputPageProperties->finalize(Elastica\Document, MediaWiki\Title\Title, MediaWiki\Revision\RevisionStoreRecord)
#10 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/DataSender.php(273): CirrusSearch\BuildDocument\BuildDocument->finalize(Elastica\Document)
#11 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/Job/ElasticaWrite.php(173): CirrusSearch\DataSender->sendData(string, array)
#12 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/Job/JobTraits.php(137): CirrusSearch\Job\ElasticaWrite->doJob()
#13 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/Updater.php(452): CirrusSearch\Job\CirrusGenericJob->run()
#14 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/Updater.php(232): CirrusSearch\Updater->pushElasticaWriteJobs(array, Closure)
#15 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/Updater.php(81): CirrusSearch\Updater->updatePages(array, integer, string, integer)
#16 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/Job/LinksUpdate.php(152): CirrusSearch\Updater->updateFromTitle(MediaWiki\Title\Title, string, integer)
#17 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/Job/LinksUpdate.php(132): CirrusSearch\Job\LinksUpdate->update(CirrusSearch\Updater)
#18 /srv/mediawiki/php-1.41.0-wmf.24/extensions/CirrusSearch/includes/Job/JobTraits.php(137): CirrusSearch\Job\LinksUpdate->doJob()
#19 /srv/mediawiki/php-1.41.0-wmf.24/extensions/EventBus/includes/JobExecutor.php(83): CirrusSearch\Job\CirrusTitleJob->run()
#20 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#21 {main}
Impact
Notes

Details

Request URL
https://jobrunner.discovery.wmnet/rpc/RunSingleJob.php

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Seeing quite a few of these in the past few days—currently at the rate of 20/hr: https://logstash.wikimedia.org/goto/0c7c9b8ce9ac75ca3a0ce41036b56ef4

Gehel added a subscriber: Gehel.

This seems to be more about Wikitext parser than it is about CirrusSearch. I'm tagging MediaWiki-Platform-Team, hopefully they can help.

Relevant code: https://gerrit.wikimedia.org/g/HtmlFormatter/+/9f3f30ad5d0287a1ca17c50bcbc45457db534208/src/HtmlFormatter.php#313

The argument to onHtmlReady() can be null only if preg_replace() returned null. It returns null "if an error occurred".

The error is new in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/892005, but that hasn't introduced any significant changes, just added the type hint.

That regexp will crash with PREG_BACKTRACK_LIMIT_ERROR when executed on a string that doesn't contain <body> and is at least 1000000 bytes long (1 MB).

Minimal test case:

$html = str_repeat( 'x', 1000*1000 );
var_dump( preg_replace( '/^.*?<body>|<\/body>.*$/s', '', $html ) );
var_dump( preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR );

My advice would be to stop using HtmlFormatter and its dodgy regexps in WikiTextStructure (see also T255586 and T258964), and instead process the HTML using Wikimedia\Parsoid\Utils methods (like DOMUtils::parseHTML() / DOMCompat::querySelector()) and standard DOM methods (like removeChild()).

There's another bug related to PREG_BACKTRACK_LIMIT_ERROR -- T341320. The fix for this (an incorrectly too-low backtrack limit) was deployed relatively recently; worth a double check that this isn't the same cause.

My advice would be to stop using HtmlFormatter and its dodgy regexps in WikiTextStructure (see also T255586 and T258964), and instead process the HTML using Wikimedia\Parsoid\Utils methods (like DOMUtils::parseHTML() / DOMCompat::querySelector()) and standard DOM methods (like removeChild()).

See also T255586: Replace HTMLFormatter by Remex.

I observed this issue on 1.39.5. Given the simple regex, I replaced it by strpos+substr:

@@ -313 +313,10 @@ class HtmlFormatter {
-               $html = \preg_replace( '/^.*?<body>|<\/body>.*$/s', '', $html );
+               $pos = strpos( $html, '<body>' );
+               if ( $pos !== false ) {
+                       $html = substr( $html, $pos+6 );
+               }
+               $pos = strpos( $html, '</body>' );
+               if ( $pos !== false ) {
+                       $html = substr( $html, 0, $pos );
+               }

Change 967666 had a related patch set uploaded (by Seb35; author: Seb35):

[HtmlFormatter@master] Avoid backtracking limit issues from preg_replace

https://gerrit.wikimedia.org/r/967666

I propose my patch, it fixes this issue as well as a similar one for long comments (the second preg_replace). I added 2 units tests inspired from matmarex’s comment above (T345319#9140869), these unit tests did not pass without this fix (triggering type errors) and pass with this fix.

Fixing like this is a short-term fix of this issue, letting T255586 for the longer term.