Parse error from mediawiki-1.34.0/includes/parser/ParserOutput.php crashes CirrusSearch's forceSearchIndex.php
Open, Low, Public

Description

The exception below is thrown when running CirrusSearch's forceSearchIndex.php. After indexing some pages, but before finishing the job, the script aborts with:

MWException from line 348 of /var/www/mediawiki-1.34.0/includes/parser/ParserOutput.php: Bad parser output text.
#0 [internal function]: ParserOutput->{closure}(Array)
#1 /var/www/mediawiki-1.34.0/includes/parser/ParserOutput.php(359): preg_replace_callback('#<(?:mw:)?edits...', Object(Closure), '<div class="mw-...')
#2 /var/www/mediawiki-1.34.0/includes/content/WikiTextStructure.php(154): ParserOutput->getText(Array)
#3 /var/www/mediawiki-1.34.0/includes/content/WikiTextStructure.php(223): WikiTextStructure->extractWikitextParts()
#4 /var/www/mediawiki-1.34.0/includes/content/WikitextContentHandler.php(152): WikiTextStructure->getOpeningText()
#5 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/includes/Updater.php(380): WikitextContentHandler->getDataForSearchIndex(Object(WikiPage), Object(ParserOutput), Object(CirrusSearch\CirrusSearch))
#6 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/includes/Updater.php(458): CirrusSearch\Updater::buildDocument(Object(CirrusSearch\CirrusSearch), Object(WikiPage), Object(CirrusSearch\Connection), 0, 0, 0)
#7 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/includes/Updater.php(236): CirrusSearch\Updater->buildDocumentsForPages(Array, 0)
#8 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/maintenance/forceSearchIndex.php(219): CirrusSearch\Updater->updatePages(Array, 0)
#9 /var/www/mediawiki-1.34.0/maintenance/doMaintenance.php(99): CirrusSearch\ForceSearchIndex->execute()
#10 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/maintenance/forceSearchIndex.php(689): require_once('/var/www/mediaw...')
#11 {main}

This is with CirrusSearch-REL1_34-a86e0a5.tar.gz

I subsequently added an "echo" to see what it was choking on, which looks to be this:

<h2><span class="mw-headline" id="Links">Links</span><mw:editsection page="File::Spec" section="1">Links</mw:editsection></h2>
<ul><li><a rel="nofollow" class="external free" href="http://perldoc.perl.org/File/Spec.html">http://perldoc.perl.org/File/Spec.html</a></li></ul>

On the older version of the wiki, searching for perldoc (in SphinxSearch in this case) brings up just this:

perldoc1.png (285×616 px, 40 KB)

Clicking on that result shows this error:

perldoc2.png (83×1 px, 14 KB)

In other words, this appears to be caused by something that can only be inherited from really old content, but which could be skipped over more gracefully when encountered. Adding the middle three lines of the following fragment allowed indexing to run to completion:

if ( $options['enableSectionEditLinks'] ) {
        if (preg_match("|::|",$text)) {
                $text=preg_replace("|::|","\:\:",$text);
        }
        $text = preg_replace_callback(
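
To make the effect of those three lines concrete, here is a small standalone sketch that applies the same substitution to the heading quoted earlier. The variable names are only illustrative. It shows that the page attribute no longer contains a literal "::" afterwards, which is presumably why the callback stops throwing. Note that the substitution runs over the whole text, so any visible "::" in the page body would be rewritten as well.

```php
<?php
// Sample input: the heading that triggered the exception, as echoed above.
$text = '<h2><span class="mw-headline" id="Links">Links</span>'
    . '<mw:editsection page="File::Spec" section="1">Links</mw:editsection></h2>';

// Same substitution as the workaround. In a double-quoted PHP string "\:" is
// not an escape sequence, so the replacement is a literal backslash + colon.
$escaped = preg_replace( "|::|", "\:\:", $text );

echo $escaped, "\n";
```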

Most likely there's a better way to fix this though.
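
One possible direction, sketched here purely as an illustration and not as MediaWiki's actual code: have the callback drop an edit-section placeholder whose page attribute cannot be resolved, instead of throwing. Everything named below is an assumption: `isResolvableTitle()` is a hypothetical stand-in for real title resolution, the regex is a simplified pattern rather than the one used in ParserOutput.php, and the rendered link markup is a placeholder.

```php
<?php
// HYPOTHETICAL stand-in for title resolution; not part of MediaWiki.
// Assumption: a page name containing "::" (as in "File::Spec") is the kind
// of value that cannot be resolved to a title.
function isResolvableTitle( string $title ): bool {
    return $title !== '' && strpos( $title, '::' ) === false;
}

// Simplified placeholder pattern, for illustration only.
const EDITSECTION_SKETCH_REGEX =
    '#<mw:editsection page="(.*?)" section="(.*?)">(.*?)</mw:editsection>#s';

// Replace each placeholder with a dummy edit link, or remove it entirely
// when its page attribute cannot be resolved.
function renderEditSectionsLeniently( string $text ): string {
    return preg_replace_callback(
        EDITSECTION_SKETCH_REGEX,
        function ( array $m ): string {
            $page = htmlspecialchars_decode( $m[1] );
            if ( !isResolvableTitle( $page ) ) {
                // Skip the bad placeholder rather than aborting the whole run.
                return '';
            }
            // Dummy rendering; real code would ask the skin for the link.
            return '<span class="mw-editsection">[edit '
                . htmlspecialchars( $m[2] ) . ']</span>';
        },
        $text
    );
}
```

With this approach the rest of the page text is left untouched, so the heading and the external link would still be indexed; only the unresolvable edit link is dropped.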