
Parse error from mediawiki-1.34.0/includes/parser/ParserOutput.php crashes CirrusSearch's forceSearchIndex.php
Open, Low, Public

Description

The exception below is thrown when running CirrusSearch's forceSearchIndex.php. It appears after some pages have been indexed but before the job finishes:

MWException from line 348 of /var/www/mediawiki-1.34.0/includes/parser/ParserOutput.php: Bad parser output text.
#0 [internal function]: ParserOutput->{closure}(Array)
#1 /var/www/mediawiki-1.34.0/includes/parser/ParserOutput.php(359): preg_replace_callback('#<(?:mw:)?edits...', Object(Closure), '<div class="mw-...')
#2 /var/www/mediawiki-1.34.0/includes/content/WikiTextStructure.php(154): ParserOutput->getText(Array)
#3 /var/www/mediawiki-1.34.0/includes/content/WikiTextStructure.php(223): WikiTextStructure->extractWikitextParts()
#4 /var/www/mediawiki-1.34.0/includes/content/WikitextContentHandler.php(152): WikiTextStructure->getOpeningText()
#5 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/includes/Updater.php(380): WikitextContentHandler->getDataForSearchIndex(Object(WikiPage), Object(ParserOutput), Object(CirrusSearch\CirrusSearch))
#6 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/includes/Updater.php(458): CirrusSearch\Updater::buildDocument(Object(CirrusSearch\CirrusSearch), Object(WikiPage), Object(CirrusSearch\Connection), 0, 0, 0)
#7 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/includes/Updater.php(236): CirrusSearch\Updater->buildDocumentsForPages(Array, 0)
#8 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/maintenance/forceSearchIndex.php(219): CirrusSearch\Updater->updatePages(Array, 0)
#9 /var/www/mediawiki-1.34.0/maintenance/doMaintenance.php(99): CirrusSearch\ForceSearchIndex->execute()
#10 /var/www/mediawiki-1.34.0/extensions/CirrusSearch/maintenance/forceSearchIndex.php(689): require_once('/var/www/mediaw...')
#11 {main}

This is with CirrusSearch-REL1_34-a86e0a5.tar.gz.

I subsequently added an "echo" to see what it was choking on, which appears to be this:

<h2><span class="mw-headline" id="Links">Links</span><mw:editsection page="File::Spec" section="1">Links</mw:editsection></h2>
<ul><li><a rel="nofollow" class="external free" href="http://perldoc.perl.org/File/Spec.html">http://perldoc.perl.org/File/Spec.html</a></li></ul>
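
To make it easier to see which part of that fragment reaches the failing closure, here is a minimal standalone sketch. The pattern is an assumption modeled on the truncated '#<(?:mw:)?edits...' shown in the stack trace, not a verified copy of the core regex, and extractEditSectionAttrs() is a hypothetical helper that exists only for this illustration. It shows that the page attribute handed to the callback is the raw string File::Spec; presumably that name cannot be resolved to a valid title, though the trace itself only shows the exception.

```php
<?php
// Assumed pattern, modeled on the truncated regex in the stack trace.
$pattern = '#<(?:mw:)?editsection page="(.*?)" section="(.*?)"'
    . '(?:/>|>(.*?)</(?:mw:)?editsection>)#s';

// The fragment reported by the added "echo".
$fragment = '<h2><span class="mw-headline" id="Links">Links</span>'
    . '<mw:editsection page="File::Spec" section="1">Links</mw:editsection></h2>';

// Hypothetical helper: returns the decoded page and section attributes of the
// first editsection placeholder, or null if the text contains none.
function extractEditSectionAttrs(string $pattern, string $html): ?array {
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    return [
        'page' => htmlspecialchars_decode($m[1]),
        'section' => htmlspecialchars_decode($m[2]),
    ];
}

$attrs = extractEditSectionAttrs($pattern, $fragment);
var_dump($attrs);
```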

On the older version of the wiki, searching for perldoc (in SphinxSearch in this case) brings up just this:

Clicking on that result shows this error:

In other words, this appears to be caused by something that can only be inherited from really old content, but which could be skipped over more gracefully when encountered. Adding the middle three lines shown here allowed indexing to run to completion:

if ( $options['enableSectionEditLinks'] ) {
        if (preg_match("|::|",$text)) {
                $text=preg_replace("|::|","\:\:",$text);
        }
        $text = preg_replace_callback(

Most likely there's a better way to fix this, though.
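
One possible shape for a more graceful approach, as a rough standalone sketch rather than a proposed patch: have the callback drop the edit-section placeholder when the page name cannot be resolved, instead of throwing. Everything here is an assumption made for illustration. The pattern is modeled on the truncated regex in the stack trace, resolveTitleOrNull() is a hypothetical stand-in for the real title lookup (it simply rejects any name containing "::"), and renderEditSections() and its "[edit ...]" output are invented for the example.

```php
<?php
// Hypothetical stand-in for the real title lookup. For illustration only,
// any page name containing '::' is treated as unresolvable.
function resolveTitleOrNull(string $pageName): ?string {
    return strpos($pageName, '::') !== false ? null : $pageName;
}

// Hypothetical renderer: replaces each editsection placeholder with a stub
// link, or removes it when the page name cannot be resolved.
function renderEditSections(string $html, callable $resolve): string {
    // Assumed pattern, modeled on the truncated regex in the stack trace.
    $pattern = '#<(?:mw:)?editsection page="(.*?)" section="(.*?)"'
        . '(?:/>|>(.*?)</(?:mw:)?editsection>)#s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($resolve): string {
            $title = $resolve(htmlspecialchars_decode($m[1]));
            if ($title === null) {
                // Skip gracefully: drop the placeholder instead of throwing.
                return '';
            }
            return '[edit ' . htmlspecialchars($title)
                . ' section ' . htmlspecialchars($m[2]) . ']';
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$bad = '<h2>Links<mw:editsection page="File::Spec" section="1">Links</mw:editsection></h2>';
$good = '<h2>A<mw:editsection page="Foo" section="2">A</mw:editsection></h2>';

echo renderEditSections($bad, 'resolveTitleOrNull'), "\n";
echo renderEditSections($good, 'resolveTitleOrNull'), "\n";
```

Unlike escaping "::" in the whole text, this leaves the rest of the page output untouched and only affects the placeholder whose page name is unusable. Whether silently dropping the link is acceptable, or whether it should be logged, is a separate design question.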

Event Timeline

WhitWye created this task. Feb 7 2020, 8:34 PM
Restricted Application added a project: Discovery-Search. Feb 7 2020, 8:34 PM
Restricted Application added a subscriber: Aklapper.
WhitWye updated the task description. Feb 7 2020, 9:07 PM
WhitWye updated the task description.
WhitWye updated the task description. Feb 7 2020, 9:22 PM
Reedy updated the task description. Feb 7 2020, 10:08 PM
EBernhardson triaged this task as Low priority. Feb 10 2020, 6:45 PM
EBernhardson moved this task from needs triage to elastic / cirrus on the Discovery-Search board.