maintenance/rebuildtextindex.php may fail with WikiPage.php: Invalid or virtual namespace -1 given
Closed, Resolved | Public | Bug Report

Description

I've only seen this issue once (on Ansaikuropedia, during a MW 1.31.1 -> MW 1.35 upgrade), but a wiki with invalid data in the page table can cause maintenance/rebuildtextindex.php to fail with an error about an invalid namespace.

On the affected wiki, php maintenance/rebuildtextindex.php abnormally ended with:

MWException from line 160 of .../includes/page/WikiPage.php: Invalid or virtual namespace -1 given.
#0 .../includes/page/WikiPage.php(223): WikiPage::factory()
#1 .../includes/page/WikiPage.php(208): WikiPage::newFromRow()
#2 .../includes/deferred/SearchUpdate.php(189): WikiPage::newFromID()
#3 .../includes/deferred/SearchUpdate.php(94): SearchUpdate->getLatestPage()
#4 .../maintenance/rebuildtextindex.php(114): SearchUpdate->doUpdate()
#5 .../maintenance/rebuildtextindex.php(68): RebuildTextIndex->populateSearchIndex()
#6 .../maintenance/doMaintenance.php(107): RebuildTextIndex->execute()
#7 .../maintenance/rebuildtextindex.php(162): require_once('/var/www/wiki13...')
#8 {main}

That left the site in an incompletely updated state where every search would fail with errors, e.g. a search for https://ansaikuropedia.org/index.php?search=Jimmy+Hoffa+Sr.&title=%E7%89%B9%E5%88%A5%3A%E6%A4%9C%E7%B4%A2&uselang=en&profile=default&fulltext=1 failed with:

[0319025be0e9dc8c66f11450] /index.php?search=Jimmy+Hoffa+Sr.&title=%E7%89%B9%E5%88%A5%3A%E6%A4%9C%E7%B4%A2&profile=default&fulltext=1 Wikimedia\Rdbms\DBQueryError from line 1699 of .../includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?

Error 1191: Can't find FULLTEXT index matching the column list (localhost)
Function: SearchMySQL::searchInternal
Query: SELECT page_id,page_namespace,page_title FROM `page`,`searchindex` WHERE (page_id=si_page) AND ( MATCH(si_title) AGAINST('+jimmy +hoffa +sru800. ' IN BOOLEAN MODE) ) AND page_namespace = 0 ORDER BY MATCH(si_title) AGAINST('+jimmy +hoffa +sru800. ' IN NATURAL LANGUAGE MODE) DESC LIMIT 21

Backtrace:

#0 .../includes/libs/rdbms/database/Database.php(1683): Wikimedia\Rdbms\Database->getQueryException()
#1 .../includes/libs/rdbms/database/Database.php(1658): Wikimedia\Rdbms\Database->getQueryExceptionAndLog()
#2 .../includes/libs/rdbms/database/Database.php(1227): Wikimedia\Rdbms\Database->reportQueryError()
#3 .../includes/libs/rdbms/database/Database.php(1907): Wikimedia\Rdbms\Database->query()
#4 .../includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->select()
#5 .../includes/libs/rdbms/database/DBConnRef.php(313): Wikimedia\Rdbms\DBConnRef->__call()
#6 .../includes/search/SearchMySQL.php(193): Wikimedia\Rdbms\DBConnRef->select()
#7 .../includes/search/SearchMySQL.php(179): SearchMySQL->searchInternal()
#8 .../includes/search/SearchDatabase.php(74): SearchMySQL->doSearchTitleInDB()
#9 .../includes/search/SearchEngine.php(156): SearchDatabase->doSearchTitle()
#10 .../includes/search/SearchEngine.php(187): SearchEngine->{closure}()
#11 .../includes/search/SearchEngine.php(157): SearchEngine->maybePaginate()
#12 .../includes/specials/SpecialSearch.php(387): SearchEngine->searchTitle()
#13 .../includes/specials/SpecialSearch.php(179): SpecialSearch->showResults()
#14 .../includes/specialpage/SpecialPage.php(600): SpecialSearch->execute()
#15 .../includes/specialpage/SpecialPageFactory.php(635): SpecialPage->run()
#16 .../includes/MediaWiki.php(307): MediaWiki\SpecialPage\SpecialPageFactory->executePath()
#17 .../includes/MediaWiki.php(940): MediaWiki->performRequest()
#18 .../includes/MediaWiki.php(543): MediaWiki->main()
#19 .../index.php(53): MediaWiki->run()
#20 .../index.php(46): wfIndexMain()
#21 {main}

I tried editing includes/deferred/SearchUpdate.php to check for this condition, replacing

public function doUpdate() {
        $services = MediaWikiServices::getInstance();
        $config = $services->getSearchEngineConfig();

        if ( $config->getConfig()->get( 'DisableSearchUpdate' ) || !$this->id ) {
                return;
        }

        $seFactory = $services->getSearchEngineFactory();
        foreach ( $config->getSearchTypes() as $type )
        {

with

public function doUpdate() {
        $services = MediaWikiServices::getInstance();
        $config = $services->getSearchEngineConfig();

        if ( $config->getConfig()->get( 'DisableSearchUpdate' ) || !$this->id ) {
                return;
        }

        $seFactory = $services->getSearchEngineFactory();
        foreach ( $config->getSearchTypes() as $type )
                if ($this->title->mNamespace >= 0)      // 2020-11-24 : kludge to prevent -1 as namespace (a fatal exception):
        {

and ran the maintenance/rebuildtextindex.php script again. With this one-line sanity check added, the script completes without incident. Trying the search again, it works:

https://ansaikuropedia.org/index.php?search=Jimmy+Hoffa+Sr.&title=%E7%89%B9%E5%88%A5%3A%E6%A4%9C%E7%B4%A2&uselang=en&profile=default&fulltext=1

Search results

    Content pages
    Multimedia
    Everything
    Advanced

Create the page "Jimmy Hoffa Sr." on this wiki!

There were no results matching the query.

So we're still no closer to finding mista Hoffa, but at least this bog is no longer messing with my machine, capiche?

That still doesn't explain how the bad data got into the page table in the first place, but checking for this condition and skipping anything with an invalid (<0) namespace seems like a trivial enough one-line change?

Event Timeline

I'm a little confused by your patch: while it is valid PHP, it doesn't make a lot of sense as is (or it was truncated)... Though your intention makes some sense.

Original code:

	/**
	 * Perform actual update for the entry
	 */
	public function doUpdate() {
		$services = MediaWikiServices::getInstance();
		$config = $services->getSearchEngineConfig();

		if ( $config->getConfig()->get( 'DisableSearchUpdate' ) || !$this->id ) {
			return;
		}

		$seFactory = $services->getSearchEngineFactory();
		foreach ( $config->getSearchTypes() as $type ) {
			$search = $seFactory->create( $type );
			if ( !$search->supports( 'search-update' ) ) {
				continue;
			}

			$normalTitle = $this->getNormalizedTitle( $search );

			if ( $this->getLatestPage() === null ) {
				$search->delete( $this->id, $normalTitle );
				continue;
			} elseif ( $this->content === null ) {
				$search->updateTitle( $this->id, $normalTitle );
				continue;
			}

			$text = $this->content !== null ? $this->content->getTextForSearchIndex() : '';
			$text = $this->updateText( $text, $search );

			# Perform the actual update
			$search->update( $this->id, $normalTitle, $search->normalizeText( $text ) );
		}
	}

With your change:

	/**
	 * Perform actual update for the entry
	 */
	public function doUpdate() {
		$services = MediaWikiServices::getInstance();
		$config = $services->getSearchEngineConfig();

		if ( $config->getConfig()->get( 'DisableSearchUpdate' ) || !$this->id ) {
			return;
		}

		$seFactory = $services->getSearchEngineFactory();
		foreach ( $config->getSearchTypes() as $type )
			if ( $this->title->mNamespace >= 0 ) {

			$search = $seFactory->create( $type );
			if ( !$search->supports( 'search-update' ) ) {
				continue;
			}

			$normalTitle = $this->getNormalizedTitle( $search );

			if ( $this->getLatestPage() === null ) {
				$search->delete( $this->id, $normalTitle );
				continue;
			} elseif ( $this->content === null ) {
				$search->updateTitle( $this->id, $normalTitle );
				continue;
			}

			$text = $this->content !== null ? $this->content->getTextForSearchIndex() : '';
			$text = $this->updateText( $text, $search );

			# Perform the actual update
			$search->update( $this->id, $normalTitle, $search->normalizeText( $text ) );
		}
	}

It seems it'd make more sense to do that check inside rebuildtextindex.php instead...
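
A rough sketch of that idea, for illustration only: it is not the code from the eventual patch and does not use any MediaWiki classes. The helper name filterIndexableRows() is hypothetical, and the sample rows are made up apart from the two page IDs reported in this task. It only assumes that the rebuild loop iterates over page rows carrying page_id and page_namespace, and that rows in a virtual namespace (below 0) should be skipped before any search update is created for them.

```php
<?php
// Hypothetical stand-in for the page rows that a rebuild loop would read.
// filterIndexableRows() is an invented helper, not a MediaWiki function.

/**
 * Return only the rows that are safe to pass on to a search update,
 * i.e. those whose namespace is a real (non-negative) one.
 *
 * @param iterable $rows Objects with page_id and page_namespace properties
 * @return array
 */
function filterIndexableRows( iterable $rows ): array {
	$indexable = [];
	foreach ( $rows as $row ) {
		// Special (-1) and Media (-2) are virtual namespaces; no real page lives there
		if ( (int)$row->page_namespace < 0 ) {
			continue;
		}
		$indexable[] = $row;
	}
	return $indexable;
}

// Sample data: one ordinary page (invented) plus the two bad rows from this task
$rows = [
	(object)[ 'page_id' => 1, 'page_namespace' => 0 ],
	(object)[ 'page_id' => 55353, 'page_namespace' => -1 ],
	(object)[ 'page_id' => 55354, 'page_namespace' => -1 ],
];

foreach ( filterIndexableRows( $rows ) as $row ) {
	echo "would index page {$row->page_id}\n";
}
```

Filtering in the maintenance script rather than in SearchUpdate::doUpdate() keeps the guard next to the code that reads raw rows from the page table, which is the only place such rows can show up.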

Change 644641 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@master] Don't try and run SearchUpdate on a page_namespace below 0

https://gerrit.wikimedia.org/r/644641

Do you actually have any rows in the page table with a page_namespace < 0?

SELECT * FROM page WHERE page_namespace < 0;

Apparently, yes, two records with [[Special:Contributions/...]] (namespace -1) did make it into Ansaikuropedia's page table. They look to have been lurking since 2008:

INSERT INTO `page` (`page_id`, `page_namespace`, `page_title`, `page_restrictions`, `page_is_redirect`, `page_is_new`, `page_random`, `page_touched`, `page_latest`, `page_len`, `page_content_model`, `page_links_updated`, `page_lang`) VALUES
(55353, -1, 'Contributions/ええええええええええ!', '', 1, 1, 0.521292954657, '20081214085646', 459493, 69, NULL, NULL, NULL),
(55354, -1, 'Contributions/Road', '', 1, 0, 0.3926725406, '20081214085653', 459513, 39, NULL, NULL, NULL);

No idea why they're there (and this is the only database on which I've seen them, out of dozens of other wikis recently upgraded) but they're there.

I'm imagining some weird edge case... Or a bug in MW at some point

Do they have revisions too?

SELECT * FROM revision WHERE rev_page IN ( 55353, 55354 );

Yup.

INSERT INTO `revision` (`rev_id`, `rev_page`, `rev_comment_id`, `rev_actor`, `rev_timestamp`, `rev_minor_edit`, `rev_deleted`, `rev_len`, `rev_parent_id`, `rev_sha1`) VALUES
(459493, 55353, 0, 0, '20081013210955', 0, 0, 69, 0, 'lemi0ny6p5acxqssd87ahm1twc9cc9e'),
(459512, 55354, 0, 0, '20081018123547', 1, 0, 39, 0, 'tonf72akeqr5s3vlmt52vbspwvxtbq0'),
(459513, 55354, 0, 0, '20081018123730', 1, 0, 39, 459512, 'iruwc1n44iplke21wke70xxopg9sxmy');

I'm curious what's in those pages...

https://ansaikuropedia.org/index.php?curid=55353
https://ansaikuropedia.org/index.php?curid=55354

MW won't say :(

Probably worth deleting them, somehow...

Change 675165 had a related patch set uploaded (by Krinkle; author: Reedy):
[mediawiki/core@REL1_35] maintenance: Don't create SearchUpdate in rebuildtextindex.php for page_namespace below 0

https://gerrit.wikimedia.org/r/675165

Change 644641 merged by jenkins-bot:
[mediawiki/core@master] maintenance: Don't create SearchUpdate in rebuildtextindex.php for page_namespace below 0

https://gerrit.wikimedia.org/r/644641

Reedy claimed this task.
Reedy triaged this task as Low priority.

Change 675165 merged by jenkins-bot:
[mediawiki/core@REL1_35] maintenance: Don't create SearchUpdate in rebuildtextindex.php for page_namespace below 0

https://gerrit.wikimedia.org/r/675165