After moving an page, search will no longer finds the article on Postgres
Closed, ResolvedPublic

Description

Author: gebhkla

Description:
If an article is moved the search will no longer find the article when searching for the content. Only the title will be matched.

This seems to be a Postgres related bug, because we can reproduce this only in Postgres based installations.


Version: 1.23.0
Severity: normal

Details

Reference
bz66650
bzimport set Reference to bz66650.
bzimport added a subscriber: Unknown Object (MLST).
bzimport created this task.Jun 16 2014, 8:28 AM

gebhkla wrote:

This bug is caused by the update function in includes/search/SearchPostgres.php.

function update( $pageid, $title, $text ) {
        ## We don't want to index older revisions
        $sql = "UPDATE pagecontent SET textvector = NULL WHERE old_id IN " .
                        "(SELECT rev_text_id FROM revision WHERE rev_page = " . intval( $pageid ) .
                        " ORDER BY rev_text_id DESC OFFSET 1)";

        $this->db->query( $sql );
        return true;
}

The subselect statement is missing a DISTINCT and must look like:

function update( $pageid, $title, $text ) {
        ## We don't want to index older revisions
        $sql = "UPDATE pagecontent SET textvector = NULL WHERE old_id IN " .
                        "(SELECT DISTINCT rev_text_id FROM revision WHERE rev_page = " . intval( $pageid ) .
                        " ORDER BY rev_text_id DESC OFFSET 1)";

        $this->db->query( $sql );
        return true;
}

This DISTINCT is necessary, because with every move a new entry in the revision table is generated, with the same rev_text_id.

I have verified the problem and the proposed solution.

The harder part is what to do about already corrupted data.

I can fix it by running this:

update pagecontent set textvector=to_tsvector(old_text) where
textvector is null and old_id in
(SELECT max(rev_text_id) FROM revision group by rev_page);

I don't like that solution, because it should invoke the ts2_page_text trigger to set textvector, rather than doing it manually. That way if someone customized the trigger, it still does the right thing. Usually I'd do a dummy update and let the trigger take care of it. But ts2_page_text detects dummy updates and doesn't populate textvector in that case. There are other solutions (create a temp table, or use a CTE to delete and reinsert the tuples) but they seem like more trouble than it is worth.

Does the fix adopted also have to work in PostgreSQL 8.3? That version of PostgreSQL is past its EOL, but MediaWiki does still list it.

Change 153565 had a related patch set uploaded by Jjanes:
PostgreSQL: Fix text search on moved pages

https://gerrit.wikimedia.org/r/153565

Change 191249 merged by jenkins-bot:
PostgreSQL: Fix text search on moved pages

https://gerrit.wikimedia.org/r/191249

Restricted Application added a project: Discovery. · View Herald TranscriptSep 24 2015, 10:30 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Deskana closed this task as "Resolved".Sep 24 2015, 4:44 PM
Deskana claimed this task.

Change 191245 merged by Umherirrender:
PostgreSQL: Fix text search on moved pages

https://gerrit.wikimedia.org/r/191245

Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Migrating from the old tracking task to a tag for PostgreSQL-related tasks.

Restricted Application added a project: Discovery-Search. · View Herald TranscriptNov 2 2016, 8:16 PM