Enable search result ranking in MySQL full text search
Closed, Resolved (Public)

Description

MediaWiki search uses the IN BOOLEAN MODE modifier in the MATCH ... AGAINST clause. MySQL does not sort the results by relevance when the BOOLEAN modifier is used. I've improved the ranking significantly by adding the clause ORDER BY MATCH($field) AGAINST($searchon) DESC. Because this expression omits the BOOLEAN modifier, MySQL evaluates it in natural language mode, which produces a relevance score the results can be sorted by.

In the function queryMain in the file includes/search/SearchMySQL.php, add the following entry to the $query array:

$query['options']['ORDER BY'] = substr($match, 0, -18) . ') DESC ' ;
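
For readers wondering where the -18 comes from: the fragment returned by parseQuery ends in " IN BOOLEAN MODE) ", which is 18 characters long including the leading and trailing spaces. The following is a minimal standalone sketch of that string manipulation, not MediaWiki code; the values of $field and $searchon are hypothetical stand-ins.

```php
<?php
// Hypothetical stand-ins for the values parseQuery would produce.
$field = 'si_text';
$searchon = "'+ranking +search'";

// Shape of the fragment that parseQuery returns.
$match = " MATCH($field) AGAINST($searchon IN BOOLEAN MODE) ";

// The suffix to strip; its length is where the -18 comes from.
$suffix = ' IN BOOLEAN MODE) ';

// Drop the suffix, then re-close the AGAINST( parenthesis and sort descending.
$orderBy = substr( $match, 0, -strlen( $suffix ) ) . ') DESC ';

echo $orderBy, "\n";
```

Deriving the offset from the suffix with strlen avoids a magic number, but the approach still breaks silently if the fragment's wording ever changes.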

Event Timeline

Runntb created this task. Apr 18 2018, 2:48 PM
Restricted Application added projects: Discovery, Discovery-Search. Apr 18 2018, 2:48 PM
Restricted Application added a subscriber: Aklapper.
Reedy added a subscriber: Reedy. Apr 18 2018, 3:54 PM

Why would you use a substring? Wouldn't it make more sense to change the SQL being returned by parseQuery?

		return " MATCH($field) AGAINST($searchon IN BOOLEAN MODE) ";

parseQuery generates the WHERE clause for both queryMain and getCountQuery, and both need boolean mode to support negation and the other search operators. Instead, parseQuery could return the WHERE and ORDER BY clauses together in an array. This would also make it easier to state explicitly that the ORDER BY clause uses IN NATURAL LANGUAGE MODE (which is implied when no mode is specified).

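/**
 * Parse the user's query and transform it into two SQL fragments:
 * a WHERE condition and an ORDER BY expression
 *
 * @param string $filteredText
 * @param string $fulltext
 *
 * @return string[] [ WHERE condition, ORDER BY expression ]
 */
function parseQuery( $filteredText, $fulltext ) {
    ...
    return [
        " MATCH($field) AGAINST($searchon IN BOOLEAN MODE) ",
        " MATCH($field) AGAINST($searchon IN NATURAL LANGUAGE MODE) DESC "
    ];
}
function queryMain( &$query, $filteredTerm, $fulltext ) {
    $match = $this->parseQuery( $filteredTerm, $fulltext );
    ...
    // Boolean mode filters the rows; natural language mode ranks them.
    $query['conds'][] = $match[0];
    $query['options']['ORDER BY'] = $match[1];
}
function getCountQuery( $filteredTerm, $fulltext ) {
    // The count query only needs the WHERE condition.
    $match = $this->parseQuery( $filteredTerm, $fulltext )[0];
    ...
}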
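
As a rough illustration of what the two fragments look like once combined into a query, here is a standalone sketch. The helper buildMatchClauses is hypothetical and not part of MediaWiki; the table and column names (searchindex, si_page, si_text) and the search expression are assumptions made for the example.

```php
<?php
/**
 * Hypothetical helper (not MediaWiki code): build both SQL fragments
 * from a column name and an already-quoted search expression.
 *
 * @param string $field Column to search (assumed)
 * @param string $searchon Already-quoted search expression
 * @return string[] [ WHERE condition, ORDER BY expression ]
 */
function buildMatchClauses( string $field, string $searchon ): array {
    return [
        // Boolean mode: supports operators such as + and -, used for filtering.
        "MATCH($field) AGAINST($searchon IN BOOLEAN MODE)",
        // Natural language mode: yields a relevance score, used for ordering.
        "MATCH($field) AGAINST($searchon IN NATURAL LANGUAGE MODE) DESC",
    ];
}

[ $where, $orderBy ] = buildMatchClauses( 'si_text', "'+ranking -sqlite'" );

// Table and column names here are illustrative assumptions.
$sql = "SELECT si_page FROM searchindex WHERE $where ORDER BY $orderBy";
echo $sql, "\n";
```

Returning both fragments from one place keeps the boolean and natural language expressions in sync, which the substr approach only achieves by relying on the exact wording of the first fragment.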
EBjune triaged this task as Normal priority. Apr 26 2018, 5:18 PM
EBjune added a project: good first bug.
EBjune added a subscriber: EBjune.

It would be great if someone could put a patch together, the search platform team will be glad to review it.

Runntb added a comment. May 1 2018, 1:34 PM

Here is a patch file. As best I can tell, it works with all branches starting with REL1_30. Thanks.

@Runntb: Thanks for taking a look at the code!
You are very welcome to use developer access to submit the proposed code changes as a Git branch directly into Gerrit, which makes it easier to review them quickly and provide feedback. If you don't want to set up Git/Gerrit, you can also use the Gerrit Patch Uploader. Thanks again!

Change 430254 had a related patch set uploaded (by Runntb; owner: Runntb):
[mediawiki/core@master] search: Add result ranking in MySQL

https://gerrit.wikimedia.org/r/430254

Change 430254 merged by jenkins-bot:
[mediawiki/core@master] search: Add result ranking in MySQL

https://gerrit.wikimedia.org/r/430254

debt added a subscriber: debt.

Let's take a look at this, now that a patch has been uploaded.

EBjune closed this task as Resolved. Jun 21 2018, 5:11 PM
EBjune claimed this task.

It's been tested and merged, looks like we can close this one, thanks!