
Improve MySQL search for files
Open, Needs Triage · Public

Description

I noticed that the MySQL full-text search has this nice optimisation to create better matches in the File namespace:

if ( $ns === NS_FILE ) {
    $t = preg_replace( "/ (png|gif|jpg|jpeg|ogg)$/", "", $t );
}

The extension list is not exactly up to date :)
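
A minimal sketch of one way to avoid the hardcoded list, assuming the extensions come from configuration. The helper name `stripFileExtension` and its `$extensions` parameter are hypothetical; in MediaWiki the list could plausibly come from `$wgFileExtensions`, but the linked patch may take a different approach. The leading space in the pattern is kept from the snippet above, on the assumption that the title has already been normalized at that point.

```php
<?php
/**
 * Strip a trailing " <ext>" from an already-normalized title.
 * Hypothetical helper: $extensions stands in for a configured list
 * (for example $wgFileExtensions), not for any real core API.
 */
function stripFileExtension( string $title, array $extensions ): string {
    if ( $extensions === [] ) {
        // Nothing configured, so there is nothing to strip.
        return $title;
    }
    // Quote each extension so regex metacharacters are matched literally.
    $alternatives = implode( '|', array_map(
        static function ( string $ext ): string {
            return preg_quote( $ext, '/' );
        },
        $extensions
    ) );
    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace( '/ (' . $alternatives . ')$/', '', $title ) ?? $title;
}

// Usage: "webp" is not in the hardcoded list but is handled here.
echo stripFileExtension( 'sunset over lake webp', [ 'png', 'jpg', 'webp', 'svg' ] ), "\n";
// prints "sunset over lake"
```

Building the alternation from a list keeps the behaviour of the original check while letting the set of extensions follow whatever the wiki actually allows.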

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change #1040591 had a related patch set uploaded (by TheDJ; author: TheDJ):

[mediawiki/core@master] Don't hardcode the file extension in searchindex normalization

https://gerrit.wikimedia.org/r/1040591

TheDJ moved this task from Inbox to For-review on the User-TheDJ board.

@dcausse @Gehel As far as I can see, updateTitle is not implemented by CirrusSearch, right, and is thus a no-op per the parent SearchEngine class? If so, I can safely modify this.

I do note that each SearchUpdate job runs the title through multiple regexes before handing it to the no-op, which is a tad wasteful.

> @dcausse @Gehel As far as I can see, updateTitle is not implemented by CirrusSearch, right, and is thus a no-op per the parent SearchEngine class? If so, I can safely modify this.

Indeed, CirrusSearch uses other events to trigger updates, because it has to support the various ways the parsed content can change (mainly via LinksUpdate). So yes, modifying this in core should not impact CirrusSearch.

> I do note that each SearchUpdate job runs the title through multiple regexes before handing it to the no-op, which is a tad wasteful.

I think that in the case of CirrusSearch this is already avoided, because it returns false from SearchEngine::supports( 'search-update' ); see https://gerrit.wikimedia.org/g/mediawiki/core/+/8e8093d627ec92fc7a1d9226dcab8135f09c25be/includes/deferred/SearchUpdate.php#82
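
To illustrate the idea, here is a self-contained sketch of a capability check that skips the title work entirely when the engine opts out. The classes `SketchSearchEngine` and `SketchExternalEngine` and the function `runSearchUpdate` are hypothetical stand-ins, not the real MediaWiki or CirrusSearch code, and the normalization shown is a placeholder for the regexes discussed above.

```php
<?php
// Hypothetical stand-in for a base engine that handles index updates itself.
class SketchSearchEngine {
    public function supports( string $feature ): bool {
        return $feature === 'search-update';
    }

    public function updateTitle( int $id, string $title ): string {
        return "updated $id: $title";
    }
}

// Hypothetical stand-in for an engine that is updated through other events.
class SketchExternalEngine extends SketchSearchEngine {
    public function supports( string $feature ): bool {
        return false;
    }
}

function runSearchUpdate( SketchSearchEngine $engine, int $id, string $title ): ?string {
    if ( !$engine->supports( 'search-update' ) ) {
        // Bail out before any normalization, so no regex runs for this engine.
        return null;
    }
    // Placeholder normalization: collapse whitespace, trim, lowercase.
    $normalized = strtolower( trim( preg_replace( '/\s+/', ' ', $title ) ) );
    return $engine->updateTitle( $id, $normalized );
}

var_dump( runSearchUpdate( new SketchExternalEngine(), 1, 'Foo  Bar' ) ); // NULL
echo runSearchUpdate( new SketchSearchEngine(), 1, ' Foo   Bar ' ), "\n"; // updated 1: foo bar
```

With the check placed first, an engine that declines `search-update` never pays for the title regexes.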