
File search only returns word matches
Closed, Duplicate (Public)

Description

MediaWiki version: 1.31.1
VisualEditor version: 0.1.0 (13a585a)

Steps to reproduce:

  • Set up a foreign file repo with the following:
	$wgForeignFileRepos[] = [
		'class' => ForeignAPIRepo::class,
		'name' => 'somename',
		'apibase' => 'https://mysite.com/w/api.php',
		'url' => '/wiki/images',
	];
  • Upload a file called DeepPlane.jpg there
  • Search for DeepPlane in VisualEditor
  • Nothing shows up

What seems to happen is that only whole-word matches are returned, but the file extension is treated as part of the last "word". For example, a search for DeepPlane.jpg does return the file, while a search for DeepPlane does not. The behaviour looks close to splitting the file name on spaces, underscores or dashes and returning exact matches against any of the resulting words.

A file named Hello_There.jpg shows up when searching for Hello, but not when searching for There. It does, however, show up when searching for There.jpg.
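To make the suspected behaviour concrete, here is a minimal Python sketch that reproduces the examples above. It is only a model of what the searches appear to do, not MediaWiki's actual search code: the separator set (whitespace, underscores, dashes), case-insensitive comparison, and the "every query word must equal some title word" rule are all assumptions, and the helper names `tokenize` and `matches` are hypothetical.

```python
import re

# Assumed model of the observed behaviour: titles are split on whitespace,
# underscores and dashes only, so a dot stays inside a "word".
OBSERVED_SEPARATORS = re.compile(r"[\s_\-]+")


def tokenize(title: str) -> set[str]:
    """Split a file name into lower-cased words; dots are NOT separators."""
    return {tok.lower() for tok in OBSERVED_SEPARATORS.split(title) if tok}


def matches(query: str, title: str) -> bool:
    """Match only if every query word equals some title word exactly."""
    query_tokens = tokenize(query)
    return bool(query_tokens) and query_tokens <= tokenize(title)


for query in ["DeepPlane", "DeepPlane.jpg"]:
    print(query, matches(query, "DeepPlane.jpg"))
for query in ["Hello", "There", "There.jpg"]:
    print(query, matches(query, "Hello_There.jpg"))
```

Under this model "DeepPlane.jpg" becomes the single word `deepplane.jpg`, so DeepPlane finds nothing while DeepPlane.jpg does, and Hello_There.jpg splits into `hello` and `there.jpg`, which is why Hello and There.jpg match but There does not.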

Hope I explained this clearly.

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 24 2018, 6:18 PM
Restricted Application added a project: Discovery-Search. Oct 26 2018, 5:48 PM
EBjune triaged this task as Normal priority. Nov 1 2018, 5:07 PM
EBjune added a subscriber: EBjune.

The Search team would like to hear from the VisualEditor team before deciding whether we can do anything to help here.

Esanders lowered the priority of this task from Normal to Low. Nov 6 2018, 4:56 PM
Esanders added a subscriber: Esanders.

This doesn't appear to be a problem on WMF wikis, so I'm lowering the priority. Maybe it depends on which search backend your wiki uses?

As far as I understand, WMF wikis use Extension:CirrusSearch. I can confirm that the issue occurs with the default search engine.

According to Help:Searching: "The search functionality can be considered to operate on whole words, separated by spaces or other punctuation marks."

If this is by design, maybe dots should be treated as punctuation too? That would allow a file to be found by the last word of its name, without the extension.
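A rough sketch of what that suggestion would change, using the same simplified model of the search as before: whitespace, underscores and dashes split words, comparison is case-insensitive, and every query word must equal a title word. All of that is assumed, and `tokenize`/`matches` are hypothetical helpers rather than MediaWiki functions. The only difference is that "." is added to the separator set.

```python
import re

# Assumed tokenizer with the proposed change: dots now separate words too,
# alongside whitespace, underscores and dashes.
PROPOSED_SEPARATORS = re.compile(r"[\s_.\-]+")


def tokenize(title: str) -> set[str]:
    """Split a file name into lower-cased words, treating dots as separators."""
    return {tok.lower() for tok in PROPOSED_SEPARATORS.split(title) if tok}


def matches(query: str, title: str) -> bool:
    """Match only if every query word equals some title word exactly."""
    query_tokens = tokenize(query)
    return bool(query_tokens) and query_tokens <= tokenize(title)


for query in ["DeepPlane", "DeepPlane.jpg", "There", "There.jpg"]:
    title = "DeepPlane.jpg" if query.startswith("Deep") else "Hello_There.jpg"
    print(query, matches(query, title))
```

In this model every query from the report now finds its file, and pasting the full name (DeepPlane.jpg) still works because the query splits the same way as the title. One trade-off: a bare query such as jpg would then match every JPEG file.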

See also T212502 for the specific use-case of copy-pasting an entire file name into VE's media search to insert that file.