
OR keyword should be usable with special/advanced syntax
Closed, Duplicate (Public)

Description

Since MediaWiki 1.28, CirrusSearch allows searching for files of a given file type or MIME type with the keywords filetype: and filemime: (→ T145560).

It is possible to exclude file types or MIME types from a search with the exclusion syntax -filetype:audio, but it is not possible to combine two filetype: or filemime: clauses as alternatives using the 'OR' keyword that CirrusSearch also provides.

Example:
Searching for [[ https://commons.wikimedia.org/w/index.php?search=filemime%3Aimage%2Fpng+OR+filemime%3Aimage%3Atiff&title=Special:Search | filemime:image/png OR filemime:image:tiff ]] should return files in PNG format as well as files in TIFF format; instead, it returns nothing.

Event Timeline

Other trivial examples include https://commons.wikimedia.org/wiki/Special:Search?search=sporre+OR+filemime:image/tiff. This does not run a search with an OR but with an AND (the default). The word "OR" is either ignored or used as a search term.

https://commons.wikimedia.org/wiki/Special:Search?search=filemime:image/tiff+OR+sporre even fails with an error message that's probably not correct.

This is a much larger, separate issue that basically means rewriting the query parsing in CirrusSearch. This is currently a tentative goal for Q4. The problem is that pieces of the query (e.g. filetype:xyz) are ripped out with regex matches and turned into other things; the handling of AND, OR, etc. doesn't happen until a second query parser inside Elasticsearch looks at what is left. CirrusSearch needs a grammar-based query parser that understands AND/OR/etc. and can build appropriate boolean queries.
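To make the idea concrete, here is a minimal sketch in Python of what a grammar-based parse could look like. It is not CirrusSearch code: the node classes (`Term`, `Keyword`, `And`, `Or`), the `parse` function, and the toy grammar (OR binds more loosely than the implicit AND, no quoting or parentheses) are all assumptions made for illustration. The point is only that keywords such as filemime: stay in the same tree as the OR operator instead of being pulled out beforehand.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    """A plain search word."""
    text: str


@dataclass
class Keyword:
    """A special-syntax clause such as filemime:image/png or -filetype:audio."""
    name: str
    value: str
    negated: bool = False


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


Node = Union[Term, Keyword, And, Or]

# Hypothetical keyword list; only the two keywords named in this task.
KEYWORD_RE = re.compile(r"^(-?)(filetype|filemime):(\S+)$")


def parse_atom(token: str) -> Node:
    """Turn one whitespace-delimited token into a Keyword or a Term."""
    match = KEYWORD_RE.match(token)
    if match:
        negated, name, value = match.groups()
        return Keyword(name=name, value=value, negated=bool(negated))
    return Term(token)


def parse(query: str) -> Node:
    """Parse with the toy grammar:  query := clause ("OR" clause)* ; clause := atom+

    Atoms inside a clause are combined with an implicit AND.
    """
    groups: List[List[Node]] = [[]]
    for token in query.split():
        if token == "OR":
            groups.append([])  # start the next OR alternative
        else:
            groups[-1].append(parse_atom(token))

    clauses: List[Node] = []
    for group in groups:
        if not group:
            raise ValueError("empty query or OR without an operand")
        clauses.append(group[0] if len(group) == 1 else And(group))
    return clauses[0] if len(clauses) == 1 else Or(clauses)


if __name__ == "__main__":
    print(parse("filemime:image/png OR filemime:image/tiff"))
    print(parse("sporre OR filemime:image/tiff"))
```

With a tree like this, both operands of OR are still available when the boolean query is built, whichever side the keyword appears on.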

Deskana renamed this task from "Filetype and Filemime should allow to be used together with the OR-Keyword" to "OR keyword should be usable with special/advanced syntax". Dec 8 2016, 11:19 PM
Deskana triaged this task as Medium priority.
Deskana added a subscriber: Deskana.

As @EBernhardson mentions, this is a larger problem. I've retitled the task to account for that.

Sadly we're not really equipped to tackle this problem in Q4 like we thought we were. I'm hopeful we can get back to it at some point, but it won't be for at least three months.