Fix documentation of boolean operators
Closed, Resolved (Public)

Description

The Help:Searching documentation for boolean search operators (AND, OR) describes how things might be expected to work, but it is far from the way things actually work.

In particular, parens are completely ignored (so (blue OR red) AND green and blue OR (red AND green) in fact give identical results) and logical operators are mapped in an unexpected way to Lucene MUST and SHOULD queries (so blue OR red AND green doesn't do either of the things you would expect it to do).
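To make the parens problem concrete, here is a minimal sketch in Python. It is not CirrusSearch code: `strip_parens` is a hypothetical stand-in for a parser that discards parentheses, and the two `expected_*` functions simply spell out ordinary boolean logic over the set of words in a document. It does not model how AND and OR are actually mapped to MUST and SHOULD.

```python
import re


def strip_parens(query: str) -> list[str]:
    """Tokenize a query as a parser that ignores parentheses would see it (hypothetical helper)."""
    return re.sub(r"[()]", " ", query).split()


def expected_left(doc: set[str]) -> bool:
    """Ordinary boolean reading of: (blue OR red) AND green."""
    return ("blue" in doc or "red" in doc) and "green" in doc


def expected_right(doc: set[str]) -> bool:
    """Ordinary boolean reading of: blue OR (red AND green)."""
    return "blue" in doc or ("red" in doc and "green" in doc)


left = "(blue OR red) AND green"
right = "blue OR (red AND green)"

# Once the parentheses are dropped, the two queries are indistinguishable.
print(strip_parens(left) == strip_parens(right))

# Under ordinary boolean logic they differ, e.g. for a page containing only "blue".
page = {"blue"}
print(expected_left(page), expected_right(page))
```

A user who expects the two queries to differ (as they do for the page containing only "blue") gets one and the same result set for both.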

The current plan is to write a short bit of documentation for the Help page saying that parens are ignored, that boolean AND and OR do not behave in the expected way and should probably not be used for the time being, and that the plan is to implement them properly in the future. It would link to a longer page explaining how things actually work: how queries are parsed and converted to MUST and SHOULD queries, what those actually do, and our future plans to support proper boolean queries and explicit/intentional SHOULD queries.

mw:Help:CirrusSearch should also be reviewed and updated as necessary.

Once we properly support boolean queries and parens, we can change the Help:Searching docs back to what they are now, more or less, and put a big disclaimer on the more detailed page saying that search now works the way it is supposed to.

Event Timeline

Subscribing @Cpiral and @The_Transhumanist since they edit Help:Searching and it would be great if they could review the documentation we want to add before it goes live. Please subscribe any other contributors to Help:Searching who might be able to provide feedback, too. Thanks!

TJones triaged this task as Medium priority.

I've written a draft of the longer explanation (~1200 words) of the use of Logical operators in on-wiki search. Comments and suggestions are welcome!

(In my examples using MUST and SHOULD, I've treated them as unary operators on query terms because that's fairly straightforward. I certainly didn't want to get into the full complexities of Lucene queries.)
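For readers unfamiliar with MUST and SHOULD, here is a minimal sketch of the unary-operator view in Python. It is a simplified model of Lucene's default boolean-query matching as I understand it, not CirrusSearch code. The function name `matches` and its parameters are made up for illustration, scoring is ignored, and the mapping from AND and OR to MUST and SHOULD is deliberately not modeled.

```python
def matches(doc_terms: set[str], must: set[str], should: set[str]) -> bool:
    """Return True if a document matches a query made of MUST and SHOULD terms.

    Simplified model (an assumption, not the real implementation):
    - every MUST term has to be present in the document;
    - if there are no MUST terms, at least one SHOULD term has to be present;
    - if there is at least one MUST term, SHOULD terms are optional
      (in a real engine they would only affect ranking).
    """
    if not must and not should:
        # An empty query is treated here as matching nothing.
        return False
    if not must <= doc_terms:
        return False
    if not must:
        return bool(should & doc_terms)
    return True


# "green" is required; "blue" and "red" are optional extras.
print(matches({"blue", "green"}, must={"green"}, should={"blue", "red"}))
# The required term is missing, so the optional match does not help.
print(matches({"blue"}, must={"green"}, should={"blue"}))
# With only SHOULD terms, one of them has to match.
print(matches({"red"}, must=set(), should={"blue", "red"}))
```

The point of the sketch is that SHOULD is not a boolean OR: next to a MUST term, a SHOULD term does not widen the result set at all.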

Also, here's the first draft of the update to go on the Help:Searching Logical operators section:

The search engine supports limited boolean logic in searches. Logical NOT (negation) can be indicated by a "-" (minus sign) or a "!" (exclamation point) character prefixed to a search term, or by the NOT keyword.
 
Parentheses (…) are ignored by the search engine and have no effect.
 

The operators AND and OR are used by the search engine, but do not have the expected boolean logical meaning and should be used with great care. See the additional documentation for an in-depth explanation.

I still need to review the rest of the Help:Searching page and mw:Help:CirrusSearch page to see if any other changes are needed there.

I've also sent a message to the Discovery mailing list to encourage additional feedback.

I added a note on the Talk page for Help:Searching to try to get more feedback.

I made a bunch of edits to the Help:Searching page and removed most references to AND and OR and logical operators in other sections, and made the update to the Logical operators section, except for the link to "additional documentation" so that it could be reviewed and moved to a better location.

@TJones: As that covers 1 random Wikimedia site, what's Discovery-Search's plan for updating the approx. 100 other Help:Searching pages on other Wikimedia sites, like https://ru.wikipedia.org/wiki/Википедия:Поиск or https://sd.wikipedia.org/wiki/مدد:وڪيپيڊيا_۾_ڳولها or https://ka.wikipedia.org/wiki/ვიკიპედია:ძიება ?

Good question. I'm not sure how to answer it. It seems that information percolates from the English documentation to other wikis (albeit very slowly). Obviously we can try to get the word out through Tech News and the Tech Ambassadors. I'll follow up with folks in Community Relations and see if there's a good way to encourage the correct info to be spread faster.

Isn't the English and canonical documentation on https://www.mediawiki.org/wiki/Help:CirrusSearch and translatable there? As I don't see you suddenly maintaining 101+ help pages, I'd propose to define one canonical place (usually mediawiki.org, meta, wikitech), and to maintain and update that one single place only...

mw:Help:CirrusSearch is on my to-do list to edit once we have a final version of the longer explanation of the current behavior ready, and it does seem to be the place most often linked to on search pages on other Wikipedias.

I guess I started on en:Help:Searching because that was where I was when I first discovered the problem. I was in that weird intersection between being part of the WMF and being part of the volunteer community—the documentation I found as a user was wrong, and I also feel a professional obligation to verify it and correct it.

I reviewed the draft on mw.org; everything there looks accurate as far as I'm aware. I didn't realize that implicit and explicit AND behave differently. The on-wiki documentation doesn't feel scary enough for what's really going on, but I'm not sure how to make it more explicit that this thing is funny and not what they think it is.

Thanks for taking a look, Erik! I've tried to make it a tiny bit scarier by simplifying, bolding, and embiggening the pre-TOC text.

I'd appreciate any other feedback, but if I don't hear back from anyone else today, I think I'll move it to be a sub-page of the Cirrus help page and continue from there, since having it in the draft location is blocking other documentation and blogging work.

I've moved the draft to Help:CirrusSearch/Logical_operators and updated Help:CirrusSearch to link to it. Help:CirrusSearch doesn't mention parens, so nothing to fix there. (I made some other incidental edits, too, particularly changing "AND" and "OR" to "and" and "or" when used for emphasis rather than as operators.)