More robust handling for word order in search terms
Closed, DeclinedPublic

Description

The order of the words in the search term affects the results that are displayed. In some cases I imagine this is helpful/expected. In other cases it seems unhelpful. For example:

Example 1) There is an album titled "Bleach" by the band Nirvana. If I search Bleach nirvana it comes up. If I type Nirvana bleach it does not come up.

Bleach nirvana: Screen Shot 2022-04-21 at 5.05.50 PM.png (186×563 px, 29 KB)
Nirvana bleach: Screen Shot 2022-04-21 at 5.05.59 PM.png (111×543 px, 15 KB)

Example 2) If I am looking for information about wildfires in California I might search California wildfires, or wildfires California. The second search term only shows one result:

California wildfires: Screen Shot 2022-04-21 at 5.06.11 PM.png (421×545 px, 83 KB)
wildfires California: Screen Shot 2022-04-21 at 5.06.23 PM.png (180×543 px, 28 KB)

Event Timeline

@TJones can correct me if/where I'm wrong.

First, this looks like autocomplete, which does not perform a search (mostly for performance reasons) but instead tries to prefix-match existing article titles, which makes it sensitive to word order. I suspect that this is partially hidden by redirects from your alternative queries, and by an earlier change we made that uses the redirect target article's title as the displayed title.
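To illustrate why prefix matching is order-sensitive, here is a minimal Python sketch. It is not the actual completion suggester: the title list, the `normalize` helper (which drops punctuation and case), and `prefix_complete` are all hypothetical names made up for this example.

```python
import re
from bisect import bisect_left

def normalize(text):
    """Lowercase and drop punctuation (an assumed normalization, for illustration)."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

# Hypothetical titles; the real index covers every article title and redirect.
TITLES = ["Bleach (Nirvana album)", "Nirvana (band)", "California wildfires"]

# Sorted (normalized key, original title) pairs allow a binary search for the prefix.
INDEX = sorted((normalize(t), t) for t in TITLES)
KEYS = [key for key, _ in INDEX]

def prefix_complete(query, limit=10):
    """Return titles whose normalized form starts with the normalized query."""
    q = normalize(query)
    results = []
    for key, title in INDEX[bisect_left(KEYS, q):]:
        if not key.startswith(q) or len(results) == limit:
            break
        results.append(title)
    return results

print(prefix_complete("Bleach nirvana"))   # matches the album title
print(prefix_complete("Nirvana bleach"))   # no title starts with these words in this order
```

In this sketch, "Nirvana bleach" only produces a result if someone adds a title or redirect that begins with those words in that order.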

If you use Special:Search to do a full text search for both variations (by clicking the bottom "Search for pages.." option), you'll see that an actual full text search compensates better for word order: both queries in both examples return the same list of results.
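As a rough illustration of why a term-based search is less sensitive to word order, here is a toy inverted index in Python. It is only a sketch under simplifying assumptions: the documents are hypothetical, the `tokenize`, `build_index`, and `search` helpers are made up for this example, and a real full text engine also ranks, stems, and weights results.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Hypothetical page titles mapped to a snippet of page text.
DOCS = {
    "Bleach (Nirvana album)": "Bleach is the debut studio album by the band Nirvana",
    "Nirvana (band)": "Nirvana was an American rock band",
    "Bleach": "Bleach is a chemical product used to whiten and disinfect",
}

def build_index(docs):
    """Map each token to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(doc_id + " " + text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return documents containing every query token, regardless of order."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))

index = build_index(DOCS)
print(search(index, "Bleach nirvana"))
print(search(index, "Nirvana bleach"))  # same set: term order does not matter here
```

Because the query is treated as a set of terms, both orderings intersect the same posting sets. The cost is more work per query than a single prefix lookup.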

cc: @Sneha, another note: it's completely opaque to almost all users that the search bar conflates two different functions, autocomplete (not a real search) and full text search.

For performance reasons, I don't think we can or want to run any kind of full text search in the go bar, so this will likely be a declined task, or a low-priority one, unless there are other solutions I'm not thinking of.

+1 to what @MPhamWMF said. There's a very large, high-performance data structure that is held in memory so we can return completion suggestion results super quickly (every keystroke is another search). It also can handle a limited number of typos.

Doing more expensive searches, like the regular full text search, for every keystroke is not possible with the hardware we have—it would likely bog down our servers and be too slow to keep up with each keystroke on the user's side.

Incorporating additional data to handle out-of-order titles would make that data structure way too large to handle; there are six possible orders just for nirvana, bleach, and album. Having the right one or two alternatives curated as redirects is much more tractable.
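The growth is factorial in the number of words, which a few lines of Python can make concrete. This is only an illustration of the arithmetic; `word_orders` is a made-up helper, not part of any search code.

```python
from itertools import permutations
from math import factorial

def word_orders(words):
    """Return every ordering of the given words as a space-joined string."""
    return [" ".join(p) for p in permutations(words)]

orders = word_orders(["nirvana", "bleach", "album"])
for order in orders:
    print(order)

# The count is n! for n distinct words, so it grows very quickly.
for n in range(2, 7):
    print(n, "words ->", factorial(n), "orderings")
```

Three words give 6 orderings, four give 24, and five give 120, so storing every ordering for every title would multiply the index size many times over.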

Rolling over to Special:Search isn't a failure! The completion suggester is a shortcut that helps a lot of people get to the page they want, especially when they know or can guess the title, but it can't quite do everything.

@MPhamWMF @TJones ah ya, that all makes sense. Thanks for the clarification. Feel free to decline this task.