
Bad search results when you don't include the preposition in your search query
Closed, DeclinedPublic

Description

If you are looking for information on, e.g., the history of France (or Italy, or many other countries) on Wikipedia and you just search for "history france", you don't get the article "History of France" as the first result, as you'd expect. It's not even within the first 10 results.

This doesn't always happen, but it seems that Wikimedia's search engine gives secondary words like 'of' (which many users simply leave out of their searches) more weight than it should. (Contrast that with Google, for example.)

The problem is similar on wikis in other languages, such as Spanish or French. There, unless you write the preposition 'de', you may not get the results you expect. Examples: 'emisora (de) radio', 'historia (de la) música', 'guerra (de) afganistán', etc.

I can imagine thousands of users having trouble finding certain articles because of this. I hope it can be fixed.

Event Timeline

Savig created this task.Oct 27 2015, 12:52 AM
Savig raised the priority of this task from to Needs Triage.
Savig updated the task description.
Savig added a project: MediaWiki-Search.
Savig added a subscriber: Savig.
Restricted Application added a project: Discovery.Oct 27 2015, 12:52 AM
Restricted Application added a subscriber: Aklapper.
Aklapper set Security to None.
Deskana renamed this task from Bad search results when you don't write the preposition in your search query to Replace the prefixsearch used in the search box at the top right of pages with something better.Dec 30 2015, 9:38 PM
Restricted Application added a subscriber: StudiesWorld.Dec 30 2015, 9:38 PM
Deskana closed this task as Invalid.Dec 30 2015, 9:41 PM
Deskana claimed this task.
Deskana added a subscriber: Deskana.

We're essentially trying this with the completion suggester. This is tracked in T121616, and related tasks. Having another task to track this work isn't that helpful, so I'm closing this one as invalid.

Savig added a comment.EditedDec 31 2015, 12:27 AM

I think the completion suggester will be a nice tool to help users, but I don't see how it addresses the problem I described here, which is basically a major malfunction in Wikimedia's search engine. The search engine should work well regardless of whether a completion suggester exists. I'm worried about this ticket being closed without further action on that.

PS: To avoid any confusion, please note that when I say the article "History of France" doesn't come up as the first result for a search of "history france" (as it should), I'm not talking about the prefixsearch results (the immediate list of suggestions you get below the search box), as Deskana's comment seems to imply. I'm talking about the search results you get from that search.

Please note that I opened this ticket to report NOT a problem with the prefixsearch, but a problem with Wikimedia's regular search results. Please see an example of an unsuccessful search in the image attached.

TTO renamed this task from Replace the prefixsearch used in the search box at the top right of pages with something better to Bad search results when you don't include the preposition in your search query.Dec 31 2015, 10:32 AM
TTO reopened this task as Open.
TTO added a subscriber: TTO.

Reopening this. I think Dan misunderstood the reporter's comment.

> Reopening this. I think Dan misunderstood the reporter's comment.

Thanks. There wasn't much context for me to go on in the report, so I was left guessing.

> Please see an example of an unsuccessful search in the image attached.

Thanks for the example. Can you give more? One example doesn't help us pin the problem down much.

TTO added a comment.Jan 1 2016, 10:13 AM

Some more examples from enwiki:

  • geography poland: "Geography of Poland" isn't on the first results page at all.
  • history mexico: "History of Mexico" is seventh when it almost certainly should be first.
  • A lot of searches for history X, where the article "History of X" exists, fail to show that article near the top of the search results.
  • keys kingdom: "Keys to the Kingdom" is second, "Keys of the Kingdom" is a little way down, "The Keys to the Kingdom" isn't on the first results page.

Probably many more. And as the reporter points out, it's not just a problem for enwiki.

It's even worse when you put in the search terms in the opposite order, which is a common way for people to search: for example, poland politics doesn't show "Politics of Poland" on the first results page.

Deskana removed Deskana as the assignee of this task.Feb 3 2016, 6:17 PM

Unassigning; this was assigned to me when I closed, and stayed that way by accident after this was reopened.

Restricted Application added a project: Discovery-Search.Jul 7 2016, 12:12 PM
debt triaged this task as Normal priority.Jul 20 2016, 4:08 PM
debt moved this task from needs triage to This Quarter on the Discovery-Search board.
debt added a subscriber: debt.Aug 30 2016, 10:14 PM

Let's see if BM25 can help with this?
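For readers unfamiliar with BM25, here is a minimal sketch of why it might help. It is a toy, textbook Okapi BM25 scorer over a handful of made-up article titles, not the actual CirrusSearch or Elasticsearch implementation; the corpus, the function names (`idf`, `bm25_score`, `rank`) and the parameter values `k1=1.2`, `b=0.75` are all assumptions chosen for illustration. The point it tries to show is that a term appearing in many documents, such as 'of', gets a low inverse document frequency, so leaving it out of the query costs little.

```python
import math
from collections import Counter

# Toy corpus of titles (an assumption for illustration, not real index data).
TITLES = [
    "History of France",
    "History of Mexico",
    "Geography of Poland",
    "Politics of Poland",
    "France",
    "Keys to the Kingdom",
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def idf(term, tokenized_docs):
    """Inverse document frequency: terms found in many documents score low."""
    n_docs = len(tokenized_docs)
    n_with_term = sum(1 for toks in tokenized_docs if term in toks)
    return math.log(1 + (n_docs - n_with_term + 0.5) / (n_with_term + 0.5))


def bm25_score(query, doc_tokens, tokenized_docs, k1=1.2, b=0.75):
    """Sum the BM25 contribution of each query term found in the document."""
    avg_len = sum(len(t) for t in tokenized_docs) / len(tokenized_docs)
    term_freq = Counter(doc_tokens)
    score = 0.0
    for term in tokenize(query):
        f = term_freq.get(term, 0)
        if f == 0:
            continue
        # Length normalization: longer-than-average documents are penalized.
        norm = f + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf(term, tokenized_docs) * f * (k1 + 1) / norm
    return score


def rank(query, titles=TITLES):
    """Return the titles sorted from highest to lowest BM25 score."""
    tokenized_docs = [tokenize(t) for t in titles]
    scored = [
        (bm25_score(query, toks, tokenized_docs), title)
        for toks, title in zip(tokenized_docs, titles)
    ]
    return [title for _, title in sorted(scored, key=lambda p: -p[0])]


if __name__ == "__main__":
    for title in rank("history france"):
        print(title)
```

In this toy corpus, both `rank("history france")` and `rank("france history")` put "History of France" first, because the scorer treats the query as a bag of words and 'of' carries little weight. Whether the production ranking behaves the same way depends on many factors this sketch ignores (article text, phrase matching, popularity signals).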

debt closed this task as Declined.Sep 8 2016, 10:12 PM

Closing this ticket, as there isn't any work to be done directly on it: this will be fixed once we launch BM25. Things are already looking much better (when using BM25), as seen in @EBernhardson's samples above.

TTO added a comment.Sep 8 2016, 10:43 PM

That's not normally a reason to decline a ticket. The usual practice is to leave it open until the parent task is fixed, then close it as "Resolved". The main rationales for this are (a) when the parent task is fixed, we are reminded to check whether this task is indeed fixed; and (b) it helps others locate this issue and reduces the chance of duplicate tasks being filed.

debt added a comment.Sep 8 2016, 11:10 PM

Hi @TTO - I closed it because there isn't any work to be done specifically on this ticket, as it's being worked on elsewhere. If you'd like it to be open but sitting in the backlog without being actively worked on, that's fine with me.