
Searching for "corn" on desktop doesn't return "corn" as the first result
Closed, Resolved; Public

Description

There's some API weirdness reported by @Mhurd from Reading.

If you search for "corn" on desktop, you don't get the "Corn" redirect in the results even though it's an exact match of what you've typed in. Every other query I've tried has worked fine, and I'm not aware of anything that we've touched that would change this, but it does seem odd and warrants further investigation.

Event Timeline

Deskana raised the priority of this task from to High.
Deskana updated the task description. (Show Details)
Deskana added subscribers: Deskana, Mhurd.
Deskana renamed this task from Searching for "corn" on desktop doesn't return "corn" as the first result - possible regression? to [Possible regression] Searching for "corn" on desktop doesn't return "corn" as the first result - possible regression?. Aug 27 2015, 6:41 PM
Deskana raised the priority of this task from High to Unbreak Now!.
Deskana set Security to None.

Needs urgent investigation to see whether this is nothing, or is indicative of a larger issue.

To be clear, the output I'm looking for is "Whoa, stuff's seriously broken, we need to drop everything!" or "This one query's broken, and it's a bit odd, we should fix it later".

Legoktm renamed this task from [Possible regression] Searching for "corn" on desktop doesn't return "corn" as the first result - possible regression? to [Possible regression] Searching for "corn" on desktop doesn't return "corn" as the first result. Aug 27 2015, 6:49 PM

https://en.wikipedia.org/w/api.php?action=opensearch&format=json&search=corn&namespace=0&limit=10&suggest=

It appears to ignore redirects.
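
For anyone who wants to reproduce this outside the browser, here is a minimal Python sketch that rebuilds the URL above and checks where (if anywhere) the exact match lands in the suggestions. It assumes the standard OpenSearch JSON layout of `[query, [titles], [descriptions], [urls]]`. The helper names and the User-Agent string are made up for illustration; this is not part of MediaWiki or Cirrus.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def opensearch_url(term, limit=10, namespace=0):
    """Build the same opensearch query URL as the link in this thread."""
    params = {
        "action": "opensearch",
        "format": "json",
        "search": term,
        "namespace": namespace,
        "limit": limit,
        "suggest": "",
    }
    return API + "?" + urlencode(params)


def suggested_titles(response):
    """Return the title list, assuming [query, titles, descriptions, urls]."""
    return list(response[1])


def exact_match_position(term, titles):
    """Index of the title equal to the term (ignoring case), or None."""
    wanted = term.casefold()
    for index, title in enumerate(titles):
        if title.casefold() == wanted:
            return index
    return None


def fetch_suggestions(term):
    """Query the live API (needs network access; the User-Agent is a placeholder)."""
    request = Request(
        opensearch_url(term),
        headers={"User-Agent": "corn-check-sketch/0.1 (example)"},
    )
    with urlopen(request) as resp:
        return json.load(resp)
```

Calling `exact_match_position("corn", suggested_titles(fetch_suggestions("corn")))` returns `0` when the exact match is the top suggestion and `None` when it is missing from the list altogether.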

Deskana lowered the priority of this task from Unbreak Now! to Medium. Aug 27 2015, 10:45 PM

The results of the analysis show that this isn't indicative of a larger issue as most other queries are working sensibly.

Deskana renamed this task from [Possible regression] Searching for "corn" on desktop doesn't return "corn" as the first result to Searching for "corn" on desktop doesn't return "corn" as the first result. Aug 27 2015, 10:45 PM

Redirects are not entirely excluded:
https://en.wikipedia.org/w/api.php?action=opensearch&format=json&search=obamma&namespace=0&limit=10&suggest=

https://en.wikipedia.org/w/api.php?action=opensearch&format=json&search=texass&namespace=0&limit=10&suggest=

But in those cases, the redirect doesn't have anything to compete with.

Corn is a weird case in which the redirect target (Maize) doesn't look like the original search term, but the original term shows up in lots of other titles, as a whole word or as a prefix of a longer word.

Oddly, the top result, "corn economics" *is* a redirect back to Maize, too.

I searched a random set of 103 common words (taken from the list of 1000 common words on Wikipedia). A very small number didn't give themselves as the top result. Most of the others were prefix strings of what they redirect to (difficult / difficulty; laugh / laughter; learn / learning; similar / similarity). The one example that redirects to something very different was "usually", which redirects to "Convention (norm)". However, it really has little competition because the other suggestion is "Usually just a T-shirt" and an intitle:usually search gives only one result: https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=intitle%3Ausually&fulltext=Search

So, I think prefixes must be preferred over redirects, and this is a weird corner case with the features I mentioned above: the redirect target "doesn't look like the original search term, but the original term shows up in lots of other titles, as a whole word or as a prefix of a longer word."
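
To make the buckets I'm describing concrete, here is a rough Python sketch of how one could sort each word's top suggestion into "exact", "prefix", or "other". The function name and labels are mine, and apart from "corn" / "Corn economics" the sample titles below are illustrative stand-ins rather than recorded API output.

```python
def classify_top_suggestion(term, titles):
    """Label the top suggestion relative to the search term.

    "exact"  - the top title equals the term (ignoring case)
    "prefix" - the top title merely starts with the term
    "other"  - the top title does not start with the term at all
    """
    if not titles:
        return "no results"
    top = titles[0].casefold()
    wanted = term.casefold()
    if top == wanted:
        return "exact"
    if top.startswith(wanted):
        return "prefix"
    return "other"


# Illustrative inputs only; a real run would use live suggestion lists.
samples = {
    "corn": ["Corn economics", "Cornwall"],
    "difficult": ["Difficulty"],
    "texas": ["Texas"],
}
for word, titles in samples.items():
    print(word, "->", classify_top_suggestion(word, titles))
```

Words landing in "prefix" or "other" are the ones worth a closer look, since those are where the exact-match redirect lost out to something else.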

I went looking for other cases, looking for redirects with the same features, but couldn't come up with any.

Interestingly, suggesty gives Maize as a top result!:
http://suggesty.wmflabs.org/w/api.php?action=query&list=search&srsearch=corn&cirrusDumpResult

> Oddly, the top result, "corn economics" *is* a redirect back to Maize, too.

I think you found the bug: Cirrus indexes one doc per page, not one doc per (page, redirect) pair, so it needs to find the best redirect.
It uses the highlighter to do this. Unfortunately, the highlighter takes only the first 30 fragments that match the user query, so I suppose either "Corn" is not among those 30 fragments or there's a problem in how we decide which fragment is good.

Maize seems to have fewer than 30 redirects that match "Corn", so I suppose the problem is in the code that decides which fragment will be displayed.
Cirrus seems to send all possibilities. I don't know where this decision is made, but it looks like the first one is chosen.
And here "corn economics" is the first redirect that matches "Corn" in the redirect array in https://en.wikipedia.org/wiki/Maize?action=cirrusDump

So it's not a regression; it's directly related to how the redirect array is sorted. This is something that we could try to enhance. I have to find where this decision is made...

It's in Cirrus (includes/Hooks.php:714):

if ( isset( $match[ 'redirectMatches' ][ 0 ] ) ) {
        // TODO maybe dig around in the redirect matches and find the best one?
        $results[] = $match[ 'redirectMatches' ][ 0 ]->getPrefixedText();
}

We could take the shortest redirect instead?
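
As a rough illustration of that idea (in Python rather than the PHP above, and not actual Cirrus code), the sketch below ranks candidate redirect titles so that an exact match on the query wins, then titles that start with the query, then the shortest title. The function name is hypothetical, and plain strings stand in for the title objects in `redirectMatches`. Only "Corn economics" and "Corn" come from this thread; the other titles are invented.

```python
def pick_redirect(query, redirect_titles):
    """Pick one redirect title to display instead of blindly taking index 0.

    Ranking (lower tuples win): exact match on the query, then titles
    starting with the query, then shortest title. Ties keep the original
    array order because min() returns the first minimal element.
    """
    if not redirect_titles:
        return None
    wanted = query.casefold()

    def rank(title):
        folded = title.casefold()
        return (folded != wanted, not folded.startswith(wanted), len(title))

    return min(redirect_titles, key=rank)


# "Corn economics" comes first in the array, as described above,
# but the exact match "Corn" is the one that gets picked.
print(pick_redirect("corn", ["Corn economics", "Indian corn", "Corn"]))
```

For "corn", plain shortest-title selection would already be enough. The extra exact-match and prefix checks only matter when a shorter but unrelated redirect also matches the query.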

> I think you found the bug: Cirrus indexes one doc per page, not one doc per (page, redirect) pair, so it needs to find the best redirect.

No, no, no. I just documented a bunch of behavior, and you found the bug. Thanks!

> So it's not a regression; it's directly related to how the redirect array is sorted. This is something that we could try to enhance. I have to find where this decision is made...

I think this issue as stated is solved—we know what's happening and why, and it's a corner case, and nothing has regressed.

As @Deskana said, "This one query's ... a bit odd, we should fix it later".

@Deskana

So I noticed what may be the same issue with another enwiki search term:

"uk"

Screen Shot 2015-08-31 at 5.47.02 PM.png (378×580 px, 79 KB)

I'm almost certain this is a regression and formerly searching for "uk" would return something other than "ukia" as the top result.

Also, neither "uk" nor the actual "united kingdom" article title appears in the first 10 search results...

Alternatively, we could petition the United Kingdom to change its name to the Ukia Kingdom. @Deskana?

Speaking of ukia (whatever it is), if you search for it, the exact match title isn't the first result...

Screen Shot 2015-08-31 at 5.56.08 PM.png (320×457 px, 69 KB)

> Speaking of ukia (whatever it is), if you search for it, the exact match title isn't the first result...

In that case it's probably correct behaviour; if you type "ukia" you're probably not looking for some obscure acronym nobody's ever heard of for the UK.

I think the exact match should still be first. If it's not what you are looking for, that's what all the results after the first one are for :)

Unfortunately this task lacks clear steps to reproduce (URL). Assuming this is about English Wikipedia then this seems to be resolved?

Screenshot from 2020-05-12 22-00-54.png (514×813 px, 101 KB)

TJones claimed this task.

@MaxSem did offer a link to test the problem above.

> https://en.wikipedia.org/w/api.php?action=opensearch&format=json&search=corn&namespace=0&limit=10&suggest=
>
> It appears to ignore redirects.

I tested all the cases here, and they all work as one would like, so I'm closing this. Thanks @Aklapper for bringing it to our attention again!