
Discernatron should remove redirects from result set
Closed, Declined (Public)

Description

In query 138 (https://discernatron.wmflabs.org/query/id/138), there are duplicates in the result set, e.g.

https://en.wikipedia.org/wiki/Yajna_Nrisimha_Temple and https://en.wikipedia.org/wiki/Narasimha_Temple,_Puri

Since duplicates are not shown to users in the search results, we should probably also not show them in the Discernatron.

Event Timeline

debt triaged this task as Low priority. (Sep 22 2016, 10:20 PM)
debt moved this task from needs triage to Up Next on the Discovery-Search board.
debt subscribed.

This is partially fixed (we're tracking the redirects), and both results wouldn't actually show up in real life. We'll look into this further.

If users find this horribly annoying, we could dedupe results, but I'd prefer not to.
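For reference, deduping would not take much code. Here is a minimal sketch, assuming each result is a dict with a `title` and an optional `redirect_target` (both field names are hypothetical, not Discernatron's actual schema). It keeps the first result seen for each underlying page:

```python
from typing import Iterable


def dedupe_by_target(results: Iterable[dict]) -> list[dict]:
    """Keep only the first result for each underlying page.

    Assumes (hypothetically) that a redirect carries the title of the
    page it points to in "redirect_target", and that a non-redirect
    has None there.
    """
    seen = set()
    unique = []
    for result in results:
        # Use the redirect target as the page identity when present,
        # otherwise the result's own title.
        key = result.get("redirect_target") or result["title"]
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


# Illustrative input only; the redirect is the one mentioned in this task.
results = [
    {"title": "Ozzy Osbourne", "redirect_target": None},
    {"title": "Aimee Osbourne", "redirect_target": "Ozzy Osbourne"},
    {"title": "Black Sabbath", "redirect_target": None},
]
deduped_titles = [r["title"] for r in dedupe_by_target(results)]
```

Note that keeping the first occurrence throws away the alternate title, which is exactly the data that would be needed to compare how different titles for the same page get rated.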

It's possible for the same result to get a different score with different titles. I'd like to be able to mine the data to see how often that happens, and whether it follows any obvious patterns, and what it tells us about how we display results.
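A rough sketch of that kind of mining, assuming the ratings can be exported as `(page, displayed_title, score)` tuples (a hypothetical format, and the scores below are made up for illustration). It groups scores by page and displayed title, then reports the gap between the best- and worst-rated title for each page shown under more than one title:

```python
from collections import defaultdict
from statistics import mean


def score_gaps_by_page(ratings):
    """Return {page: (gap, {title: mean_score})} for pages rated under 2+ titles.

    `ratings` is an iterable of (page, displayed_title, score) tuples;
    this tuple layout is an assumption, not Discernatron's real export.
    """
    by_page = defaultdict(lambda: defaultdict(list))
    for page, title, score in ratings:
        by_page[page][title].append(score)

    gaps = {}
    for page, titles in by_page.items():
        if len(titles) < 2:
            continue  # only one displayed title, nothing to compare
        means = {title: mean(scores) for title, scores in titles.items()}
        gaps[page] = (max(means.values()) - min(means.values()), means)
    return gaps


# Made-up scores, purely to show the shape of the input and output.
sample_ratings = [
    ("Ozzy Osbourne", "Ozzy Osbourne", 1),
    ("Ozzy Osbourne", "Ozzy Osbourne", 2),
    ("Ozzy Osbourne", "Aimee Osbourne", 3),
    ("Maize", "Maize", 2),
]
gaps = score_gaps_by_page(sample_ratings)
```

Sorting the output by gap size would surface the pages where the displayed title seems to matter most, which is a reasonable place to start looking for patterns.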

A good example right now is "Aimee Osbourne" (daughter of Ozzy Osbourne). There's a redirect from Aimee to Ozzy, but CirrusSearch only displays the title "Ozzy Osbourne" in the results (I think because of the partial match in the One True Title). Google, however, shows the title as "Aimee Osbourne". It's the same page in both cases, but I bet people are at least somewhat more likely to click on the obvious exact title match.

Another example: searching for "corn" returns "maize" (with a redirect from "maize corn"). I used to think "maize" was a kind of corn, not just another name for "corn", so old me might be less likely to click on even "maize corn" than plain "corn".

Mining for other examples and seeing how people rate them would be useful. I'm hoping I can use it to convince us that we should always show exact redirect title matches if we have them, and maybe more generally work on scoring and sorting redirect title matches.

Or it could turn out to not be consistently different, in which case I might drop the matter (though maybe not).

I think this would be really good stuff to take a look at, @TJones!

Moving this to later this quarter; it's just not a huge priority right now.

I don't think Discernatron is used anymore, so this probably is no longer a useful feature request.