
[Search] Bug: opensearch API doesn't default to resolving redirects
Open, Needs Triage, Public

Assigned To
None
Authored By
alexhollender_WMF
Nov 22 2021, 8:50 PM

Description

Search results show the title of the redirect page, not the page it points to. The API should resolve these redirects.

Opensearch for autocomplete applies to current-Vector not new-Vector. (Everything else about this ticket is perfectly accurate.)

Current-Vector runs this API query for search autocomplete: https://en.wikipedia.org/w/api.php?action=opensearch&format=jsonfm&formatversion=2&search=grand%20buda&namespace=0&limit=10

If it were changed to add the redirects=resolve parameter, like so, the suggestions would show the redirect targets instead: https://en.wikipedia.org/w/api.php?action=opensearch&format=jsonfm&formatversion=2&search=grand%20buda&namespace=0&limit=10&redirects=resolve

what I see | what I expect to see
Screen Shot 2021-11-29 at 11.43.56 AM.png (220×634 px, 26 KB) | Screen Shot 2021-11-29 at 11.44.58 AM.png (310×426 px, 36 KB)

Event Timeline


@DLynch can you add details either in a comment or the description regarding the nature of this issue, as well as the explanation of the search query you ran to demonstrate the issue?

alexhollender_WMF renamed this task from Search bug: opensearch API doesn't default to resolving redirects to [Search] Bug: opensearch API doesn't default to resolving redirects. (Nov 22 2021, 10:29 PM)

Opensearch for autocomplete applies to current-Vector not new-Vector. (Everything else about this ticket is perfectly accurate.)

Current-Vector runs this API query for search autocomplete: https://en.wikipedia.org/w/api.php?action=opensearch&format=jsonfm&formatversion=2&search=grand%20buda&namespace=0&limit=10

image.png (302×894 px, 34 KB)

If it were changed to add the redirects=resolve parameter, like so, the suggestions would show the redirect targets instead: https://en.wikipedia.org/w/api.php?action=opensearch&format=jsonfm&formatversion=2&search=grand%20buda&namespace=0&limit=10&redirects=resolve

image.png (304×894 px, 35 KB)

The default could be changed, but the query the searchbox uses could also be amended to just include that parameter.
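As a rough illustration of the second option, here is a minimal sketch of building the opensearch query with and without redirects=resolve. The helper name opensearch_url is hypothetical, and it uses format=json rather than the jsonfm debugging format from the links above; the other parameters are copied from the query quoted in this comment.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the URLs quoted in this task.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def opensearch_url(search: str, resolve_redirects: bool = False, limit: int = 10) -> str:
    """Build an opensearch autocomplete URL (hypothetical helper, not Vector's code)."""
    params = {
        "action": "opensearch",
        "format": "json",  # assumption: plain JSON instead of jsonfm
        "formatversion": "2",
        "search": search,
        "namespace": "0",
        "limit": str(limit),
    }
    if resolve_redirects:
        # Ask the API to return the redirect target rather than the redirect title.
        params["redirects"] = "resolve"
    # quote_via=quote encodes spaces as %20, matching the URLs in the task.
    return f"{API_ENDPOINT}?{urlencode(params, quote_via=quote)}"


print(opensearch_url("grand buda"))
print(opensearch_url("grand buda", resolve_redirects=True))
```

The only difference between the two printed URLs is the trailing redirects=resolve parameter, which is why amending the searchbox query is the smaller change.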

New-Vector uses the REST search API which has no ability (that I can see) to be told to follow redirects: https://www.mediawiki.org/wiki/API:REST_API/Reference#Autocomplete_page_title

Change 740695 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/core@master] Autocomplete search follow redirects

https://gerrit.wikimedia.org/r/740695

For discussion, untested, that patch is probably what'd be needed to make current-Vector follow redirects for autocomplete. I scoped it such that it won't spill over into any uses of SearchInputWidget, though it could be made even simpler if that's not a problem.

It won't do anything for new-Vector, which will require entirely separate development probably by the search team.

We're talking about 2 different bugs here. I've spun out a new ticket for the one in the description since the autocomplete search is not used in new Vector.
I've opened a bug for the REST API at T296671.

@DLynch @alexhollender , just to make sure I understand, the current behavior in the current Vector resolves redirects, but the new Vector does not? Is this ticket to bring behavior back to parity (this is what it seems like to me)? or is it asking for net new behavior? In the former case, it makes sense to fix this regression.

In the latter case, I just want to make sure that we are ok with potentially weird edge case behavior from resolving redirects for autocompletes. Re-posted examples from comments on connected ticket: https://docs.google.com/document/d/1HL54H8yYlGADEwX6mCh_q6Mdrfu1bxU0zsA95Gm0fmY/edit?usp=sharing

Can also try
Thelma Riley → Ozzy Osbourne
Corn → Maize
→ ↑ → [redirects to] Tsk Tsk Tsk
The Free Encyclopedia → Wikipedia

@DLynch @alexhollender , just to make sure I understand, the current behavior in the current Vector resolves redirects, but the new Vector does not? Is this ticket to bring behavior back to parity (this is what it seems like to me)? or is it asking for net new behavior? In the former case, it makes sense to fix this regression.

As far as I can tell, this issue is present in legacy Vector, new Vector, and all other skins. I didn't check the portal previously, but as you point out in the Google Doc, the portal does not seem to have the issue (which seems like good news 🙂).

Screen Shot 2021-12-06 at 10.02.49 AM.png (362×676 px, 63 KB)

To confirm, the current behavior of the header search autocomplete does not resolve redirects regardless of skin. (Unless some skin out there decided to not rely on the automatic behavior from core's searchSuggest -- since new-Vector has reimplemented that, it's possible other skins did as well 🤷🏻‍♂️.)

This ticket requests a change in behavior, on the theory that showing the page you'll actually be taken to is probably-good. (As does T296671, it's just that it requires a more-complicated change to a separate system for new-Vector's reimplementation of search autocomplete to be affected.)

Is this related to this significant problem with categorization we started noticing at Commons? - https://commons.wikimedia.org/wiki/Commons:Village_pump#Category_autocomplete_is_now_case_sensitive

No, it is unrelated to this. The broken behavior you've seen is most likely related to another issue (T295478 and follow-ups T295705, T296897). It is being fully addressed on the search cluster, and the problem you raised should be fixed by now.

Moving this ticket to watching/waiting until it becomes higher priority for desktop refresh, as per discussion with @ovasileva

Hang on, this needs a lot more consideration before implementation as it will be actively undesirable, even harmful, in some circumstances. While no user is going to be confused about seeing "A Tale of Two Cities" when searching for "Tale of Two Cities", the same is not going to be true in every case - examples:

  • Bush Twins → George W. Bush. Unless you know that this is a redirect to a section of the former president's article this is confusing and implies that George is a twin (he isn't).
  • Diaoyu Islands → Senkaku Islands. Unless you know that one is an alternate name for the other this is going to be very confusing (and given the sensitivity of place names in disputed territories there could be other issues too)
  • Cuba, Ohio → Cuba (disambiguation), Dark Angel (album) → Dark Angel. These will cause confusion as, as far as the reader is concerned, they have entered a precise search term - why is Wikipedia going to take them to an ambiguous title or a disambiguation page?

Readers will be wondering what have they done wrong and how can they get where they want to go? This is bad UX.
Other problems include:

  • Making it much harder to visit the redirect page itself - e.g. to categorise it, nominate it for discussion, retarget it, overwrite it with an article, etc., especially if there is no "redirect from" note on the page.
  • Obscuring the utility of redirects - if people do not travel via a redirect there is no record of it being used, making it much harder for editors to distinguish redirects that are useful from those that are not and removing a source of information used when considering the best target for a redirect. These will make it harder for readers to find the content they are looking for.

There is definitely a benefit to hiding some redirects in the search suggestions, but not all and likely not when the redirect matches the search string. Which redirects should be hidden and which should not is not something that can be done other than by humans. See T24251 for a proposed solution to this (note that this task will not resolve that problem in all cases, so the tickets should not be merged).

Hang on, this needs a lot more consideration before implementation as it will be actively undesirable, even harmful, in some circumstances. While no user is going to be confused about seeing "A Tale of Two Cities" when searching for "Tale of Two Cities", the same is not going to be true in every case - examples:

  • Diaoyu Islands → Senkaku Islands. Unless you know that one is an alternate name for the other this is going to be very confusing (and given the sensitivity of place names in disputed territories there could be other issues too)

I agree. One suggestion that has been made before, which I like, is to include the redirect in the suggestion to make it clearer what's happening.

For example, for a query of diaoyu isl you could have a suggestion like this:

  • Senkaku Islands (from Diaoyu Islands)

Similarly:

  • corn → Maize (from Corn)
  • thelma r → Ozzy Osbourne (from Thelma Riley)
  • bush tw → George W. Bush (from Bush Twins)

(With the exact layout, formatting, and wording left as an exercise for a UX Designer.)
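To make the proposal concrete, here is a minimal sketch of how such a label could be assembled. The function name suggestion_label and the "(from …)" wording are only illustrative and follow the examples above; the real layout would be a design decision.

```python
from typing import Optional


def suggestion_label(target: str, redirect: Optional[str] = None) -> str:
    """Return the text for one autocomplete suggestion (hypothetical helper).

    `target` is the page the reader will land on; `redirect` is the redirect
    title that matched the query, if any.
    """
    if redirect is None or redirect == target:
        return target
    return f"{target} (from {redirect})"


print(suggestion_label("Senkaku Islands", "Diaoyu Islands"))
print(suggestion_label("Maize", "Corn"))
print(suggestion_label("Maize"))
```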

We already do this with full-text search results: einstien. The top results are:

  • Albert Einstein (redirect from Albert Einstien)
  • Bose–Einstein condensate (redirect from Bose-Einstien condensate)
  • Albert Einstein Memorial (redirect from Einstien Sitting Statue)

We also already ignore cases where the redirect is a substring of the article it redirects to, so searching for tale of two cities gives this:

  • A Tale of Two Cities

Not

  • A Tale of Two Cities (redirect from Tale of Two Cities)

Which redirects should be hidden and which should not is not something that can be done other than by humans.

Automatically detecting small typos (Einstein/Einstien/Einsten) is doable for some writing systems, but not as easy for others. For example, in Chinese, a one-character difference could be equivalent to a one-word difference in English.

Also, to keep the mood light, note that there are some extreme edge cases that will test the limits of any UX design:

  • Lopado­temacho­selacho­galeo­kranio­leipsano­drim­hypo­trimmato­silphio­karabo­melito­katakechy­meno­kichl­epi­kossypho­phatto­perister­alektryon­opte­kephallio­kigklo­peleio­lagoio­siraio­baphe­tragano­pterygon (redirect from Lopadotemakhoselakhogameokranioleipsanodrimypotrimmatosilphiokarabomelitokatakekhymenokikhlepikossyphophattoperister-alektryonoptokephalliokigklopeleiolagōiosiraiobaphētraganopterýgōn)

I agree. One suggestion that has been made before, which I like, is to include the redirect in the suggestion to make it clearer what's happening.

While that is going to help in some situations it's not going to in all cases (search results also provide context to help readers understand, search dropdowns do not), and doesn't address the last two bullets at all.

I agree. One suggestion that has been made before, which I like, is to include the redirect in the suggestion to make it clearer what's happening.

For example, for a query of diaoyu isl you could have a suggestion like this:

  • Senkaku Islands (from Diaoyu Islands)

Similarly:

  • corn → Maize (from Corn)
  • thelma r → Ozzy Osbourne (from Thelma Riley)
  • bush tw → George W. Bush (from Bush Twins)

I agree with this approach and just quickly sketched something. Any ideas how to avoid showing this extra info in cases where it is redundant/unnecessary? For example:

this seems helpful | this seems redundant/unnecessary
image.png (502×1 px, 221 KB) | image.png (490×1 px, 217 KB)

Any ideas how to avoid showing this extra info in cases where it is redundant/unnecessary?

It depends on how fancy you want to get...

  • The simplest approach would be to not show the redirect if the redirect is a substring of the title (lowercasing everything before comparing)
  • More normalization than just lowercasing could help, too, such as removing spaces and/or punctuation (then QAnon (the title) and Q Anon and Q-Anon (the redirects) would all be equivalent)
    • IIRC, JavaScript only supports Unicode regex shorthand like \p{L} or \p{P} in newer engines (with the u flag), so you'd have to be careful with non-Latin scripts
    • You also would have to make sure neither the redirect nor the title normalize to the empty string (e.g., ; is a redirect to Semicolon).
  • Not 100% sure about the other way around, but if the redirect is longer than the title but the title is a substring of the redirect it would probably be okay to not show the redirect.
  • Cases like Einstein/Einstien/Enstein are doable, but may be too expensive or complex. My heuristic would be: if the first letter is the same and the only other difference is a swap of two adjacent characters or a one-character addition/deletion/substitution, then hide the redirect. The fastest way I can think of to do this (O(n), instead of regular edit distance, which is O(n²)) is to require a minimum length of 4 or 5, then remove the matching prefix and suffix, and hide the redirect if the remainders are both length < 2, or both length == 2 and one is the reverse of the other. Like I said, maybe too complex, and it may behave oddly with CJK titles.
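
The prefix/suffix heuristic in the last bullet could be sketched roughly as follows. This is only an illustration of the idea as described; the function name is_small_typo and the default minimum length of 4 are assumptions, and it has not been checked against CJK titles.

```python
def is_small_typo(title: str, redirect: str, min_len: int = 4) -> bool:
    """Return True if the redirect looks like a one-character typo of the title."""
    a, b = title.lower(), redirect.lower()
    # Require a minimum length so short titles are never treated as typos.
    if min(len(a), len(b)) < min_len:
        return False
    # The first letter must be the same.
    if a[0] != b[0]:
        return False
    # Strip the matching prefix.
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    rest_a, rest_b = a[i:], b[i:]
    # Strip the matching suffix from what is left.
    j = 0
    while j < len(rest_a) and j < len(rest_b) and rest_a[-1 - j] == rest_b[-1 - j]:
        j += 1
    rest_a = rest_a[: len(rest_a) - j]
    rest_b = rest_b[: len(rest_b) - j]
    # One-character addition/deletion/substitution leaves remainders shorter than 2.
    if len(rest_a) < 2 and len(rest_b) < 2:
        return True
    # A swap of two adjacent characters leaves two-character remainders, one the reverse of the other.
    return len(rest_a) == 2 and len(rest_b) == 2 and rest_a == rest_b[::-1]


for redirect in ("Einstien", "Enstein", "Einsten"):
    print(redirect, is_small_typo("Einstein", redirect))
```

Each string is scanned at most once from either end, so the check stays linear in the title length.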

So, my recommendation? Remove spaces and dashes from the title and the redirect and lowercase both. If neither is the empty string and one is a substring of the other, then don't show the redirect. It won't be perfect, but it will help a lot.
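
A minimal sketch of that recommendation, assuming "dashes" means the ASCII hyphen plus en and em dashes (the exact character set is a guess), with hypothetical function names:

```python
import re

# Assumption: strip whitespace, ASCII hyphen, en dash (U+2013) and em dash (U+2014).
_SPACES_AND_DASHES = re.compile(r"[\s\-\u2013\u2014]+")


def normalize(text: str) -> str:
    """Remove spaces and dashes, then lowercase."""
    return _SPACES_AND_DASHES.sub("", text).lower()


def should_show_redirect(title: str, redirect: str) -> bool:
    """Return False when the redirect adds nothing beyond the title itself."""
    t, r = normalize(title), normalize(redirect)
    # If either side normalizes to the empty string, keep showing the redirect.
    if not t or not r:
        return True
    # Hide the redirect when one normalized string contains the other.
    return not (t in r or r in t)


print(should_show_redirect("A Tale of Two Cities", "Tale of Two Cities"))
print(should_show_redirect("QAnon", "Q-Anon"))
print(should_show_redirect("Maize", "Corn"))
```

The empty-string guard matters because Python treats the empty string as a substring of everything, which would otherwise hide redirects made up only of spaces or dashes.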

For context, various perspectives and reasons have been shared on related tickets such as T296671#7695667 that resolving redirects in this manner is inherently undesirable from a content and community perspective, even if it might seem appealing from a design perspective when taken out of context.

If the new search widget is developed such that it accommodates the relevant nuances adequately, then that could still turn out to be a useful net benefit in its rich experience.

I understand that at the time of filing this task, this information was not yet gathered. But, with this newly discovered information now available from the other task(s), it might make sense to maintain the legacy suggest widget as returning explicit titles that exist, and not redirect targets, given that its interface inherently doesn't accommodate the needed context and information about how to handle that correctly.

I suppose this might also provide a reason to (not) switch skins :)

For context, various perspectives and reasons have been shared on related tickets such as T296671#7695667 that resolving redirects in this manner is inherently undesirable from a content and community perspective, even if it might seem appealing from a design perspective when taken out of context.

Can you clarify what you see as the difference between the content and community perspective, and the design perspective? Also what you mean by "taken out of context"? The issues I raised were from the perspective of someone using search expecting to see a certain result but not seeing it.