
MediaWiki REST API Search results should resolve redirects
Closed, Resolved · Public · 5 Estimated Story Points · BUG REPORT

Description

The issue also impacts the legacy OpenSearch API (T296225); however, that can be fixed by changing query parameters.

The API https://en.wikipedia.org/w/rest.php/v1/search/title?q=Tale%20of%20two%20cities&limit=10 returns redirect titles rather than resolving them.
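To reproduce the report, the request URL above can be built programmatically. Below is a minimal Python sketch. It assumes only the `q` and `limit` query parameters that appear in that URL. `search_title_url` is a hypothetical helper name, not part of any MediaWiki client library.

```python
from urllib.parse import quote, urlencode

# Base path of the MediaWiki REST API on English Wikipedia (taken from the URL above).
REST_BASE = "https://en.wikipedia.org/w/rest.php/v1"


def search_title_url(query: str, limit: int = 10) -> str:
    """Build a /search/title autocomplete request URL (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20, matching the URL in this task;
    # the default (quote_plus) would encode them as "+" instead.
    params = urlencode({"q": query, "limit": limit}, quote_via=quote)
    return f"{REST_BASE}/search/title?{params}"


print(search_title_url("Tale of two cities"))
```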


In new Vector, we introduced thumbnails and article descriptions for the search autocomplete results. For pages with redirects, we show the redirect in the autocomplete results rather than the page itself, which makes for a somewhat confusing experience because the redirects don't have thumbnails or descriptions. For example:

What I see vs. what I expect to see (three pairs of screenshots):
image.png (57 KB) vs. image.png (85 KB)
image.png (121 KB) vs. image.png (150 KB)
image.png (426 KB) vs. image.png (544 KB)

Event Timeline

Pchelolo renamed this task from "RESTBase Search results should resolve redirects" to "MediaWiki REST API Search results should resolve redirects". (Nov 29 2021, 8:04 PM)

Just for consideration, there are some examples of redirects that may not be obvious or may look weird. In the examples below where A → B, starting to type A would result in autocomplete text and images for B.

Thelma Riley → Ozzy Osbourne

Corn → Maize

JFK → John F. Kennedy

Москва → Moscow

தமிழ் → Tamil Language

"→ ↑ →" [redirects to] Tsk Tsk Tsk

The Free Encyclopedia → Wikipedia

Anakin Skywalker → Darth Vader

I don't think this is necessarily a problem, and there may be ways of making it less confusing in these cases, e.g. when there is a redirect, showing something like "Thelma Riley (Ozzy Osbourne)".
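The display idea above amounts to a small formatting rule. Here is a sketch in Python. It assumes the client knows both the title the user matched and the page that title resolves to. `format_suggestion` is a hypothetical name used only for illustration.

```python
def format_suggestion(matched_title: str, target_title: str) -> str:
    """Label an autocomplete suggestion, naming the target for redirects."""
    # Not a redirect: show the page title as-is.
    if matched_title == target_title:
        return target_title
    # Redirect: keep the name the user typed and add the target in parentheses.
    return f"{matched_title} ({target_title})"


print(format_suggestion("Thelma Riley", "Ozzy Osbourne"))
```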

I think adding an additional property to each result in https://en.wikipedia.org/w/rest.php/v1/search/title?q=Tale%20of%20two%20cities&limit=10 would allow clients to handle this without confusion.

So instead of

{
  "pages": [
    {
      "id": 1516162,
      "key": "Tale_of_two_cities",
      "title": "Tale of two cities",
      "excerpt": "Tale of two cities",
      "description": null,
      "thumbnail": null
    }
  ]
}

we would return the following, where "title" would always be the resolved redirect target and "matchedtitle" the title that actually matched the query (often identical to "title"):

{
  "pages": [
    {
      "id": 1516162,
      "key": "Tale_of_two_cities",
      "title": "A Tale of Two Cities",
      "matchedtitle": "Tale of two cities",
      "excerpt": "Tale of two cities",
      "description": null,
      "thumbnail": null
    }
  ]
}
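With the proposed shape, a client could tell redirect matches apart by comparing the two fields. The following is a minimal Python sketch. It reads `matchedtitle` as proposed here and, as an assumption, also accepts `matched_title`, the spelling used by the Gerrit change later in this task. `Suggestion` and `parse_pages` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suggestion:
    title: str                    # resolved page title (redirect target)
    matched_title: Optional[str]  # title that matched the query, if reported

    @property
    def is_redirect(self) -> bool:
        # A redirect match is one where the matched title differs from the page title.
        return self.matched_title is not None and self.matched_title != self.title


def parse_pages(body: str) -> List[Suggestion]:
    """Parse a /search/title-style response body into suggestions."""
    pages = json.loads(body).get("pages", [])
    return [
        Suggestion(
            title=page["title"],
            # Accept either spelling of the field (assumption, see lead-in).
            matched_title=page.get("matched_title", page.get("matchedtitle")),
        )
        for page in pages
    ]


example = '{"pages": [{"title": "A Tale of Two Cities", "matchedtitle": "Tale of two cities"}]}'
print(parse_pages(example)[0].is_redirect)
```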

sdkim triaged this task as Medium priority. (Dec 9 2021, 5:10 PM)
sdkim set the point value for this task to 5.
sdkim moved this task from Incoming to Must do now on the API Platform board.

Change 753810 had a related patch set uploaded (by Nikki Nikkhoui; author: Nikki Nikkhoui):

[mediawiki/core@master] WIP: Add matched_title field to /search/page results

https://gerrit.wikimedia.org/r/753810

Change 753810 merged by jenkins-bot:

[mediawiki/core@master] Add matched_title field to /search/page results

https://gerrit.wikimedia.org/r/753810

A few additional pieces of information that might be of use, and that are perhaps reasons to consider not using the destination title, or at least not using it as the link target:

  1. Wikipedians spend a lot of time establishing and carefully maintaining redirects to specific sections.

For example, where a subject doesn't have its own article and is instead covered within a larger one, such as:

  • TRANSIENT to ECHELON#Confirmation_(2015).
  • recdns to Name_server#Recursive_query.
  • Mirall to Allen_Institute.
  • .work to List_of_Internet_top-level_domains#W.

As such, it may be important for the destination URL to be either the original, unchanged redirect or the fully formed destination URL (including the section). I don't know the REST API code well, but from a quick glance it seems to resolve to a page identity rather than a link target, which means these sections would be lost.
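The difference between a page identity and a link target can be shown with a small sketch. A redirect target such as `ECHELON#Confirmation_(2015)` carries a section fragment, and any code that keeps only the page part drops it. The helpers below are hypothetical (they are not the REST API's implementation), and the URL encoding is simplified, assuming the `/wiki/` article path.

```python
from typing import Tuple
from urllib.parse import quote

ARTICLE_PATH = "https://en.wikipedia.org/wiki/"


def split_target(target: str) -> Tuple[str, str]:
    """Split "Page#Section" into ("Page", "Section"); the section may be ''."""
    page, _, section = target.partition("#")
    return page, section


def target_url(target: str) -> str:
    """Build an article URL that keeps the section fragment, if any."""
    page, section = split_target(target)
    url = ARTICLE_PATH + quote(page.replace(" ", "_"), safe="()")
    # Resolving only to a page identity would drop this fragment.
    if section:
        url += "#" + quote(section.replace(" ", "_"), safe="()")
    return url


for target in ["ECHELON#Confirmation_(2015)", "Allen_Institute"]:
    print(target_url(target))
```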

  2. Vandalism and caching.

Ain't no sunshine without some cachin'. When redirects are changed, we purge them from the CDN so that subsequent visits go to the new/restored/corrected destination. However, this only works if the redirects are in fact used.

Today, both Internet search engines like Google and our own search in the canonical desktop experience deliberately promote redirects as results, under the redirect name.

Screenshot 2022-02-08 at 23.44.13.png (149×863 px, 25 KB)

If vandalism results in a search phrase now redirecting to something... shocking, we would not want the search results for that phrase to keep linking to said shocking content for many minutes or hours.

The OpenSearch API, as used by Vector, addresses this by not resolving the redirect in the API and instead letting it happen at pageview time, just as it does when following article links, interwiki links, external search results, etc.

Note that REST search, like OpenSearch, is cached for multiple hours. However, the safety of that caching seems to depend, at least in part, on it not resolving redirects.
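The caching concern can be illustrated with a toy model (hypothetical names; no real CDN or MediaWiki APIs are involved). Suppose a search response is cached while a redirect is vandalised. If the response stores the resolved destination, it stays wrong until the cache expires. If it stores the redirect title instead, that title is resolved against the corrected redirect when the reader actually visits it. The model assumes, as discussed above, that cached search responses are not purged when a redirect changes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyWiki:
    """Toy model of a wiki's redirect table (source title -> target title)."""
    redirects: Dict[str, str] = field(default_factory=dict)

    def resolve(self, title: str) -> str:
        # Follow one redirect hop, as happens when a reader visits the title.
        return self.redirects.get(title, title)


wiki = ToyWiki({"Corn": "Maize"})

# Vandalism: "Corn" now points somewhere shocking, and a search response is cached.
wiki.redirects["Corn"] = "Shocking page"
cached_resolved = wiki.resolve("Corn")  # API resolved the redirect before caching
cached_unresolved = "Corn"              # API returned the redirect title as-is

# The vandalism is reverted; the cached search responses are (assumed) not purged.
wiki.redirects["Corn"] = "Maize"

print(cached_resolved)                  # still the vandalised destination
print(wiki.resolve(cached_unresolved))  # resolved at pageview time
```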

  3. Wikipedia uses redirects to help readers discover a concept under a more familiar name (either generally or in specific regions).

As such, it may be important for this name to be visible. For example, when typing "Famil", I am more likely to find what I am looking for via a result named "Familiar name" than via "Seemingly unrelated name used in another country".

If it's a hard product requirement to show the destination title, then perhaps both could be shown; VisualEditor currently does this. Note, though, that the API VisualEditor uses is not subject to strong caching, and thus can afford to show the destination without a prolonged risk of promoting vandalism.

Screenshot 2022-02-08 at 23.23.53.png (476×802 px, 66 KB)

  4. Redirects have pageview statistics

Since we promote and encourage linking to redirects from external search engines, in our own canonical search experience, and from within articles, researchers and the community more generally may use pageview stats to understand the use and utility of different redirects.

The most popular pageview tool for Wikimedia includes special affordances just for this.

How can I test this? It doesn't seem like beta has the complete set of search results to work with, so it's difficult to tell what would happen in production. Also, @MPhamWMF and @Krinkle have raised some questions. Who is expected to own these decisions?

> Just for consideration, there are some examples of redirects that may not be obvious or may look weird. In the examples below where A → B, starting to type A would result in autocomplete text and images for B.
>
> Thelma Riley → Ozzy Osbourne
>
> Corn → Maize
>
> ... [clipped text]
>
> I don't think this is necessarily a problem, and there may be ways of making it less potentially confusing in these cases: e.g. when there is a redirect, suggesting something like "Thelma Riley (Ozzy Osbourne)"

Good point @MPhamWMF. This is similar to what @TJones suggested here (T296225#7645702). It would be nice if we could do something like this, e.g.:

image.png (502×1 px, 221 KB)

However, I imagine it would get complicated to decide when to show this info, because I don't think we'd want it in cases like this one:

image.png (516×1 px, 234 KB)
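Below is one possible heuristic for deciding when to show the extra information. It is offered only as a sketch for discussion: it is not what was implemented, and the list of English leading articles is an assumption. A suggestion is annotated only when the redirect is more than a capitalisation, spacing, or leading-article variant of its target. So "Tale of two cities" → "A Tale of Two Cities" stays plain, while "Thelma Riley" → "Ozzy Osbourne" gets annotated.

```python
import re

# Assumed list of English leading articles to ignore when comparing titles.
LEADING_ARTICLES = ("a ", "an ", "the ")


def normalize(title: str) -> str:
    """Lowercase, collapse spaces/underscores and drop one leading article."""
    text = re.sub(r"[\s_]+", " ", title).strip().lower()
    for article in LEADING_ARTICLES:
        if text.startswith(article):
            return text[len(article):]
    return text


def should_show_target(matched_title: str, title: str) -> bool:
    """True when the redirect differs from its target beyond trivial variants."""
    return normalize(matched_title) != normalize(title)


print(should_show_target("Tale of two cities", "A Tale of Two Cities"))
print(should_show_target("Thelma Riley", "Ozzy Osbourne"))
```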

@Krinkle, there are a lot of knock-on effects here that I was totally unaware of; thank you for bringing these up.

  1. I tested locally with an example that redirects to a section heading, and you're right: it resolves only to the page title, not to the section (LinkTarget). I agree that this regression is a problem.

Since 2. and 4. rely on the user actually clicking on the redirect source and having MediaWiki perform the redirect: if the suggestion text came from the redirect target (A Tale of Two Cities) but the URL pointed to the redirect source (Tale of two cities, which would then redirect as usual), would that solve #2 and #4? Or do you think that makes it too complicated and unintuitive?
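The split proposed above (suggestion text from the redirect target, link to the redirect source) could look like this in a client. This is a minimal sketch. It assumes the response provides `title` and, for redirect matches, `matched_title`. `suggestion_link` is a hypothetical name, and the URL encoding is simplified.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def suggestion_link(title: str, matched_title: Optional[str]) -> Tuple[str, str]:
    """Return (label, href): label from the target, href to the redirect source."""
    # Linking to the redirect source lets MediaWiki perform the redirect at
    # pageview time, so section fragments, redirect fixes and per-redirect
    # pageview counts behave as they do for ordinary links.
    link_title = matched_title or title
    href = "/wiki/" + quote(link_title.replace(" ", "_"), safe="()")
    return title, href


print(suggestion_link("A Tale of Two Cities", "Tale of two cities"))
```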

> How can I test this? It doesn't seem like beta has the complete set of search results to work with so it's difficult to tell what would happen in production

Could we just create some pages in beta with some examples that @MPhamWMF gave?

> Could we just create some pages in beta with some examples that @MPhamWMF gave?

Yes ^

I've added:

  • Thelma Riley
  • Ozzy Osbourne
  • The Beach Boys
  • Beach boys
  • Tale of two cities
  • A Tale of Two Cities

FWIW, regarding @alexhollender's design, I think we wouldn't need any further change to the API as currently written (we can compare matched_title against title). I can open a new ticket against Codex / WVUI to make the necessary design change.

thanks for doing that @Jdlrobson

I'm seeing somewhat mixed results, but again I'm not sure what is due to the beta environment vs. the code itself.

Thelma Riley
Screen Shot 2022-02-14 at 5.56.13 PM.png (292×691 px, 45 KB): good
Screen Shot 2022-02-14 at 5.56.23 PM.png (256×635 px, 40 KB): good

tale of two cities
Screen Shot 2022-02-14 at 5.51.42 PM.png (303×793 px, 53 KB): good
Screen Shot 2022-02-14 at 5.52.33 PM.png (271×750 px, 41 KB): unexpected

beach boys
Screen Shot 2022-02-14 at 5.54.29 PM.png (300×710 px, 42 KB): unexpected
Screen Shot 2022-02-14 at 5.52.45 PM.png (348×749 px, 42 KB): unexpected

@Jdlrobson I'm not sure what should happen next?

DA to review and determine next steps

@alexhollender_WMF
Hm, this is really odd. I'm getting different results for all of the "unexpected" screenshots.

Screen Shot 2022-02-24 at 10.28.10 AM.png (402×1 px, 127 KB)

Screen Shot 2022-02-24 at 10.29.34 AM.png (294×1 px, 91 KB)

Screen Shot 2022-02-24 at 10.29.26 AM.png (300×1 px, 86 KB)

> @alexhollender_WMF
> hm this is really odd. I'm getting different results for the all of the "unexpected" screenshots.

Ok I'm now seeing what you're seeing. Maybe there was some kind of caching or importing delay?

tale of two cities
Screen Shot 2022-02-24 at 12.17.26 PM.png (190×673 px, 39 KB): good
Screen Shot 2022-02-24 at 12.17.17 PM.png (191×679 px, 39 KB): good

beach boys
Screen Shot 2022-02-24 at 12.16.46 PM.png (186×685 px, 36 KB): good
Screen Shot 2022-02-24 at 12.16.56 PM.png (190×692 px, 37 KB): good

Sounds like a logical enough hypothesis to me! Haha, glad it's as expected now.

@Jdlrobson it seems like this is now live in production. Should we resolve the task?

Corn
Screen Shot 2022-03-03 at 6.46.45 PM.png (251×546 px, 43 KB)
Thelma Riley
Screen Shot 2022-03-03 at 6.47.58 PM.png (171×543 px, 26 KB)
tale of two cities
Screen Shot 2022-03-03 at 6.48.08 PM.png (166×546 px, 24 KB)

(screenshots from production English Wikipedia)

LGTM thanks for working on this @nnikkhoui !