
Include display titles in MW REST API responses
Open, Needs Triage · Public

Description

The schema for the search API [1] mentions:

title: Page title as it appears on the page

This seems to imply that this is the display title, but that does not appear to be the case. If it were the display title, a query for book titles, TV series, etc. would return some HTML tags in the title.

Example 1: with the following query I would expect <i>The Rebel</i> (South Korean TV series) for the title, not just The Rebel (South Korean TV series).

https://en.wikipedia.org/w/rest.php/v1/search/title?q=the%20rebel&limit=1

{
    "pages": [
        {
            "id": 52294167,
            "key": "The_Rebel_(South_Korean_TV_series)",
            "title": "The Rebel (South Korean TV series)",
            "excerpt": "The Rebel (South Korean TV series)",
            "description": "television series",
            "thumbnail": {
                "mimetype": "image/jpeg",
                "size": null,
                "width": 133,
                "height": 200,
                "duration": null,
                "url": "//upload.wikimedia.org/wikipedia/en/thumb/8/85/Poster_for_Thief_Who_Stole_the_People.jpg/133px-Poster_for_Thief_Who_Stole_the_People.jpg"
            }
        }
    ]
}

Example 2: the title should be iOS 11 instead of IOS 11.

https://en.wikipedia.org/w/rest.php/v1/search/title?q=IOS_11&limit=1

{
  "pages": [
    {
      "id": 53862244,
      "key": "IOS_11",
      "title": "IOS 11",
      "excerpt": "IOS 11",
      "description": "eleventh major release of iOS, the mobile operating system by Apple Inc.",
      "thumbnail": {
        "mimetype": "image/png",
        "size": null,
        "width": 112,
        "height": 200,
        "duration": null,
        "url": "//upload.wikimedia.org/wikipedia/en/thumb/9/9e/IOS_11_Homescreen_iPhone_7_Plus.png/112px-IOS_11_Homescreen_iPhone_7_Plus.png"
      }
    }
  ]
}

For comparison, the RESTBase /page/summary endpoint returns three titles: canonical, normalized, display.

[1] https://www.mediawiki.org/wiki/API:REST_API/Reference#Schema

Event Timeline

Restricted Application added a subscriber: revi. · Aug 3 2020, 6:40 PM
apaskulin subscribed.

Thanks for opening this! I was just wondering about this. Since this isn't the display title, we should find another way to describe the property. I'm also curious about where this comes from in the code; it will probably help us describe it more accurately in the docs.

You can see the difference in https://github.com/wikimedia/mediawiki/blob/master/includes/title/TitleFormatter.php#L68-L80

Basically, there are three different ways of representing the title:

  • dbkey: The_Rebel_(South_Korean_TV_series)
  • text: The Rebel (South Korean TV series) - the only difference is spaces vs underscores. Really. https://github.com/wikimedia/mediawiki/blob/master/includes/Title.php#L600 - this is a standard title for display
  • display: <i>The Rebel</i> (South Korean TV series) - on some wikis the special parser directive DISPLAYTITLE is enabled, which allows editors to override the display title for the page, including HTML tags etc. This is stored in page properties.
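
The relationship between the dbkey and text forms can be sketched in a few lines. This is only an illustration in Python, not MediaWiki's implementation: the function names `dbkey_to_text` and `text_to_dbkey` are made up for this example, and real title handling (namespaces, normalization and so on) does more than swap characters.

```python
def dbkey_to_text(dbkey: str) -> str:
    """Derive the text form from the dbkey form (underscores become spaces)."""
    return dbkey.replace("_", " ")


def text_to_dbkey(text: str) -> str:
    """Derive the dbkey form from the text form (spaces become underscores)."""
    return text.replace(" ", "_")


if __name__ == "__main__":
    print(dbkey_to_text("The_Rebel_(South_Korean_TV_series)"))
```

The display form cannot be derived this way, since it comes from page properties rather than from the key.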

In the REST API we're including the dbkey and text forms and ignoring the display form. I think we should include the display form in the response for all endpoints, because it should be preferred to the text form when available. We just need to decide how:

  • Keep the title key, but prefer the display form over the text form when it exists. In my opinion the text form is fairly useless, except perhaps for clients that do not support HTML but still want to show the title, and it's really easy to derive from the dbkey form.
  • Add a new display_title property that would contain the display form and fall back to the text form.
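
As a rough sketch of the second option, the fallback could look like the following. This is Python for illustration only and not the actual REST API code; `build_page` and `choose_display_title` are hypothetical helpers, and the `display` argument stands in for whatever value is read from page properties.

```python
from typing import Optional


def choose_display_title(text: str, display: Optional[str]) -> str:
    """Return the display form when one is set, otherwise the text form."""
    return display if display else text


def build_page(key: str, display: Optional[str] = None) -> dict:
    """Build a hypothetical response fragment with key, title and display_title."""
    text = key.replace("_", " ")  # text form derived from the dbkey form
    return {
        "key": key,
        "title": text,
        "display_title": choose_display_title(text, display),
    }


if __name__ == "__main__":
    print(build_page("IOS_11", "iOS 11"))
    print(build_page("The_Rebel_(South_Korean_TV_series)"))
```

With this shape, title keeps its current plain-text meaning and only the new property may carry HTML.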

I guess the decision here would be based on whether we think the pure text form is valuable enough. Also, optionally including HTML tags in the property would be backwards incompatible for clients, but given there are no known clients yet, I think we can let this slide.
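
For a client that cannot render HTML, one way to cope with tags in the title would be to strip them on the client side. The sketch below uses Python's standard `html.parser` module; `strip_tags` is a hypothetical helper and not part of any MediaWiki client library.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(display_title: str) -> str:
    """Return the display title with HTML tags removed."""
    parser = _TextOnly()
    parser.feed(display_title)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_tags("<i>The Rebel</i> (South Korean TV series)"))
```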

This is why I was originally pushing so hard for the title property in the schemas to be an object, not a bunch of properties in the root of the response: we could have been compatible with what PCS and summary are doing. But it's too late now.

Thanks, @Pchelolo! I've changed the description of the property in the docs from "Page title as it appears on the page" to "Page title in reading-friendly format".

@bearND Does that change work for you? Would it be helpful to include a note that specifically states that title is not the display title?


I feel STRONGLY about this. The actual display title format should be included in the response, and we should decide how to include it.


Having a display_title property would be really helpful for developers who want to make sure that the title they're getting from the API is the same title that appears on the page. It would also help make the documentation less ambiguous by providing all three variations of title. I think the creation of this task demonstrates this pretty well.

I agree that the display title should be included in the response. This would especially help in cases where the first letter of the title is not capitalized, like iOS, eBay, ...
It's a change in the search user experience, but I think it would be a very good one.
Whether it's in a new field or a change to title doesn't matter to me very much.

Between the two options, I would prefer changing the title property to be the display title, but I don't have a strong preference.

Works for me. @apaskulin once that change is deployed you may want to add to the docs that this property may include HTML tags.


That's a great point. In that case, changing the existing property might constitute a breaking change, but I'll leave it up to the implementers.

bearND renamed this task from "Is title in search schema the display title?" to "Include display titles in MW REST API responses". · Aug 7 2020, 4:51 PM