
Return displaytitle in the correct language variant from /page/summary
Closed, Resolved, Public

Description

Background information

/page/summary doesn't return displaytitle in the correct language variant

How

It looks like action=parse&prop=displaytitle returns the correct displaytitle.

For example, with the header Accept-Language: en, zh-Hans; q=0.9, zh-Hant; q=0.8 set on every request listed:

https://zh.wikipedia.org/api/rest_v1/page/summary/中國
incorrectly returns 中國

https://zh.wikipedia.org/w/api.php?format=json&titles=中國&action=query&ppprop=displaytitle&prop=pageprops
incorrectly returns 中國

https://zh.wikipedia.org/w/api.php?format=json&page=中國&action=parse&prop=displaytitle
correctly returns 中国
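To reproduce the comparison, the three requests can be built with the same Accept-Language header. This is a minimal sketch using only the Python standard library; it builds the requests without sending them, and the helper name build_requests is hypothetical.

```python
from typing import Dict
from urllib.parse import quote, urlencode
from urllib.request import Request

# Header value used for every request in the comparison above.
ACCEPT_LANGUAGE = "en, zh-Hans; q=0.9, zh-Hant; q=0.8"
TITLE = "中國"

def build_requests(title: str) -> Dict[str, Request]:
    """Build (but do not send) the three comparison requests for `title`."""
    headers = {"Accept-Language": ACCEPT_LANGUAGE}
    # REST API: the title is a single path segment, so percent-encode all of it.
    summary_url = ("https://zh.wikipedia.org/api/rest_v1/page/summary/"
                   + quote(title, safe=""))
    api = "https://zh.wikipedia.org/w/api.php?"
    # Action API: the same parameters as the query and parse URLs above.
    query_url = api + urlencode({"format": "json", "titles": title,
                                 "action": "query", "ppprop": "displaytitle",
                                 "prop": "pageprops"})
    parse_url = api + urlencode({"format": "json", "page": title,
                                 "action": "parse", "prop": "displaytitle"})
    return {name: Request(url, headers=headers)
            for name, url in (("summary", summary_url),
                              ("query", query_url),
                              ("parse", parse_url))}
```

Passing one of the returned objects to urllib.request.urlopen would actually send it. The title encodes to %E4%B8%AD%E5%9C%8B, the same form used in the URLs later in this task.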

Open questions

Could we add a request to action=parse&prop=displaytitle to get the displaytitle with the correct formatting and in the correct language variant based on the Accept-Language header?
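Reading the result of such a request might look like the sketch below. It assumes the action=parse JSON has the shape {"parse": {..., "displaytitle": ...}}, and that displaytitle may carry HTML formatting, so it returns both the raw value and a tag-stripped plain-text version. The helper names are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import List, Tuple

class _TextOnly(HTMLParser):
    """Collects character data and drops all tags."""
    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def parse_displaytitle(body: str) -> Tuple[str, str]:
    """Return (html, plain_text) for displaytitle in an action=parse JSON body."""
    html = json.loads(body)["parse"]["displaytitle"]
    collector = _TextOnly()
    collector.feed(html)
    collector.close()  # flush any buffered trailing text
    return html, "".join(collector.parts)
```

For an illustrative body such as {"parse": {"title": "中國", "displaytitle": "中国"}}, this returns ("中国", "中国").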

Acceptance criteria

Event Timeline

LGoto triaged this task as Medium priority. Jul 3 2019, 3:53 PM

Brace yourself for a deep dive on MediaWiki:

The displaytitle pageprop sounds pretty intuitive and generic but actually means something non-intuitive and highly specific: it does not refer generally to the title displayed, but specifically refers to the usage of the DISPLAYTITLE magic word to enforce a specific title formatting that differs from the canonical DB title formatting, in which case a special displaytitle value for the page is inserted into the wiki's page_props table. Most pages don't have displaytitle pageprops, and indeed 中國 doesn't either. (Actually, after fishing for a while with generator=random I couldn't find any pages with displaytitle pageprops on zhwiki; I'd guess that's because DISPLAYTITLE is inappropriate in most cases on a wiki with multiple language variants.)
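To make the distinction concrete: with the default JSON format, action=query returns pages keyed by page ID, and a page's pageprops object is omitted entirely when none of the requested props exist. A minimal sketch under that assumption; the helper name and the sample page IDs in the test are hypothetical.

```python
import json
from typing import Optional

def displaytitle_pageprop(body: str) -> Optional[str]:
    """Return the DISPLAYTITLE pageprop for the single page in an
    action=query&prop=pageprops&ppprop=displaytitle JSON body, or None."""
    pages = json.loads(body)["query"]["pages"]
    page = next(iter(pages.values()))  # single-title request: one page
    # No DISPLAYTITLE on the page means no pageprops object at all.
    return page.get("pageprops", {}).get("displaytitle")
```

For 中國 this returns None, which is why the query URL in the description falls back to the plain title.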

Nonetheless, ApiMobileView does return a value that it calls displaytitle in all cases, which is something other than the displaytitle pageprop, and we've followed suit in some of the REST API endpoints. I think this drift in terminology is confusing and was a mistake in both cases. (Granted, MediaWiki's using displaytitle to mean something this specific and technical was probably not a great choice, but it's been around more or less forever and surely isn't changing now.)

The good news is that what you want is already available via both the Action API and the REST API. (I actually implemented it myself long enough ago that I'd forgotten about it!)

https://zh.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=%E4%B8%AD%E5%9C%8B&inprop=varianttitles
https://zh.wikipedia.org/api/rest_v1/page/metadata/%E4%B8%AD%E5%9C%8B (see the variants key)
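Pulling the per-variant map out of the Action API response could look like this. It is a sketch that assumes the page object carries a varianttitles object mapping variant codes to titles. The sample body in the test is illustrative, not a captured response; real responses list more variant codes.

```python
import json
from typing import Dict

def variant_titles(body: str) -> Dict[str, str]:
    """Return the variant-code -> title map from an
    action=query&prop=info&inprop=varianttitles JSON body."""
    pages = json.loads(body)["query"]["pages"]
    page = next(iter(pages.values()))  # single-title request: one page
    return dict(page.get("varianttitles", {}))
```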

Because of the potential for further confusion, I don't know that I'd recommend updating the displaytitle field in the /page/summary response to depend on the accept-language header sent. Actually, I think we should deprecate it. But I wouldn't be opposed to moving the titles by variant into /page/summary.

Actually, hard-deprecating displaytitle in the REST API is going too far, but I do think we should update it so that it's used strictly for the displaytitle pageprop, if and only if one exists.

> But I wouldn't be opposed to moving the titles by variant into /page/summary.

Could this be a new property that depends on the Accept-Language header and returns the displaytitle for non-variant wikis? On enwiki, the varianttitles aren't formatted properly. For example, see the responses for Iphone (https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Iphone&inprop=varianttitles) and Felis (https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Felis&inprop=varianttitles). It would be nice to consolidate the displaytitle and varianttitles fallback logic into a single property, so clients just send Accept-Language and don't have to know the language code fallback map or pick between the two title options.
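The consolidated selection could be sketched as follows: rank the Accept-Language tags by q-value, return the first matching variant title on variant wikis, and otherwise fall back to the DISPLAYTITLE pageprop and then the canonical title. The function names and the wiki_has_variants flag are hypothetical. Matching is exact only; MediaWiki's real variant fallback chains (for example between zh-cn and zh-hans) are not modeled.

```python
from typing import Dict, List, Optional, Tuple

def parse_accept_language(header: str) -> List[str]:
    """Return lowercase language tags by descending q; ties keep header order."""
    weighted: List[Tuple[float, int, str]] = []
    for index, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        tag = fields[0].lower()
        if not tag or tag == "*":
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            weighted.append((-q, index, tag))
    return [tag for _, _, tag in sorted(weighted)]

def select_title(accept_language: str, title: str,
                 variant_titles: Dict[str, str],
                 displaytitle_pageprop: Optional[str] = None,
                 wiki_has_variants: bool = True) -> str:
    """Return one title, so the client only has to send Accept-Language."""
    if wiki_has_variants:
        variants = {code.lower(): t for code, t in variant_titles.items()}
        for tag in parse_accept_language(accept_language):
            if tag in variants:  # exact match only; no fallback chains
                return variants[tag]
    # Non-variant wiki or no acceptable variant: pageprop, then plain title.
    return displaytitle_pageprop or title
```

With the header from the description and the zhwiki variants of 中國, "en" matches nothing, so "zh-hans" wins and the result is 中国. Skipping varianttitles on non-variant wikis avoids the unformatted enwiki values mentioned above.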

> Actually, hard-deprecating displaytitle in the REST API is going too far, but I do think we should update it so that it's used strictly for the displaytitle pageprop, if and only if one exists.

We should verify that clients don't depend on the property existing without any fallback to title. iOS doesn't depend on it; CC @Dbrant for Android.
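The client-side behavior to check for is a fallback like the one below. It is a minimal sketch: the summary is treated as a parsed JSON object with a title field and an optional displaytitle field, and the helper name is hypothetical.

```python
from typing import Any, Dict

def title_for_display(summary: Dict[str, Any]) -> str:
    """Prefer displaytitle when present and non-empty; otherwise use title."""
    displaytitle = summary.get("displaytitle")
    if isinstance(displaytitle, str) and displaytitle:
        return displaytitle
    return summary["title"]
```

A client written this way keeps working if displaytitle becomes pageprop-only and disappears from most responses.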

It occurs to me that the Reading Web crew should get a say here, too, since Page Previews are probably the largest consumer of /page/summary.

> But I wouldn't be opposed to moving the titles by variant into /page/summary.

> Could this be a new property that depends on the Accept-Language header and will return the displaytitle for non-variant wikis? On enwiki, the varianttitles aren't formatted properly - for example with https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Iphone&inprop=varianttitles and https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Felis&inprop=varianttitles . It would be nice to consolidate displaytitle and varianttitles fallback logic into a single property so the clients just send Accept-Language and don't have to know the language code fallback map or pick between the two title options.

I don't think that having a response that varies by Accept-Language is compatible with the current thinking around RESTBase storage. Honoring Accept-Language in this way would require storing per-variant responses, which we don't do and AFAIK don't plan to do, or somehow bypassing storage for non-default variants, which could end up putting quite a bit more load on the system.

Could the full list of varianttitles be stored in RESTBase but the actual output have the correct varianttitle or displaytitle selected based on the Accept-Language header?
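The store-once, select-per-request idea could be sketched as a post-retrieval step. The stored summary keeps the full varianttitles map. Each response is a copy with that map removed and a single selected title added. The field name formattedtitle is borrowed from the working name used in this thread and is hypothetical. The preferred list is assumed to be already parsed from Accept-Language, highest priority first.

```python
import copy
from typing import Any, Dict, List

def postprocess_summary(stored: Dict[str, Any],
                        preferred: List[str]) -> Dict[str, Any]:
    """Return a per-request copy of a stored summary with one title selected."""
    response = copy.deepcopy(stored)  # never mutate the stored object
    # The full variant map is stored once but not sent to the client.
    variants = response.pop("varianttitles", {})
    chosen = next((variants[tag] for tag in preferred if tag in variants), None)
    response["formattedtitle"] = (chosen or response.get("displaytitle")
                                  or response["title"])
    return response
```

Only one stored object exists per page. The variant-specific work is a cheap dictionary lookup at response time, which is the trade-off against storing per-variant responses.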

Jdlrobson edited projects, added Web-Team-Backlog (Tracking); removed Web-Team-Backlog.
Jdlrobson subscribed.

I've put this on the reading web dev agenda to be talked about next Monday.

> Could the full list of varianttitles be stored in RESTBase but the actual output have the correct varianttitle or displaytitle selected based on the Accept-Language header?

I'm imagining what you're describing as a kind of post-retrieval post-processing step, similar to the pre-processing step to hydrate merge nodes with RESTBase summaries in the feed response. Does that sound about right?

The drawback there is that it would amount to putting more application logic in RESTBase and making PCS increasingly tightly coupled with it, when in general we're trying to move in the opposite direction. But I'm not sure how the notion of individual services managing their own storage changes that picture. Actually, I'd like to better understand what that means for us in practical terms in general. Seems like all of this should go on the agenda for this Thursday's Audiences-CPT sync.

> I'm imagining what you're describing as a kind of post-retrieval post-processing step, similar to the pre-processing step to hydrate merge nodes with RESTBase summaries in the feed response. Does that sound about right?

I'm hoping it could be identical to whatever will be done to return the correct language variant from mobile-html, but I don't know if this is realistic. The goal is that if a client makes a request with the same Accept-Language headers to /page/summary and /page/mobile-html, it should get a formattedtitle property from /page/summary that maps 1:1 to what's in the <h1> on /page/mobile-html. (formattedtitle is just a working name to distinguish it from displaytitle; it could be called anything.)
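A test for that 1:1 requirement could compare the text of the summary's title with the text of the first <h1> in the mobile-html page. This is a sketch: formattedtitle is still the working name, the helper names are hypothetical, and the sample markup in the test is made up. It compares plain text only. A stricter check would compare the markup itself.

```python
from html.parser import HTMLParser
from typing import List

class _TextCollector(HTMLParser):
    """Collects text; with h1_only=True, only text inside the first <h1>."""
    def __init__(self, h1_only: bool) -> None:
        super().__init__()
        self.h1_only = h1_only
        self.inside = False  # currently inside the first <h1>
        self.done = False    # the first <h1> has already closed
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if not self.h1_only or self.inside:
            self.parts.append(data)

def _text(html: str, h1_only: bool) -> str:
    collector = _TextCollector(h1_only)
    collector.feed(html)
    collector.close()
    return "".join(collector.parts).strip()

def title_matches_h1(formattedtitle: str, mobile_html: str) -> bool:
    """True if the summary title's text equals the first <h1>'s text."""
    return _text(formattedtitle, False) == _text(mobile_html, True)
```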

JoeWalsh updated the task description.

Updated the description to remove action=mobileview&prop=displaytitle (which will likely be deprecated) and add action=parse&prop=displaytitle, which also returns the correct result.

JoeWalsh updated the task description.

Page Previews already sends Accept-Language, set to the value of mw.config.get( 'wgPageContentLanguage' ), so using Accept-Language makes sense to us in Reading. But please involve us and Chinese speakers in testing before the deploy so we can check that this works.

Change 526658 had a related patch set uploaded (by Joewalsh; owner: Joewalsh):
[mediawiki/services/mobileapps@master] Return displaytitle in the correct language variant from /page/summary

https://gerrit.wikimedia.org/r/526658

Picking this up as the issue is made worse in mobile-html since the summary response is used for the title:

[Screenshot: Simulator Screen Shot - iPhone X - 2019-07-31 at 08.33.13.png]