
Need a way to fetch all variant title options for mapping to langlinks response
Open, Needs Triage, Public

Description

In the apps, from the alternative languages picker on the article screen we fetch the langlinks endpoint with a call that looks like this:

https://en.wikipedia.org/w/api.php?format=json&formatversion=2&errorformat=html&action=query&prop=langlinks&lllimit=500&redirects=&converttitles=&titles=Super_Mario_Bros.
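For reference, the request above can be assembled from its parameters as in the following sketch. The function name `langlinks_url` and the `wiki_host` argument are made up for illustration; the parameter names and values are copied from the URL above.

```python
from urllib.parse import urlencode


def langlinks_url(title: str, wiki_host: str = "en.wikipedia.org") -> str:
    """Build the langlinks query URL for one article title.

    `langlinks_url` and `wiki_host` are illustrative names; the query
    parameters mirror the URL quoted in the task description.
    """
    params = {
        "format": "json",
        "formatversion": "2",
        "errorformat": "html",
        "action": "query",
        "prop": "langlinks",
        "lllimit": "500",
        "redirects": "",      # value-less flags are sent as empty strings
        "converttitles": "",
        "titles": title,
    }
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"


url = langlinks_url("Super_Mario_Bros.")
print(url)
```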

To show language variants on iOS, we expand the results client-side and intersperse variant choices. Because the endpoint does not give us variant titles, the single title we receive for Zh Wikipedia is used in all of its variant choice cells, but in many cases this title should vary.

{
    "lang": "zh",
    "title": "超级马力欧兄弟"
}

[Screenshot: IMG_1640.jpg]

This is incorrect as the article title for "Super Mario Bros." for Chinese Traditional (Hong Kong) should be "超級瑪利歐兄弟", as seen on Desktop:

https://zh.wikipedia.org/zh-hk/%E8%B6%85%E7%BA%A7%E9%A9%AC%E5%8A%9B%E6%AC%A7%E5%85%84%E5%BC%9F

I have not been able to find a way to fetch all of these variant titles from the langlinks endpoint or a similar endpoint. We can ask the API to return a varianttitles object alongside the langlinks (prop=info with inprop=varianttitles), but that only seems to apply to the wiki we are sending the request to.

https://zh.wikipedia.org/w/api.php?action=query&format=json&llinlanguagecode=en&lllimit=500&llprop=langname%7Cautonym&prop=langlinks%7Cinfo&inprop=varianttitles&redirects=&titles=%E8%B6%85%E7%BA%A7%E9%A9%AC%E5%8A%9B%E6%AC%A7%E5%85%84%E5%BC%9F

"varianttitles": {
    "zh": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-hans": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-hant": "\u8d85\u7d1a\u99ac\u529b\u6b50\u5144\u5f1f",
    "zh-cn": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-hk": "\u8d85\u7d1a\u99ac\u529b\u6b50\u5144\u5f1f",
    "zh-mo": "\u8d85\u7d1a\u99ac\u529b\u6b50\u5144\u5f1f",
    "zh-my": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-sg": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-tw": "\u8d85\u7d1a\u99ac\u529b\u6b50\u5144\u5f1f"
}
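To make the escaped titles easier to compare, here is a small sketch that decodes the fragment above and groups the variant codes by spelling. The fragment is wrapped in an outer object only so that it parses as standalone JSON; the real response nests it deeper, and that nesting is not reproduced here.

```python
import json

# The varianttitles fragment from the response above, wrapped in braces so it
# is valid JSON on its own (the wrapping is an assumption for this sketch).
fragment = r'''
{
  "varianttitles": {
    "zh": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-hans": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-hant": "\u8d85\u7d1a\u99ac\u529b\u6b50\u5144\u5f1f",
    "zh-cn": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-hk": "\u8d85\u7d1a\u99ac\u529b\u6b50\u5144\u5f1f",
    "zh-mo": "\u8d85\u7d1a\u99ac\u529b\u6b50\u5144\u5f1f",
    "zh-my": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-sg": "\u8d85\u7ea7\u9a6c\u529b\u6b27\u5144\u5f1f",
    "zh-tw": "\u8d85\u7d1a\u99ac\u529b\u6b50\u5144\u5f1f"
  }
}
'''

titles = json.loads(fragment)["varianttitles"]

# Group variant codes by the title they resolve to.
by_title = {}
for code, title in titles.items():
    by_title.setdefault(title, []).append(code)

for title, codes in by_title.items():
    print(title, codes)
```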

The same call on EN only gives us the data for "en":

"varianttitles": {
    "en": "Super Mario Bros."
}

What we really need is these variant titles within the langlinks objects themselves, so that a single call returns all alternative article titles together with their variant spellings. As it stands, for each language within langlinks that supports variants, we would have to make the call again on the corresponding wiki with the varianttitles flag set to get the variant spellings. This could become heavyweight very quickly.
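To illustrate the cost, here is a rough sketch of the follow-up requests a client would have to plan today: one extra call per langlinks entry whose language has variants. `LANGS_WITH_VARIANTS` is a placeholder subset, not the real list, and the `{lang}.wikipedia.org` host pattern is an assumption that will not hold for every language code.

```python
from urllib.parse import urlencode

# Assumption: placeholder subset of languages with variants, for illustration.
LANGS_WITH_VARIANTS = {"zh", "sr"}


def variant_title_requests(langlinks):
    """Return one follow-up URL per langlinks entry that needs variant titles."""
    urls = []
    for link in langlinks:
        if link["lang"] not in LANGS_WITH_VARIANTS:
            continue
        params = {
            "action": "query",
            "format": "json",
            "prop": "info",
            "inprop": "varianttitles",
            "redirects": "",
            "titles": link["title"],
        }
        # Assumption: the wiki host is derived directly from the language code.
        host = f"{link['lang']}.wikipedia.org"
        urls.append(f"https://{host}/w/api.php?{urlencode(params)}")
    return urls


sample_langlinks = [
    {"lang": "zh", "title": "超级马力欧兄弟"},
    {"lang": "fr", "title": "Super Mario Bros."},
    {"lang": "sr", "title": "Super Mario Bros."},
]
extra_calls = variant_title_requests(sample_langlinks)
print(len(extra_calls), "extra requests")
```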

If there's a single other call already out there with this data that I missed somewhere, that will be fine, but anything that requires us to make extra calls per language will not be sustainable.

Event Timeline

I think it would be good to have input from Language-Team (Language-2024-April-June) before starting to investigate the solution for a new feature. cc/ @Nikerabbit and @Arrbee

One concern I have is that language conversion relies on data that is partly wiki-specific, see e.g. https://zh.wikipedia.org/wiki/Special:%E5%89%8D%E7%BC%80%E7%B4%A2%E5%BC%95?prefix=Conversiontable&namespace=8. As far as I know, MediaWiki architecture is not yet in a state where it could reliably do such cross-wiki processing in-process. Platform team would be the place to ask for more details about this.

A simple (but probably not a good) solution could be adding an API/service that internally does all these per-wiki calls.
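A very rough sketch of what such a service could do internally, assuming it is handed a function that performs the per-wiki call. All names here (`attach_variant_titles`, `fetch_variant_titles`, `langs_with_variants`) are hypothetical, and the fetcher is faked so the example runs without network access.

```python
def attach_variant_titles(langlinks, fetch_variant_titles, langs_with_variants):
    """Return copies of the langlinks entries, adding a `varianttitles`
    mapping to each entry whose language has variants.

    `fetch_variant_titles(lang, title)` stands in for the per-wiki API call.
    """
    merged = []
    for link in langlinks:
        entry = dict(link)  # do not mutate the caller's data
        if link["lang"] in langs_with_variants:
            entry["varianttitles"] = fetch_variant_titles(link["lang"], link["title"])
        merged.append(entry)
    return merged


def fake_fetch(lang, title):
    # Stand-in for the real per-wiki request; values taken from the zh response.
    return {"zh": "超级马力欧兄弟", "zh-hk": "超級馬力歐兄弟"}


result = attach_variant_titles(
    [{"lang": "zh", "title": "超级马力欧兄弟"}, {"lang": "fr", "title": "Super Mario Bros."}],
    fake_fetch,
    langs_with_variants={"zh"},
)
print(result)
```

The client would still make one call, but the per-wiki fan-out only moves to the server side, which is why this is probably not a good solution on its own.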

A more complex solution could be altering the user interface. See, for example, T278372: Support for language variants and multiple scripts for what @Pginer-WMF has been thinking about this. For example, maybe variants would only be listed once you go into that language, or maybe variants could be listed without the article name. I think it would be good to check with Pau and other teams to ensure that the language selection experience does not differ too much between platforms.

A third option is to store the converted variant names somewhere. That has its own set of problems, like cache invalidation, so it is not a simple option either. The middleware service I suggested above could also do such caching to speed up future calls.
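As a sketch of the caching idea only, here is a time-based cache keyed by language and title. The class name, the TTL approach, and the injected `fetch` function are all assumptions; expiry by age is just the simplest stand-in and does not solve invalidation when a title or conversion table changes.

```python
import time


class VariantTitleCache:
    """Hypothetical TTL cache for variant titles, keyed by (lang, title)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the example is deterministic
        self._entries = {}       # (lang, title) -> (stored_at, value)

    def get(self, lang, title, fetch):
        key = (lang, title)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # still fresh, skip the per-wiki call
        value = fetch(lang, title)
        self._entries[key] = (now, value)
        return value


calls = []


def fake_fetch(lang, title):
    # Stand-in for the per-wiki request; records each call it receives.
    calls.append((lang, title))
    return {"zh-hk": "超級馬力歐兄弟"}


now = [0.0]
cache = VariantTitleCache(ttl_seconds=60, clock=lambda: now[0])
first = cache.get("zh", "超级马力欧兄弟", fake_fetch)   # miss: fetches
second = cache.get("zh", "超级马力欧兄弟", fake_fetch)  # hit: no fetch
now[0] = 61.0
third = cache.get("zh", "超级马力欧兄弟", fake_fetch)   # expired: fetches again
print(len(calls), "fetches for 3 lookups")
```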

> [...] but anything that requires us to make extra calls per language will not be sustainable

There are currently only ~10 languages with converters, and I am not expecting to see many new ones, so the impact is somewhat limited, but I do agree this info should be available behind one call for application developers.

There have been some ideas to split out the language converter as a separate service/library (e.g. T213345#4939477), but I fail to see how it would help with this use case other than removing the dependency on per-wiki configuration.