
Add API module to get language information
Closed, Resolved, Public


Suggested by @Anomie in T217239#4994301: a module like meta=languageinfo which gives you access to information (e. g. name in user language, autonym, fallbacks) for arbitrary languages MediaWiki knows about. Because this information is expensive to retrieve when the localization cache is not used, the module would apply continuation after the request reaches a certain time limit (e. g. two seconds), but in production we wouldn’t expect this to be necessary.
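As a rough illustration of what a request to such a module could look like, here is a minimal sketch that only builds the query URL. The endpoint, the helper name `languageinfo_url`, and the parameter names `liprop` and `licode` are assumptions for illustration; the description above only fixes the `meta=languageinfo` name and the kind of information returned.

```python
from urllib.parse import urlencode

# Assumed endpoint; any MediaWiki api.php URL would work the same way.
API_URL = "https://www.wikidata.org/w/api.php"


def languageinfo_url(codes=("*",), props=("name", "autonym", "fallbacks")):
    """Build a query URL for the proposed meta=languageinfo module.

    `licode` and `liprop` are assumed parameter names, not taken from the task.
    """
    params = {
        "action": "query",
        "meta": "languageinfo",
        "liprop": "|".join(props),   # which pieces of information to return
        "licode": "|".join(codes),   # which languages to return them for
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


print(languageinfo_url(codes=("gag", "tr")))
```

Keeping the URL construction separate from the HTTP call makes it easy to inspect the request before sending it.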

I think this would be a good task for the Wikimedia Hackathon 2019.

Event Timeline

I’m not sure what the parameters should be. It could be a list of language codes, defaulting to *; it could be a pair of from and to parameters, or from and limit; it could be both (e. g. meta=allmessages has messages, from and to). I feel like a from version would work better with continuation, though.

If we have an "all" value, it should be *. I'm not sure whether we really need one though versus just letting the client specify the languages it actually cares about. Do you have a use case?

I don’t, @Pintoch and @Mvolz do :) and at least @Pintoch’s use case sounded like he cares about all languages.

Let me spell out my use case in more detail then.

When writing external services around Wikidata, you often end up caching information from Wikidata in your own database. Labels are typically one of these things if you want to avoid querying Wikidata every time you mention an item in your UI. If your application is localized, then you probably want to cache all available labels for an item and display the correct one according to the user's locale. But Wikidata editors curate labels with the language fallback in mind: for instance, when reading Wikidata in Spanish, it is not a big deal if a Wikidata item about an English pub does not have a Spanish label: the language fallback will display the English one automatically, and that is the one that makes sense anyway (the pub probably does not have an established Spanish name). Therefore, when reusing Wikidata labels in an external application, it makes sense to use the same fallback strategy, for consistency with Wikidata. For that to work, it would be useful to have access to a structured representation of the fallback graph. This would be retrieved from the API, which would typically be queried very rarely, possibly even hard-coded in the application or some Wikidata library whose maintainers would take care of the update. Therefore, yes, I would use the API to retrieve the full graph in one go (split into a few queries if that is hard for MediaWiki to handle, of course).
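The label selection described above can be sketched as follows. This is a minimal illustration, not Wikidata's actual implementation: the function name `pick_label`, the pub label, and the hard-coded chain `["es", "en"]` are all hypothetical, and in practice the chain would come from the fallback data this task asks for.

```python
def pick_label(labels, fallback_chain):
    """Return (language, label) for the first language in the chain with a label.

    `labels` maps language codes to cached labels for one item;
    `fallback_chain` lists the user's language first, then its fallbacks.
    Returns None when no language in the chain has a label.
    """
    for lang in fallback_chain:
        if lang in labels:
            return lang, labels[lang]
    return None


# Hypothetical cached labels for an item about an English pub.
cached_labels = {"en": "Example Pub", "fr": "Pub Exemple"}

# A Spanish reader gets the English label because no Spanish one exists.
print(pick_label(cached_labels, ["es", "en"]))
```

Returning the language along with the label lets the UI mark the text with the correct language attribute when a fallback was used.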

I have also detailed a concrete example of a web service where I am currently using a crude approximation of the language fallback (everything falling back to English) because I do not have access to this data at the moment. If anything is unclear about that example, I am happy to answer any questions about it.

I agree with the assessment that it would be a good hackathon project. In any case, this is of course not blocking at all on my side; it should have low priority given that it will only serve a few people.

There will be continuation required, almost certainly. But that seems unlikely to be a big deal.
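For a client, handling continuation usually amounts to a small loop that merges the returned `continue` block into the next request. The sketch below assumes the usual MediaWiki Action API convention of a top-level `continue` object; the helper name `query_all` and the injected `fetch` callable are hypothetical, and `licontinue` in the fake responses is an assumed key name.

```python
def query_all(fetch, params):
    """Yield every response, following `continue` blocks until none is left.

    `fetch` is any callable that takes a dict of request parameters and
    returns the decoded JSON response as a dict. It is injected so that
    the loop can be exercised without network access.
    """
    request = dict(params)
    while True:
        response = fetch(request)
        yield response
        if "continue" not in response:
            break
        # Re-send the original parameters plus the continuation values.
        request = {**params, **response["continue"]}


# Offline demonstration with two fake response pages.
fake_pages = [
    {"continue": {"licontinue": "de", "continue": "-||"},
     "query": {"languageinfo": {"aa": {}}}},
    {"query": {"languageinfo": {"de": {}}}},
]
sent = []


def fake_fetch(request):
    sent.append(dict(request))
    return fake_pages[len(sent) - 1]


merged = {}
for page in query_all(fake_fetch, {"action": "query", "meta": "languageinfo"}):
    merged.update(page["query"]["languageinfo"])
print(sorted(merged))
```

Because the loop only looks at the `continue` key, it works the same whether the server splits results by count or by elapsed time.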

I'm slightly torn between agreeing that a * makes sense and suggesting that you could use action=query&meta=siteinfo&siprop=languages to get the list of languages and then use the new module to fetch the details. Do you have thoughts on the convenience of a * versus the less complex implementation of the latter option, beyond the first being easier to use?
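The two-step alternative would look roughly like this on the client side: read the codes from the siteinfo response, then split them into batches for the detail requests. This is only a sketch; the response shape (`query.languages` as a list of objects with a `code` key), the batch size of 50, and both helper names are assumptions.

```python
def codes_from_siteinfo(response):
    """Extract language codes from a siprop=languages response (assumed shape)."""
    return [entry["code"] for entry in response["query"]["languages"]]


def batch_codes(codes, batch_size=50):
    """Split codes into pipe-joined batches suitable for a multi-value parameter."""
    codes = list(codes)
    return [
        "|".join(codes[i:i + batch_size])
        for i in range(0, len(codes), batch_size)
    ]


# Hypothetical, heavily shortened siteinfo response.
siteinfo = {"query": {"languages": [{"code": "en"}, {"code": "gag"}, {"code": "tr"}]}}

for batch in batch_codes(codes_from_siteinfo(siteinfo), batch_size=2):
    print(batch)
```

Compared with a single * request, this costs the client one extra round trip plus the batching logic, which is the convenience trade-off raised above.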

I'm also wondering if it might make sense to have a "withfallbacks" parameter, so if you said lilang=gag then it would return you the language info for Turkish and English too, without you having to read the returned fallback chain for Gagauz and make a second request.
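What such a parameter would compute can be sketched as a small expansion over the fallback chains. The function name `with_fallbacks`, the shape of the `fallbacks` mapping, and the treatment of English as an implicit final fallback are assumptions made for this illustration.

```python
def with_fallbacks(codes, fallbacks, final="en"):
    """Return the requested codes plus every language in their fallback chains.

    `fallbacks` maps a language code to its explicit fallback chain.
    `final` is appended to every chain as the assumed last resort.
    The result keeps first-seen order and contains no duplicates.
    """
    result = []
    for code in codes:
        for lang in [code, *fallbacks.get(code, []), final]:
            if lang not in result:
                result.append(lang)
    return result


# Gagauz falls back to Turkish, as in the example above.
print(with_fallbacks(["gag"], {"gag": ["tr"]}))
```

Without the parameter, a client needs the chain for Gagauz before it can ask for Turkish, which is the second request the comment wants to avoid.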

Change 510705 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/core@master] Add action=query&meta=languageinfo API module

Change 510705 merged by jenkins-bot:
[mediawiki/core@master] Add action=query&meta=languageinfo API module

Anomie claimed this task.

Thanks for merging!

Thanks for writing this, this is great!

And by the way, on Beta getting all language information seems to take between 200 and 400 ms, so it’s indeed nice and efficient :)

I’ll try to remember to document this in two weeks, once it’s deployed (SRE Summit next week).

After several more train mishaps, the module is finally deployed in production \o/ and getting all language information here takes somewhere between 300 and 1000 ms.