MCS should respect Accept-Language header for MW API requests
Closed, Resolved, Public

Description

In addition to the requests to Parsoid passing the Accept-Language header (T195948), MCS should also request the correct language variant for the values we get from MW API. A few areas that come to mind are:

  • summary: displaytitle, description, extract
  • metadata: categories, hatnotes, issues, toc
  • media: titles, descriptions, captions, artists, license strings, ...
  • mobile-sections: for the main page content we still use action=mobileview

Event Timeline

bearND triaged this task as Medium priority. Jun 12 2018, 3:11 PM
bearND created this task.

Change 439997 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/services/mobileapps@master] [WIP] Respect Accept-Language header for MW API requests

https://gerrit.wikimedia.org/r/439997

Vvjjkkii renamed this task from MCS should respect Accept-Language header for MW API requests to u6aaaaaaaa. Jul 1 2018, 1:05 AM
Vvjjkkii removed Jdforrester-WMF as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
mobrovac renamed this task from u6aaaaaaaa to MCS should respect Accept-Language header for MW API requests. Jul 1 2018, 10:15 AM
mobrovac assigned this task to Jdforrester-WMF.
mobrovac lowered the priority of this task from High to Medium.
mobrovac updated the task description.
mobrovac removed a subscriber: Jdforrester-WMF.

What's the status on this? Should we move it to blocked? If so, what is it blocked on and what will we do to unblock it? Maybe move it back to TODO? Thanks!

Per gerrit, "Waiting for I807dd55d49e and then decoding from BCP47 to MW codes and back would be good." Though that commit is tagged against three tasks (T34483, T106367, and T120847) none of them are really what's blocking us, so much as "Have MediaWiki translate Accept-Language headers into MediaWiki speak and back again" – we could make that as a subsidiary task to this and mark it as blocked? Not sure what's the best way forward.

Isn't the burden of sending the correct mediawiki compatible language headers on the clients right now? Web can use mw base library and apps have some logic for transforming the codes I believe.

If the headers are passed to mediawiki as they come, isn't that all we need right now?

If having the language conversion at the mediawiki layer is something important we want to do then let's create a separate task that doesn't block this one, as it is a different feature (sounds good to me BTW).

Thoughts? cc/ @bearND @Mholloway

Moving to blocked until we've discussed.

BTW, does parsoid use MW language codes or standard ones? Which type are we telling clients (web & apps) to send? I'm confused right now.

Parsoid uses MW language codes, so clients should send MW-specific codes. If MW ever starts speaking standard ones, it will be trivial for Apps and Web to start sending those instead.

Moving to code review then.

api.php calls should be given the header from clients, that's the concern of this task then.

@Jdforrester-WMF Do you intend to work on this anytime soon? Should one of us pick it up and take it across the finish line?

Hey! Sorry, totally dropped this ball. This should be doable now; MediaWiki is theoretically able to speak BCP47 codes and respond appropriately to requests. I've not done formal testing in production since the code shipped though. It'd be great if someone could get it over the finish line, yes.

Focusing on the summary endpoint, it looks like neither the uselang parameter nor the Accept-Language header affects the title or description returned by action=query on srwiki. Testing with https://sr.wikipedia.org/w/api.php?action=query&titles=%D0%A1%D1%80%D0%B1%D0%B8%D1%98%D0%B0&format=json&prop=coordinates|description|pageprops|pageimages|revisions|info|langlinks|categories&lllimit=max&piprop=thumbnail|name|original&pilicense=any&pithumbsize=640&inprop=protection&rvprop=ids|timestamp|user|contentmodel&rvslots=main&clprop=hidden&cllimit=50&uselang=sr-Latn, I see Cyrillic instead of Latin characters for all fields. Assuming the description source is "central", maybe a separate call to wikidata would be required to get the correct description? Is there any way to get the title in the desired language variant?

It looks like action=mobileview on srwiki respects both the uselang parameter and the Accept-Language header to provide the correct displaytitle, but the description is still sr-Cyrl: https://sr.wikipedia.org/w/api.php?action=mobileview&page=Србија&format=json&redirect=yes&prop=normalizedtitle|description|displaytitle&uselang=sr-Latn

For reference, here's the wikidata entity: https://wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q403 The desired description for the BCP 47 code sr-Latn is under the MW code sr-el
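To illustrate the lookup described above, here is a minimal Python sketch of picking a description from a wbgetentities-shaped result when the client sends a BCP 47 code. The function names, the mapping table, and the sample entity are all hypothetical; only the sr-Latn to sr-el pairing comes from this thread, and sr-Cyrl to sr-ec is added from MediaWiki's Serbian variant codes. It is not how MCS does it.

```python
# Hypothetical mapping from BCP 47 codes to MediaWiki-internal codes.
# Only the two Serbian entries are listed; any other code is simply lowercased,
# which is an assumption and will not be correct for every language.
BCP47_TO_MW = {
    "sr-latn": "sr-el",
    "sr-cyrl": "sr-ec",
}


def bcp47_to_mw(code):
    """Translate a BCP 47 code to the MediaWiki code used as a Wikidata key."""
    normalized = code.strip().lower()
    return BCP47_TO_MW.get(normalized, normalized)


def pick_description(entity, bcp47_code, fallback):
    """Return the description for the requested variant, else the fallback's."""
    descriptions = entity.get("descriptions", {})
    for code in (bcp47_to_mw(bcp47_code), fallback):
        entry = descriptions.get(code)
        if entry is not None:
            return entry["value"]
    return None


# Made-up fragment shaped like one entity of a wbgetentities response.
# The values are placeholders, not real Wikidata content.
sample_entity = {
    "descriptions": {
        "sr": {"language": "sr", "value": "<Cyrillic description>"},
        "sr-el": {"language": "sr-el", "value": "<Latin description>"},
    }
}

print(pick_description(sample_entity, "sr-Latn", "sr"))
```

A request for sr-Cyrl would fall back to the plain sr entry here, because the sample has no sr-ec description.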

@JoeWalsh Looks like PS10 that you uploaded on the attached Gerrit patch is a rebase... let me know if I'm mistaken.

I think the patch itself is good to merge; do you agree? To the extent MediaWiki isn't honoring the variant info it's provided, that's to be fixed on the MediaWiki side, I think.

@Mholloway yep, it's a rebase. The patch is good to merge, but it might be good to wait and discuss with @Charlotte and @JMinor about requirements and use cases.

For example, on iOS, a user might have English as their primary language and sr-Latn further down the list. In this case, the solution as implemented would not pass on that preference to srwiki, whereas forwarding the full header (for example Accept-Language: en, zh-Hans;q=0.8, sr-Latn;q=0.6, sr-Cyrl;q=0.4, zh-Hant;q=0.2) would.
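A rough Python sketch of why the two approaches differ, using the header from the example above. The function names are hypothetical, and the prefix match on the wiki language is an assumption about how a receiver might choose a variant, not a description of what MediaWiki does.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (tag, q) pairs, highest q first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        tag = pieces[0]
        if not tag:
            continue
        q = 1.0  # a tag without an explicit q-value has weight 1
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((tag, q, position))
    # Sort by descending weight, keeping header order for ties.
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(tag, q) for tag, q, _ in entries]


def best_tag_for_wiki(header, wiki_lang):
    """Return the highest-weighted tag that belongs to the wiki's language."""
    prefix = wiki_lang.lower()
    for tag, q in parse_accept_language(header):
        lowered = tag.lower()
        if q > 0 and (lowered == prefix or lowered.startswith(prefix + "-")):
            return tag
    return None


header = "en, zh-Hans;q=0.8, sr-Latn;q=0.6, sr-Cyrl;q=0.4, zh-Hant;q=0.2"

# Forwarding only the top preference gives srwiki nothing to work with.
first_only = parse_accept_language(header)[0][0]
print(first_only, best_tag_for_wiki(first_only, "sr"))

# Forwarding the full header lets the receiver find the Serbian preference.
print(best_tag_for_wiki(header, "sr"))
```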

One question is whether or not the wikis with language variants all support the Accept-Language header - I tested zhwiki and srwiki and they both do. What other wikis have language variants?

The other question is whether or not we think the use case of having variants further down the preference list is common enough to matter.

@JoeWalsh - We've had at least one ticket - T197580 - asking us to support variants in a similar manner. @cooltey can probably speak to how important this is for Chinese. The upshot is I would prefer that we forward the full header, if possible, which seems like it will provide the best user experience for folks who have multiple languages set.

As for other languages with variants, try Norwegian. It may have variants for bokmål and nynorsk.

It looks like the Norwegian variants are two separate wikis, no for Bokmål and nn for Nynorsk, so that should be covered by picking the right domain for the request on the client.

@JoeWalsh I don't know of any convenient list of wikis with language variants. Technically speaking, it's not a property of a wiki and not set in configuration, but defined in Language subclasses like LanguageZh which MediaWiki then handles appropriately when they represent the configured wiki language.

The easiest way I know to find which languages have variants is to examine these Language subclasses, which are defined in files in mediawiki/languages/classes.

FWIW, just passing through the Accept-Language header in all cases (regardless of API) still seems cleaner to me, besides providing for fuller language fallback preferences, as you note.

Sounds good, it seems like we should move forward with forwarding the Accept-Language header in all cases. From there, we can see what we need to work around or try to get changed in MediaWiki.

@JoeWalsh Another easy way to get the language variants is this siteinfo MW API request. It also shows the fallbacks, which should be followed IMO.
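The linked request itself is not preserved here. As a sketch only, assuming it is a siteinfo query with siprop=languagevariants, the URL and a fallback walk could look like the following in Python. The function names are hypothetical, and the sample data is a made-up fragment shaped like the per-language part of such a response; real fallback lists may differ.

```python
from urllib.parse import urlencode


def siteinfo_variants_url(domain):
    """Build a siteinfo URL that asks for language variants (assumed siprop)."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "languagevariants",
        "format": "json",
    }
    return f"https://{domain}/w/api.php?{urlencode(params)}"


def resolve_variant(requested, available, variants):
    """Try the requested code, then its fallbacks, against the available codes."""
    fallbacks = variants.get(requested, {}).get("fallbacks", [])
    for code in [requested] + list(fallbacks):
        if code in available:
            return code
    return None


# Made-up variant table for illustration; not copied from a live response.
sample_variants = {
    "sr": {"fallbacks": ["sr-ec"]},
    "sr-ec": {"fallbacks": ["sr"]},
    "sr-el": {"fallbacks": ["sr"]},
}

print(siteinfo_variants_url("sr.wikipedia.org"))
print(resolve_variant("sr-el", {"sr"}, sample_variants))
```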

Change 512402 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] Pass through accept-language header for MW API requests

https://gerrit.wikimedia.org/r/512402

After some further investigation, here's how the situation looks at present:

  • The MediaWiki action API will happily accept language codes in either native MediaWiki or BCP47 format, via either the uselang param or an Accept-Language header, and apply them to the best of its ability.
  • The REST API does not yet support Accept-Language headers in either format.
  • Language conversion applies only to content that passes through the parser. This includes page content but does not apply to incidental metadata stored separately, e.g., title descriptions. Those we'll have to somehow handle separately.
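For illustration, passing the client's Accept-Language header through to an action API call could be sketched as below. This is Python using only the standard library, not the actual mobileapps code; build_mw_api_request and its arguments are hypothetical names, and the request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_mw_api_request(domain, params, incoming_headers):
    """Build an api.php request that forwards the client's Accept-Language."""
    query = urlencode({**params, "format": "json"})
    headers = {}
    for name, value in incoming_headers.items():
        # Header names are case-insensitive; forward only Accept-Language,
        # unmodified, so the full preference list reaches MediaWiki.
        if name.lower() == "accept-language":
            headers["Accept-Language"] = value
    return Request(f"https://{domain}/w/api.php?{query}", headers=headers)


req = build_mw_api_request(
    "sr.wikipedia.org",
    {"action": "query", "prop": "pageprops", "titles": "Србија"},
    {"accept-language": "sr-Latn", "user-agent": "example-client"},
)
print(req.full_url)
print(req.header_items())
```

Other incoming headers are deliberately dropped in this sketch; a real service would decide separately which of them to forward.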

(Then how is a language-converted displaytitle returned by https://sr.wikipedia.org/w/api.php?action=mobileview&page=Србија&format=json&redirect=yes&prop=normalizedtitle|description|displaytitle&uselang=sr-Latn, you ask? The answer is that the mobileview API cheats: it uses a getDisplayTitle() method provided by ParserOutput.php, which returns a displayable title, but does not actually return the displaytitle pageprop; in fact, https://sr.wikipedia.org/wiki/Србија has no displaytitle pageprop: https://sr.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=displaytitle&titles=Србија . FWIW, MCS's mobile-sections API also cheats and just reuses the original title in the displaytitle field if no real displaytitle pageprop is found.)

What other wikis have language variants?

Found another handy-dandy list in the LanguageConverter class: https://phabricator.wikimedia.org/source/mediawiki/browse/master/languages/LanguageConverter.php$40

@Mholloway Now THAT is a handy class to know about. Thanks for finding/posting it!

cc @ABorbaWMF - You may want to have this list of languages with variants handy for your testing.

Hmm. For RESTBase-stored endpoints (I think, currently, that includes /page/summary and /page/mobile-sections from those in the description), won't this result in RB-stored content containing arbitrary variants (i.e., for any given document, the variant requested by the user who requested the copy that was then stored)? I think we need to provide for variant-specific storage before we can really do this.

RB doesn't store language variants, it only stores the canonical render. For language variants in the case of Parsoid, we ask it to translate it into the needed language variant. For MCS-powered end points, we can either:

  • ask Parsoid to translate the HTML and then send that to MCS; or
  • use the mobile-friendly canonical version and then send it to MCS for translation

The former is easier, but the latter reduces client latency.

Change 512402 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Pass through accept-language header for MW API requests

https://gerrit.wikimedia.org/r/512402

Change 516680 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/wikifeeds@master] Pass through accept-language headers to MW API

https://gerrit.wikimedia.org/r/516680

Change 516680 merged by jenkins-bot:
[mediawiki/services/wikifeeds@master] Pass through accept-language headers to MW API

https://gerrit.wikimedia.org/r/516680

Change 518067 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps/deploy@master] Update prod config template to pass thru accept-language to the MW API

https://gerrit.wikimedia.org/r/518067

Change 518067 merged by jenkins-bot:
[mediawiki/services/mobileapps/deploy@master] Update prod config template to pass thru accept-language to the MW API

https://gerrit.wikimedia.org/r/518067

Change 439997 abandoned by Jforrester:
Respect Accept-Language header for MW API requests

Reason:
Replaced by I330e5b2dd17a.

https://gerrit.wikimedia.org/r/439997