Improve Accept-Language header handling in RESTBase
Open, High, Public

Description

Story
"As a user of the Page Content Service, I want to send a standard Accept-Language header with my request and receive the appropriate language variant in the response."

Designs/Interface/Mockups

On a request to sr.wikipedia.org, Accept-Language: en, sr-Latn;q=0.9, sr-Cyrl;q=0.8 would be reduced to sr-el.

On a request to zh.wikipedia.org, Accept-Language: en, zh-tw;q=0.5 would be reduced to zh-tw.

There is code in the mobileapps service that handles this reduction. It sorts the codes by q score and then converts them from IETF language tags to Wikipedia language codes to pick the most relevant language code. It should just be a matter of moving that code upstream into RESTBase. The code only handles zhwiki and srwiki at the moment but could be expanded to other wikis with variants.
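The reduction described above can be sketched as follows. This is a minimal illustration in Python, not the mobileapps implementation (which is Node.js): the function names `parse_accept_language` and `reduce_for_domain` and the `IETF_TO_WIKI` table are hypothetical, and the table only covers the two Serbian script tags used in the examples on this task.

```python
def parse_accept_language(header):
    """Return the lowercased language tags from an Accept-Language header,
    ordered by descending q value (ties keep their header order)."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if q > 0:
            ranked.append((-q, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


# Hypothetical mapping from IETF tags to wiki variant codes (assumption:
# tags not listed here are already valid wiki codes, e.g. zh-tw, zh-hans).
IETF_TO_WIKI = {
    "sr-latn": "sr-el",
    "sr-cyrl": "sr-ec",
}


def reduce_for_domain(header, wiki_lang):
    """Pick the highest-ranked tag that belongs to the wiki's language and
    convert it to a wiki code; fall back to the base language."""
    for tag in parse_accept_language(header):
        if tag == wiki_lang or tag.startswith(wiki_lang + "-"):
            return IETF_TO_WIKI.get(tag, tag)
    return wiki_lang
```

With the headers from the examples, `reduce_for_domain("en, sr-Latn;q=0.9, sr-Cyrl;q=0.8", "sr")` returns `"sr-el"` and `reduce_for_domain("en, zh-tw;q=0.5", "zh")` returns `"zh-tw"`; the `en` entry is skipped because it is not relevant to either domain.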

Done Criteria

  • Requests that pass a standard Accept-Language header use the highest-ranked language code and variant that is relevant to the domain.

Specific Examples:

  • curl "https://sr.wikipedia.org/api/rest_v1/page/summary/%D0%A1%D1%80%D0%B1%D0%B8%D1%98%D0%B0" -H "Accept-Language: en, sr-Latn;q=0.9, sr-Cyrl;q=0.8" picks sr-el as the appropriate wiki language variant. The returned displaytitle should be Srbija and not Србија.
  • curl "https://zh.wikipedia.org/api/rest_v1/page/summary/%E4%B8%AD%E5%9C%8B" -H "Accept-Language: en, zh-hans;q=0.5" picks zh-hans as the appropriate wiki language variant. The returned displaytitle should be 中国 and not 中國.
  • The above examples return the correct variant for other Page Content Service endpoints - /page/mobile-html, /page/media-list

Event Timeline

@JoeWalsh: Assuming this task is about the RESTBase code project, hence adding that project tag so other people who don't know or don't care about team tags can also find this task when searching via projects. Please set appropriate project tags when possible. Thanks!

Hi @AMooney, what's the status of this ticket now? I'd like to see it fixed, since the iOS app (which uses mobile-html) is still showing content with mixed language variants.

The code from MCS should be made into an npm package used by RESTBase.

Actually, we need to search for existing packages and evaluate their quality before porting our custom code. https://www.npmjs.com/package/accept-language-parser seems like it should do the job, but needs to be verified.

There are 2 components to this task:

  1. Get a list of supported languages based on the URL requested which is what the mobile code does for sr and zh. Ideally, that list should be generated from the IANA Language Subtag Registry. We could start with a list of all language codes for which Wikis are available but would (probably?) need to remove unsupported language variants from the IANA list. This is obviously WMF-specific and we need to write our own code.
  2. The list from 1. and the Accept-Language header tags would be used to pick the right language to serve. For this we can find a standard-compliant package that provides the functionality. There is the Accept-Language-Negotiator package that looks promising.

I can work on that next. As an aside, I don't think this should have been tagged clinic duty. It's more of a small project.

"Get a list of supported languages based on the URL requested which is what the mobile code does for sr and zh. Ideally, that list should be generated from the IANA Language Subtag Registry. We could start with a list of all language codes for which Wikis are available but would (probably?) need to remove unsupported language variants from the IANA list. This is obviously WMF-specific and we need to write our own code."

I do not entirely understand this statement. There are two methods in RESTBase that you can look at: mwUtils.canConvertLangVariant and mwUtils.shouldConvertLangVariant. Internally, they get the list of supported variants from the wiki. So the key is just to parse all languages and variants out of the Accept-Language header, sort them by priority, and then choose the highest-priority language/variant that is supported by the site.
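The final selection step could look roughly like the sketch below. It is written in Python for illustration only and does not reflect the actual mwUtils API: `choose_variant` and its `aliases` parameter are hypothetical names, the tag list is assumed to be already parsed and sorted by priority, and the set of supported variants is assumed to be supplied by the caller (in RESTBase it would come from the wiki).

```python
def choose_variant(ranked_tags, supported_variants, aliases=None):
    """Return the first tag, in priority order, that the site supports.

    ranked_tags: language tags already sorted from most to least preferred.
    supported_variants: variant codes the wiki reports as supported.
    aliases: optional map from IETF tags to wiki codes (hypothetical).
    Returns None when no requested tag is supported by the site.
    """
    aliases = aliases or {}
    supported = {code.lower() for code in supported_variants}
    for tag in ranked_tags:
        tag = tag.lower()
        code = aliases.get(tag, tag)  # unmapped tags are used as-is
        if code in supported:
            return code
    return None
```

For example, `choose_variant(["en", "sr-Latn", "sr-Cyrl"], {"sr", "sr-ec", "sr-el"}, {"sr-latn": "sr-el", "sr-cyrl": "sr-ec"})` returns `"sr-el"`. Returning `None` instead of a default leaves the fallback decision to the caller.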

Resetting deactivated assignee account.