Currently, there is no way to specify a language variant via the REST API. Without knowing the specific variant, Parsoid cannot produce exactly the same content as the corresponding Wikipedia page.
For example, the HTML returned by
https://zh.wikipedia.org/api/rest_v1/page/html/%E4%B8%AD%E5%9C%8B
still contains raw wikitext language-conversion markup (`-{ }-`):
```
-{H|zh:繁体字;zh-cn:繁体字;zh-tw:正體字;zh-hk:繁體字;zh-mo:繁體字;zh-sg:繁体字;}-
```
What we want is to let clients specify the language variant explicitly, and to have Parsoid return exactly the same content as is shown on Wikipedia. For example,
https://zh.wikipedia.org/api/rest_v1/page/html/%E4%B8%AD%E5%9C%8B?lang=zh-cn
would return the same content as:
https://zh.wikipedia.org/zh-cn/%E4%B8%AD%E5%9C%8B
## Requirements
- Continue to expose the original content, without variant conversions applied (as is the case right now).
- Additionally, offer content with variant conversions applied for read-only use cases.
- Follow the general REST API philosophy:
  - Play well with caching.
  - Predictable and simple request construction.
## Candidate solutions
%%%
### 1. Domains
The REST API is very much built around domains as the primary means of selecting project, storage & general configuration. As such, it would be fairly straightforward to assign separate domains to variants. Examples:
- `zh.wikipedia.org/api/rest_v1/..`: Un-translated content. Used for editing.
- `zh-cn.wikipedia.org/api/rest_v1/..`: Simplified Chinese. Read-only.
- `zh-tw.wikipedia.org/api/rest_v1/..`: Traditional Chinese. Read-only.
#### Considerations
- Wildcard certs are tied to a single sub-domain level, so introducing a second level for variants (e.g. `cn.zh.wikipedia.org`) would not be easy.
#### Advantages
- Simple to implement in the REST API; does not require Varnish changes.
#### Disadvantages
- Requires new domains.
- Does not support listings of variants.
%%%
### 2. Path prefixes
Instead of using domains, use special path prefixes to select variants. The REST API currently uses `/api/rest_v1/`, which makes fitting variants into this scheme a bit awkward. T114662 proposes a scheme like `/wiki-cn/`, which could be adapted to `/api-cn/`.
The Chinese Wikipedia currently replaces `/wiki/` with the variant, as in `zh.wikipedia.org/zh-cn/Sometitle`. Fitting the API into this scheme without conflicts is tricky. The best I can think of is `zh.wikipedia.org/api/zh-cn/Sometitle`.
Alternatively, a scheme like `https://{domain}{/variant}/api/rest_v1/` could be used. The `{/variant}` part is optional: if it is omitted, no variant conversion is applied.
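To make the optional path segment concrete, here is a minimal sketch of how a client might expand that template. The helper names `rest_base_url` and `page_html_url` are hypothetical, and the variant-prefixed layout is only a candidate in this discussion, not a deployed URL scheme.

```python
from urllib.parse import quote


def rest_base_url(domain, variant=None):
    """Expand https://{domain}{/variant}/api/rest_v1/ (variant is optional)."""
    # An absent variant contributes nothing to the path, which selects
    # the unconverted content.
    prefix = f"/{variant}" if variant else ""
    return f"https://{domain}{prefix}/api/rest_v1/"


def page_html_url(domain, title, variant=None):
    """Build a page/html URL, percent-encoding the title as UTF-8."""
    return rest_base_url(domain, variant) + "page/html/" + quote(title, safe="")
```

With this sketch, `page_html_url("zh.wikipedia.org", "中國")` yields the unconverted URL used in the example at the top of this task, and passing `variant="zh-cn"` only adds the `/zh-cn` prefix in front of `/api/`.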
#### Advantages
- Closer to current usage on Chinese Wikipedia.
#### Disadvantages
- Does not really support listings of variants either.
- Overloads root path namespace, opening the door to conflicts or less-than-obvious variant path names.
### 3. Accept-language header
Use the standard [accept-language header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language) to select content languages. To avoid cache fragmentation, normalize the `Accept-Language` header in Varnish, so that only meaningful values are considered & varied on.
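The normalization step could look roughly like the following. This is a sketch in Python rather than VCL, and the `SUPPORTED` set is an assumption for illustration (taken from the variant codes in the markup example above), not the actual configuration. The idea is to collapse whatever the client sends into at most one supported variant, so the cache only varies on a handful of values.

```python
# Assumed list of variants worth varying on; illustrative only.
SUPPORTED = ("zh-cn", "zh-tw", "zh-hk", "zh-mo", "zh-sg")


def parse_accept_language(header):
    """Return (tag, q) pairs sorted by descending q; a malformed q counts as 0."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        params = params.strip().lower()
        q = 1.0  # a tag without an explicit weight defaults to q=1
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        entries.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def normalize_variant(header, supported=SUPPORTED):
    """Collapse an arbitrary header to one supported variant, or None."""
    for tag, q in parse_accept_language(header or ""):
        if q > 0 and tag in supported:
            return tag
    return None
```

Under these assumptions, a browser header such as `zh-TW,zh;q=0.9,en;q=0.8` normalizes to `zh-tw`, while `en-US,en;q=0.5` normalizes to `None` and would be served (and cached) as unconverted content.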
#### Advantages
- Established standard (when using accept-language).
- Usually does the right thing automatically for reading, which is more common than editing.
- For end user links, avoids sharing / construction of broken URLs (see several comments in T114662).
- Avoids fragmenting the API documentation by language, but requires more documentation for API subsetting. Swagger can support the accept-language header with value dropdowns (as with accept).
- Relatively easy to support across end points. Does not require URL layout changes.
#### Disadvantages
- Can be harder to debug / less obvious.
- The header needs to be unset to be sure that the returned content is editable. However, this is easy to do in XHR / fetch (`Accept-Language` is a CORS-safelisted request header).
- Requires more documentation on supported languages in individual API end points.
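From a client's point of view, the difference between reading and editing would then come down to a single header. The sketch below only builds the requests and does not send them; `build_request` is a hypothetical helper, and header-based variant selection is the behaviour proposed here rather than something the API is known to support.

```python
from urllib.request import Request

PAGE = "https://zh.wikipedia.org/api/rest_v1/page/html/%E4%B8%AD%E5%9C%8B"


def build_request(url, variant=None):
    """Build (but do not send) a GET request.

    With variant=None no Accept-Language header is set, which under this
    proposal asks for the unconverted, editable content.
    """
    headers = {"Accept-Language": variant} if variant else {}
    return Request(url, headers=headers)


read_req = build_request(PAGE, "zh-cn")  # converted, read-only use
edit_req = build_request(PAGE)           # unconverted, safe to edit
```

Both requests target the same URL, which is what keeps request construction simple, but it also means the variant is invisible when looking at the URL alone.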
## Proposal
`Accept-Language` headers and paths are not mutually exclusive. Even when using path based selection primarily, we will want to set up redirects using `Accept-Language`. This suggests the following pragmatic approach for the REST API:
- Start by supporting `Accept-Language` headers in the REST API.
- Normalize `Accept-Language` headers in Varnish, and `Vary` on the normalized value.
- Document and support `Accept-Language` header use in REST API.
- Consider adding explicit URLs at a later point, once / if we have established a uniform language selection URL scheme (see T114662). For caching purposes, URL requests can be rewritten to `Accept-Language` requests, or vice versa.
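The rewrite mentioned in the last point could be sketched as follows, assuming the `{/variant}` path-prefix layout from candidate 2. The function name, the regular expression, and the choice to let the URL override any client-sent header are all illustrative assumptions; a real deployment would more likely do this in the caching layer.

```python
import re

# Assumed layout: an optional /{variant} segment directly before /api/rest_v1/.
# A variant is a language code followed by at least one "-subtag".
_VARIANT_PATH = re.compile(
    r"^/(?P<variant>[a-z]{2,3}(?:-[a-z0-9]+)+)(?P<rest>/api/rest_v1/.*)$"
)


def rewrite_to_header(path, headers):
    """Map a variant-prefixed path to the canonical path plus Accept-Language.

    Returns (path, headers); the input headers dict is not modified.
    """
    match = _VARIANT_PATH.match(path)
    if not match:
        # No variant prefix: pass the request through unchanged.
        return path, dict(headers)
    rewritten = dict(headers)
    # The explicit URL wins over whatever header the client sent.
    rewritten["Accept-Language"] = match.group("variant")
    return match.group("rest"), rewritten
```

After such a rewrite, `/zh-cn/api/rest_v1/page/html/X` and a header-based request for `zh-cn` would resolve to the same canonical path and header, so both could share one cache entry.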
## See also
- {T114662}
- {T114640}
- Current Parsoid status:
  - {T43716}
  - [Language conversion blocks](https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec/Language_conversion_blocks) on mediawiki.org
- [RFC 5646: Tags for Identifying Languages](https://tools.ietf.org/html/rfc5646) and https://en.wikipedia.org/wiki/IETF_language_tag, defining hierarchical language tags like `en-gb` or `zh-hans`.