Status: Accepted per ArchCom meeting 2017-03-01, based on the discussion on this task.
Currently there is no way to specify a language variant via the REST API. Without knowing the specific variant, Parsoid cannot produce exactly the same content as the Wikipedia page. For example, the response for
https://zh.wikipedia.org/api/rest_v1/page/html/%E4%B8%AD%E5%9C%8B
contains raw wikitext markup ("-{ }-"):
-{H|zh:繁体字;zh-cn:繁体字;zh-tw:正體字;zh-hk:繁體字;zh-mo:繁體字;zh-sg:繁体字;}-
What we want is to let clients specify the language variant explicitly, and have Parsoid return exactly the same content as is shown on Wikipedia. For example,
https://zh.wikipedia.org/api/rest_v1/page/html/%E4%B8%AD%E5%9C%8B?lang=zh-cn
would return the same content as
https://zh.wikipedia.org/zh-cn/%E4%B8%AD%E5%9C%8B
## Requirements
- Continue to expose the original content, without variant conversions applied (as is the case right now).
- Additionally, offer content with variant conversions applied for read-only use cases.
- Follow the general REST API philosophy:
  - Play well with caching.
  - Predictable and simple request construction.
## Candidate solutions
### 1. Domains
The REST API is very much built around domains as the primary means of selecting project, storage & general configuration. As such, it would be fairly straightforward to assign separate domains to variants. Examples:
- `zh.wikipedia.org/api/rest_v1/..`: Un-translated content. Used for editing.
- `zh-cn.wikipedia.org/api/rest_v1/..`: Simplified Chinese. Read-only.
- `zh-tw.wikipedia.org/api/rest_v1/..`: Traditional Chinese. Read-only.
#### Considerations
- Wildcard certs are tied to a single sub-domain level, so introducing a second level for variants (ex: `cn.zh.wikipedia.org`) would not be easy.
#### Advantages
- Simple to implement in the REST API; does not require Varnish changes.
#### Disadvantages
- Requires new domains.
- Does not support listings of variants.
### 2. Path prefixes
Instead of using domains, use special path prefixes to select variants. The REST API currently uses `/api/rest_v1/`, which makes fitting variants into this scheme a bit awkward. T114662 proposes a scheme like `/wiki-cn/`, which could be adapted to `/api-cn/`.
The Chinese Wikipedia currently replaces `/wiki/` with the variant, as in `zh.wikipedia.org/zh-cn/Sometitle`. Fitting the API into this scheme without conflicts is tricky. The best I can think of is `zh.wikipedia.org/api/zh-cn/Sometitle`.
Alternatively, a scheme like `https://{domain}{/variant}/api/rest_v1/` can also be used. Note the optional `{variant}` part: if it is missing, no variant conversion is applied.
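As a rough illustration of the optional `{/variant}` segment, the template could be expanded as in the sketch below. The helper names `rest_base_url` and `page_html_url` are hypothetical and exist only for this example; the `page/html/` route is the one used in the example URLs at the top of this task.

```python
from typing import Optional
from urllib.parse import quote


def rest_base_url(domain: str, variant: Optional[str] = None) -> str:
    """Expand https://{domain}{/variant}/api/rest_v1/ (hypothetical helper)."""
    # A missing variant drops the whole path segment, including its slash.
    variant_segment = f"/{quote(variant, safe='')}" if variant else ""
    return f"https://{domain}{variant_segment}/api/rest_v1/"


def page_html_url(domain: str, title: str, variant: Optional[str] = None) -> str:
    """Build a page/html URL, percent-encoding the title as UTF-8."""
    return rest_base_url(domain, variant) + "page/html/" + quote(title, safe="")


print(page_html_url("zh.wikipedia.org", "中國"))
print(page_html_url("zh.wikipedia.org", "中國", variant="zh-cn"))
```

Without a variant, the result is identical to the URL in use today, so existing clients would be unaffected under this layout.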
#### Advantages
- Closer to current usage on Chinese Wikipedia.
#### Disadvantages
- Does not really support listings of variants either.
- Overloads root path namespace, opening the door to conflicts or less-than-obvious variant path names.
### 3. Accept-Language header
Use the standard Accept-Language header to select content languages. To avoid cache fragmentation, normalize the Accept-Language header in Varnish, so that only meaningful values are considered and varied on.
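The normalization step could look roughly like the sketch below. In production this logic would live in Varnish (VCL); Python is used here only to illustrate the idea. The `SUPPORTED_VARIANTS` set is an assumption taken from the variant codes in the example markup above, and `normalize_accept_language` is a hypothetical name.

```python
from typing import Iterable, Optional

# Assumption: the variant codes that appear in the zh example markup above.
SUPPORTED_VARIANTS = frozenset({"zh-cn", "zh-tw", "zh-hk", "zh-mo", "zh-sg"})


def normalize_accept_language(
    header: Optional[str],
    supported: Iterable[str] = SUPPORTED_VARIANTS,
) -> Optional[str]:
    """Reduce an Accept-Language header to one supported variant, or None."""
    supported = frozenset(supported)
    candidates = []
    for position, item in enumerate((header or "").split(",")):
        tag, _, params = item.partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0  # Default weight when no q-value is given.
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # Skip entries with a malformed q-value.
        if quality > 0:
            # Sort by descending quality, then by original position.
            candidates.append((-quality, position, tag))
    for _, _, tag in sorted(candidates):
        if tag in supported:
            return tag
    return None  # No supported variant requested: serve unconverted content.


print(normalize_accept_language("en-US,en;q=0.9,zh-TW;q=0.8"))  # zh-tw
```

Collapsing every header to one of a handful of values (or to none) keeps the number of cache variants per URL bounded by the number of supported variants plus one.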
#### Advantages
- Established standard (when using Accept-Language).
- Usually does the right thing automatically for reading (which is more common than editing).
- For end user links, avoids sharing / construction of broken URLs (see several comments in T114662).
- Avoids fragmenting the API documentation by language, but requires more documentation for API subsetting. Swagger can support the Accept-Language header with value dropdowns (as with Accept).
- Relatively easy to support across end points. Does not require URL layout changes.
#### Disadvantages
- Can be harder to debug / less obvious.
- Needs to be unset to be sure that content is editable. However, this is easy to do in XHR / fetch (CORS whitelisted).
- Requires more documentation on supported languages in individual API end points.
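To make the read versus edit distinction concrete, here is a minimal client-side sketch. It assumes the REST API honours Accept-Language as described in this option; `build_html_request` is a hypothetical helper, and the request is only constructed, not sent.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def build_html_request(domain: str, title: str, variant: Optional[str] = None) -> Request:
    """Build (but do not send) a page/html request for the given title."""
    url = f"https://{domain}/api/rest_v1/page/html/{quote(title, safe='')}"
    request = Request(url)
    if variant is not None:
        # Read-only use: ask for content with variant conversion applied.
        request.add_header("Accept-Language", variant)
    # Editing use: leave the header unset so unconverted content is returned.
    return request


read_request = build_html_request("zh.wikipedia.org", "中國", variant="zh-cn")
edit_request = build_html_request("zh.wikipedia.org", "中國")
print(read_request.full_url, read_request.header_items())
print(edit_request.full_url, edit_request.header_items())
```

Both requests share the same URL, which is the source of the debugging concern listed above: the difference is visible only in the headers.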
## Proposal
Accept-Language headers and paths are not mutually exclusive. Even when primarily using path-based selection, we will want to set up redirects using Accept-Language. This suggests the following pragmatic approach for the REST API:
- Start by supporting Accept-Language headers in the REST API.
- Normalize Accept-Language headers in Varnish, and vary on them.
- Document and support Accept-Language header use in the REST API.
- Consider adding explicit URLs at a later point, once / if we have established a uniform language selection URL scheme (see T114662). For caching purposes, URL requests can be rewritten to Accept-Language requests, or vice versa.
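The URL-to-header rewrite mentioned in the last point could be sketched as follows. This assumes the `{/variant}/api/rest_v1/` path layout from candidate 2, which has not been decided; the pattern and the function name `rewrite_to_accept_language` are hypothetical.

```python
import re
from typing import Dict, Mapping, Tuple

# Assumption: a variant prefix such as /zh-cn directly before /api/rest_v1/.
VARIANT_PREFIX = re.compile(
    r"^/(?P<variant>[a-z]{2,3}(?:-[a-z0-9]+)+)(?P<rest>/api/rest_v1/.*)$"
)


def rewrite_to_accept_language(
    path: str, headers: Mapping[str, str]
) -> Tuple[str, Dict[str, str]]:
    """Map a variant-prefixed path to the canonical path plus a header."""
    match = VARIANT_PREFIX.match(path)
    if match is None:
        # No variant prefix: pass the request through unchanged.
        return path, dict(headers)
    # Drop any client-sent Accept-Language (case-insensitively); the URL wins.
    rewritten = {k: v for k, v in headers.items() if k.lower() != "accept-language"}
    rewritten["Accept-Language"] = match.group("variant")
    return match.group("rest"), rewritten


print(rewrite_to_accept_language("/zh-cn/api/rest_v1/page/html/Sometitle", {}))
```

With a rewrite like this, both request styles resolve to the same cache key (canonical path plus normalized header), so supporting explicit URLs later would not require a second set of cached objects.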
## See also
- T114662: RFC: Per-language URLs for multilingual wiki pages
- T114640: make Parser::getTargetLanguage aware of multilingual wikis
- Current Parsoid status
- RFC 5646: Tags for Identifying Languages and https://en.wikipedia.org/wiki/IETF_language_tag, defining hierarchical language tags like en-gb or zh-hans.