Return variant URLs and titles in a dictionary of the summary response


Dealing with variant URLs is a complicated issue. The rules for variants are different depending on which API or content you are dealing with. Many clients have implemented logic to deal with this, and this code basically has to be rewritten on every client, usually with a different set of bugs.

Most of this could be solved by returning all the URLs needed by clients so they never have to figure out how to mangle the titles.

This ticket has two major components:

  1. Return an exhaustive dictionary of URLs and titles for all variants of the page
  2. Ensure the primary titles and URLs in the summary response are actually correct and take into account the idiosyncrasies of manipulating variant URLs for each API

The structure will be something like the top level dictionaries, but scoped to each variant:

variants: {
  zh: {
    titles: {
      "title": "File:Collage_of_Nine_Dogs.jpg", // Use this when constructing a URL
      "normalized_title": "Collage of Nine Dogs.jpg", // Use this for plain text
      "display_title": "<strong>Collage of Nine Dogs.jpg</strong>", // Use this for WebViews
      "namespace_title": "File" // Use this when you want to include the namespace in the rich or plain text title
    },
    content_urls: {
      page: "…",
      revisions: "...",
      editor: "…",
      talk_page: "..."
    },
    api_urls: {
      summary: "...", // The URL to this response
      mobile_sections: "...", // return for now, but will be deprecated
      read_html: "...", // new PCS endpoint
      content_html: "...", // new PCS endpoint
      metadata: "…", // new PCS endpoint
      references: "…", // new PCS endpoint
      gallery: "…", // new PCS endpoint
      revisions: "...",
      edit_html: "...", // Parsoid HTML URL
      talk_page_html: "..." // Parsoid HTML URL for the talk page
    }
  },
  "zh-cn": {
    titles: { ... },
    content_urls: { ... },
    api_urls: { ... }
  }
}

See the parent tickets for the proposed content of each of the dictionaries.
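
To illustrate how a client might consume the proposed structure, here is a minimal sketch. The helper name `pickVariant` and the fallback from a variant code to its base language code are assumptions made for illustration; the ticket does not specify any fallback behaviour.

```javascript
// Hypothetical helper: return the per-variant dictionaries for the requested
// variant, falling back to the base language code (e.g. "zh-hk" -> "zh").
// The fallback rule is an assumption, not part of the proposal.
function pickVariant(summary, variantCode) {
  const variants = summary.variants || {};
  if (variants[variantCode]) {
    return variants[variantCode];
  }
  const baseCode = variantCode.split('-')[0];
  return variants[baseCode] || null;
}

// Trimmed-down example response using the titles from the description.
const exampleSummary = {
  variants: {
    zh: {
      titles: {
        title: 'File:Collage_of_Nine_Dogs.jpg',
        normalized_title: 'Collage of Nine Dogs.jpg',
      },
    },
  },
};

const entry = pickVariant(exampleSummary, 'zh-hk');
console.log(entry ? entry.titles.title : 'no variant data');
```

With a response shaped like this, the client reads `titles.title` when building a URL and never has to transform the title itself.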

Some open questions:

  1. The implementing engineers will need to research this a bit further to see if this is a good structure
  2. Do we need other information like marking the "canonical" variant?

Fjalapeno created this task. Oct 6 2017, 4:02 PM

These would be links for getting page content, right?

Here are my thoughts based on my recent work in this area:

For any given page in a language with variants, there seems to be one canonical DB title, and its variant is arbitrary: it's just whatever variant the page creator used. (We can get the canonical title with language->findVariantLink( $title ) if it's not safe to presume we already have the canonical title because MediaWiki's already given it to us.)

So taking the Barack Obama article as an example, assuming content_urls is meant to provide links to the mobile website, we'd see something like this (with unencoded characters for explanatory purposes; note that the title segment of the path is identical, and uses simplified characters, in all cases):

"content_urls": {
  "zh": "贝拉克·奥巴马",
  "zh-cn": "贝拉克·奥巴马",
  "zh-hk": "贝拉克·奥巴马"
}

This should be pretty trivial to construct once we have the set of variants via language->getVariants().
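
A rough sketch of that construction on the node.js side, given a canonical title and a list of variant codes. The function name, the `/<variant>/<title>` path layout, and the mobile domain are assumptions for illustration; only the idea that the title segment stays identical across variants comes from the comment above.

```javascript
// Sketch only: the "/<variant>/<title>" path layout and the domain passed in
// below are assumptions, not something this ticket confirms.
function buildVariantContentUrls(domain, canonicalTitle, variantCodes) {
  // URL titles use underscores instead of spaces; the canonical title is
  // reused unchanged for every variant.
  const titleSegment = encodeURIComponent(canonicalTitle.replace(/ /g, '_'));
  const urls = {};
  for (const code of variantCodes) {
    urls[code] = `https://${domain}/${code}/${titleSegment}`;
  }
  return urls;
}

const obamaUrls = buildVariantContentUrls(
  'zh.m.wikipedia.org',
  '贝拉克·奥巴马',
  ['zh', 'zh-cn', 'zh-hk']
);
console.log(obamaUrls);
```

Only the variant path segment differs between the resulting URLs; the encoded title is the same in each.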

For titles, assuming this is to provide display titles in the correct variant (maybe it should be display_titles?), we'd return something like:

"titles": {
  "zh": "贝拉克·奥巴马",
  "zh-cn": "贝拉克·奥巴马",
  "zh-hk": "巴拉克·奧巴馬"
}

I think we could get these with language->autoConvertToAllVariants( $title ).

I'm not sure what's envisioned for api_urls but my point is that there seem to be pretty good affordances for this kind of thing on the PHP side. I'm not sure how well exposed this stuff is through the MediaWiki API, though, so this might involve some work updating the API on that side for us to consume from a node.js service.

On a side note, it makes more sense to me to have the language variant codes be the first level under variants and then have each variant code have a standard display_title, content_url, and whatever else.
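
To make the difference between the two layouts concrete, here is a small sketch that pivots property-first maps (like the "titles" and "content_urls" examples above) into the variant-first layout. The function name `pivotByVariant` and the example URL values are hypothetical.

```javascript
// Turn { property: { variantCode: value } } into
// { variantCode: { property: value } }.
function pivotByVariant(byProperty) {
  const byVariant = {};
  for (const [property, perVariant] of Object.entries(byProperty)) {
    for (const [variantCode, value] of Object.entries(perVariant)) {
      byVariant[variantCode] = byVariant[variantCode] || {};
      byVariant[variantCode][property] = value;
    }
  }
  return byVariant;
}

const variantFirst = pivotByVariant({
  display_title: { 'zh-cn': '贝拉克·奥巴马', 'zh-hk': '巴拉克·奧巴馬' },
  // Placeholder values; real URLs are not specified in this ticket.
  content_url: { 'zh-cn': 'url-for-zh-cn', 'zh-hk': 'url-for-zh-hk' },
});
console.log(variantFirst['zh-hk'].display_title);
```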

@Mholloway for examples of what is in those dictionaries, check out the parent tickets

For instance, the main "titles" dictionary in the summary has various titles like normalized, denormalized, display, etc…

Let me know if the parent tickets provide enough info.

Fjalapeno updated the task description. Oct 6 2017, 5:29 PM
Fjalapeno updated the task description. Oct 6 2017, 5:32 PM

@Mholloway actually I just pasted in the examples from the other tickets…

So it is possible that all these URLs are not valid for the variants… that is probably the key information needed here.

Mholloway added a comment. Edited Oct 6 2017, 6:34 PM

OK, thanks for the updates, I see what's going on now.

Basically my understanding boils down to this: each page has a canonical title that's the one we want to use whenever we interact with MediaWiki (e.g., to specify in URLs). This might not be in the default variant for the language (if the language has variants); it could be in any variant.

With that in mind, I'd expect that many if not most of these URLs will be the same across variants (I think this would be the case for everything under content_urls except page, for example), but I agree with the idea of providing them on a per-variant basis for the ones that do vary. How we would want to structure this probably depends on how much variation there is across variants. :)

Also, AFAIK, how to specify a variant to RESTBase (e.g., as a URL path segment or in an Accept-Language header) is still TBD (T122942, T159985), so I'm not sure whether there will be any variation across language variants for PCS or other REST API URLs.

Edit: OK, after reviewing T122942 it looks like it's going to be Accept-Language headers at least to start out. So there should be no variation by language variant for REST API URLs, either.
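
As a sketch of what this means for a client: the REST URL stays the same for every variant and only the header changes. The helper name `buildSummaryRequest` is hypothetical, and the summary path shown is an assumption about the endpoint layout rather than something decided in this ticket.

```javascript
// Build the pieces of a summary request for a given variant. The URL does
// not depend on the variant; the variant travels in Accept-Language.
// The "/api/rest_v1/page/summary/" path is assumed for illustration.
function buildSummaryRequest(domain, title, variantCode) {
  return {
    url: `https://${domain}/api/rest_v1/page/summary/${encodeURIComponent(title)}`,
    headers: { 'Accept-Language': variantCode },
  };
}

const hkRequest = buildSummaryRequest('zh.wikipedia.org', '贝拉克·奥巴马', 'zh-hk');
const cnRequest = buildSummaryRequest('zh.wikipedia.org', '贝拉克·奥巴马', 'zh-cn');
console.log(hkRequest.url === cnRequest.url); // same URL, different header
```

A client would then pass `hkRequest.url` and `hkRequest.headers` to whatever HTTP library it already uses.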

@Fjalapeno It's not clear to me why we would want an endpoint which responds with an exhaustive dictionary of URLs and titles for all variants of a page. What's the use case from a client perspective? Given this list of variants, the client has to choose one, and would need to know what variants are available and desired.
Wouldn't it be easier to have the client simply set the Accept-Language header, as proposed and agreed in T122942?

In other words, I agree with @Mholloway's edit of his previous comment.

Change 384890 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Summary 2.0: language variant content URLs

Change 384738 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Add common URLs to summary API

Change 384890 abandoned by Mholloway:
Summary 2.0: language variant content URLs

merged with the preceding patch

While the updated variant property is much simpler and reduced now, I still would argue that a variant property should not be necessary. My main concern is that variant info stays the same for all pages of a particular wiki site.

Instead, clients should call siteinfo whenever they encounter a new wiki site. That siteinfo call should have all the necessary variant information, if present, and can be cached on the client for quite a while.
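
A minimal sketch of that client-side caching, with the siteinfo fetch injected so the example stays self-contained. The function names are hypothetical, and the response shape (`query.languagevariants` keyed by language code, then by variant code) is an assumption about what the siteinfo call returns.

```javascript
// Cache variant info per wiki domain so siteinfo is requested only once
// per site. `fetchSiteInfo(domain)` is supplied by the caller and must
// return a Promise of the parsed siteinfo response.
function createVariantCache(fetchSiteInfo) {
  const cache = new Map();
  return async function getVariants(domain, languageCode) {
    if (!cache.has(domain)) {
      // Store the promise itself so concurrent callers share one request.
      cache.set(domain, fetchSiteInfo(domain));
    }
    const siteInfo = await cache.get(domain);
    // Assumed shape: { query: { languagevariants: { zh: { "zh-cn": {...} } } } }
    const perLanguage = (siteInfo.query && siteInfo.query.languagevariants) || {};
    return Object.keys(perLanguage[languageCode] || {});
  };
}

// Stubbed fetcher standing in for a real siteinfo request.
let siteInfoCalls = 0;
const getVariants = createVariantCache((domain) => {
  siteInfoCalls += 1;
  return Promise.resolve({
    query: { languagevariants: { zh: { 'zh-cn': {}, 'zh-hk': {} } } },
  });
});

const firstLookup = getVariants('zh.wikipedia.org', 'zh');
const secondLookup = getVariants('zh.wikipedia.org', 'zh');
firstLookup.then((codes) => console.log(codes, 'siteinfo calls:', siteInfoCalls));
```

In a real client the injected function would issue the siteinfo request for the given domain; a cache expiry would also be needed, which this sketch leaves out.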

phuedx added a comment. Nov 2 2017, 2:35 PM

@bearND: That sounds like a higher-level protocol for interacting with a wiki via the API. This isn't a criticism, just an observation. The protocol will need to be well-documented for client implementors (and they'll have to look for it) rather than simply being given the information.

As you say, the variant info will very likely be static for the lifetime of a wiki and therefore could be cached by a client (and at our edge). From a performance perspective, this is ideal as we minimise information transferred per request but it comes at the potential cost of the API being a little harder to navigate.