Return variant URLs and titles in the metadata response
Closed, Resolved | Public

Description

Dealing with variant URLs is a complicated issue. The rules for variants are different depending on which API or content you are dealing with. Many clients have implemented logic to deal with this, and this code basically has to be rewritten on every client, usually with a different set of bugs.

Most of this could be solved by returning all the URLs needed by clients so they never have to figure out how to mangle the titles.

This ticket has two major components:

  1. Return an exhaustive dictionary of URLs and titles for all variants of the page
  2. Ensure the primary titles and URLs in the summary response are actually correct and take into account the idiosyncrasies of manipulating variant URLs for each API

The structure will be something like the page summary dictionaries, but scoped to each variant:

variants: {
  zh: {
    display_title: "<strong>Collage of Nine Dogs.jpg</strong>",
    content_urls: {
      page: "…",
      talk: "...",
    },
  },
  zh-cn: {
    display_title: "...",
    content_urls: {
        "…"
    }
  } 
}

See the parent tickets for the proposed content of each of the dictionaries.

Some open questions:

  1. The implementing engineers will need to research this a bit further to see if this is a good structure
  2. Do we need other information like marking the "canonical" variant?

Fjalapeno created this task. Oct 6 2017, 4:02 PM

These would be links for getting page content, right?

Here are my thoughts based on my recent work in this area:

For any given page in a language with variants, there seems to be one canonical DB title, and its variant is arbitrary: it's just whatever variant the page creator used. (We can get the canonical title with language->findVariantLink( $title ) if it's not safe to presume we already have the canonical title because MediaWiki's already given it to us.)

So taking the Barack Obama article as an example, assuming content_urls is meant to provide links to the mobile website, we'd see something like this (with unencoded characters for explanatory purposes; note that the title segment of the path is identical, and in simplified characters, in all cases):

"content_urls": {
  "zh": "https://zh.wikipedia.org/wiki/贝拉克·奥巴马",
  "zh-cn": "https://zh.wikipedia.org/zh-cn/贝拉克·奥巴马",
  "zh-hk": "https://zh.wikipedia.org/zh-hk/贝拉克·奥巴马",
  [...]
}

This should be pretty trivial to construct once we have the set of variants via language->getVariants().
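
As a rough illustration of that construction, here is a minimal Python sketch. It assumes the base language code is served from /wiki/ and every other variant from a path segment named after its code, as in the example above. The function name and parameters are hypothetical, and the variant list is hard-coded here rather than obtained from language->getVariants().

```python
from urllib.parse import quote


def variant_page_urls(domain, canonical_title, variants, base_code):
    """Map each variant code to a page URL that shares one canonical title."""
    # MediaWiki URLs use underscores for spaces; keep ":" and "/" readable.
    path_title = quote(canonical_title.replace(" ", "_"), safe=":/")
    urls = {}
    for code in variants:
        # Assumption: the base code lives under /wiki/, and every other
        # variant under a path segment named after its code.
        segment = "wiki" if code == base_code else code
        urls[code] = f"https://{domain}/{segment}/{path_title}"
    return urls


urls = variant_page_urls(
    "zh.wikipedia.org", "贝拉克·奥巴马", ["zh", "zh-cn", "zh-hk"], base_code="zh"
)
for code, url in urls.items():
    print(code, url)
```

Unlike the example above, the sketch percent-encodes the title, so the printed URLs are not human-readable but are safe to request.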

For titles, assuming this is to provide display titles in the correct variant (maybe it should be display_titles?), we'd return something like:

"titles": {
  "zh": "贝拉克·奥巴马",
  "zh-cn": "贝拉克·奥巴马",
  "zh-hk": "巴拉克·奧巴馬",
  [...]
}

I think we could get these with language->autoConvertToAllVariants( $title ).

I'm not sure what's envisioned for api_urls but my point is that there seem to be pretty good affordances for this kind of thing on the PHP side. I'm not sure how well exposed this stuff is through the MediaWiki API, though, so this might involve some work updating the API on that side for us to consume from a node.js service.

On a side note, it makes more sense to me to have the language variant codes be the first level under variants and then have each variant code have a standard display_title, content_url, and whatever else.

@Mholloway for examples of what is in those dictionaries, check out the parent tickets

For instance, the main "titles" dictionary in the summary has various titles like normalized, denormalized, display, etc…

Let me know if the parent tickets provide enough info.

Fjalapeno updated the task description. Oct 6 2017, 5:29 PM
Fjalapeno updated the task description. Oct 6 2017, 5:32 PM

@Mholloway actually I just pasted in the examples from the other tickets…

So it is possible that all these URLs are not valid for the variants… that is probably the key information needed here.

Mholloway added a comment. Edited Oct 6 2017, 6:34 PM

OK, thanks for the updates, I see what's going on now.

Basically my understanding boils down to this: each page has a canonical title that's the one we want to use whenever we interact with MediaWiki (e.g., to specify in URLs). This might not be in the default variant for the language (if the language has variants); it could be in any variant.

With that in mind, I'd expect that many if not most of these URLs will be the same across variants (I think this would be the case for everything under content_urls except page, for example), but I agree with the idea of providing them on a per-variant basis for the ones that do vary. How we would want to structure this probably depends on how much variation there is across variants. :)

Also, AFAIK, how to specify a variant to RESTBase (e.g., as a URL path segment or in an Accept-Language header) is still TBD (T122942, T159985), so I'm not sure whether there will be any variation across language variants for PCS or other REST API URLs.

Edit: OK, after reviewing T122942 it looks like it's going to be Accept-Language headers at least to start out. So there should be no variation by language variant for REST API URLs, either.

@Fjalapeno It's not clear to me why we would want an endpoint which responds with an exhaustive dictionary of URLs and titles for all variants of a page. What's the use case from a client perspective? Given this list of variants, the client has to choose one, and would need to know which variants are available and desired.
Wouldn't it be easier to have the client simply set the Accept-Language header, as proposed and agreed in T122942?

In other words, I agree with @Mholloway's edit of his previous comment.
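
For illustration, a minimal Python sketch of what setting the header could look like on the client side. The helper name is hypothetical and the endpoint path is an assumption chosen for the example; the request is only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_variant_request(domain, title, variant):
    """Build (but do not send) a REST request for a page in a given variant."""
    # Assumption: the endpoint path below is illustrative only. The point
    # is that the URL does not depend on the variant.
    encoded_title = quote(title.replace(" ", "_"), safe="")
    url = f"https://{domain}/api/rest_v1/page/summary/{encoded_title}"
    # The preferred variant travels in the Accept-Language header.
    return Request(url, headers={"Accept-Language": variant})


req_cn = build_variant_request("zh.wikipedia.org", "贝拉克·奥巴马", "zh-cn")
req_hk = build_variant_request("zh.wikipedia.org", "贝拉克·奥巴马", "zh-hk")
print(req_cn.full_url == req_hk.full_url)  # same URL for both variants
```

With this approach the client still has to know which variant codes exist, which is the gap the variants dictionary is meant to fill.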

Change 384890 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Summary 2.0: language variant content URLs

https://gerrit.wikimedia.org/r/384890

Change 384738 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Add common URLs to summary API

https://gerrit.wikimedia.org/r/384738

Change 384890 abandoned by Mholloway:
Summary 2.0: language variant content URLs

Reason:
merged with the preceding patch

https://gerrit.wikimedia.org/r/384890

While the updated variant property is much simpler and reduced now, I still would argue that a variant property should not be necessary. My main concern is that variant info stays the same for all pages of a particular wiki site.

Instead clients should call siteinfo whenever they encounter a new wiki site. That site info call should have all necessary variant information if present, and can be cached on the client for quite a bit.
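
A minimal Python sketch of that client-side caching idea, assuming a hypothetical fetch function that returns the variant codes for a wiki. The class name, the one-day lifetime, and the stand-in fetcher are all illustrative; the actual siteinfo request is left out.

```python
import time


class VariantInfoCache:
    """Cache per-wiki variant info so it is fetched once per wiki, not per page."""

    def __init__(self, fetch_siteinfo, ttl_seconds=86400, clock=time.monotonic):
        self._fetch = fetch_siteinfo  # hypothetical callable: domain -> variants
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # domain -> (expiry time, variants)

    def variants_for(self, domain):
        now = self._clock()
        entry = self._entries.get(domain)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no network call needed
        variants = self._fetch(domain)
        self._entries[domain] = (now + self._ttl, variants)
        return variants


calls = []


def fake_fetch(domain):
    # Stand-in for a real siteinfo request; records each call for the demo.
    calls.append(domain)
    return ["zh", "zh-cn", "zh-hk"]


cache = VariantInfoCache(fake_fetch)
cache.variants_for("zh.wikipedia.org")
cache.variants_for("zh.wikipedia.org")
print(len(calls))  # the second lookup is served from the cache
```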

phuedx added a comment. Nov 2 2017, 2:35 PM

@bearND: That sounds like a higher-level protocol for interacting with a wiki via the API. This isn't a criticism, just an observation. The protocol will need to be well-documented for client implementors (and they'll have to look for it) rather than simply being given the information.

As you say, the variant info will very likely be static for the lifetime of a wiki and therefore could be cached by a client (and at our edge). From a performance perspective, this is ideal as we minimise information transferred per request but it comes at the potential cost of the API being a little harder to navigate.

Change 384890 restored by Mholloway:
Summary 2.0: language variant content URLs

https://gerrit.wikimedia.org/r/384890

Mholloway renamed this task from Return variant URLs and titles in a dictionary of the summary response to Return variant URLs and titles in the metadata response. Apr 17 2018, 7:35 PM

Per discussion in weekly RI meeting. This info is more appropriately conveyed in the metadata endpoint response.

@Fjalapeno What would you say the priority is on this?

Change 384890 abandoned by Mholloway:
Summary 2.0: language variant content URLs

Reason:
per discussion on ticket

https://gerrit.wikimedia.org/r/384890

Change 440977 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Add language variant content URLs to the metadata endpoint

https://gerrit.wikimedia.org/r/440977

Mholloway updated the task description. Jun 18 2018, 11:27 PM
Mholloway updated the task description. Jun 18 2018, 11:29 PM
Mholloway reassigned this task from Mholloway to MSantos. Jun 18 2018, 11:48 PM
Mholloway added a subscriber: MSantos.

Reassigning to @MSantos to take on the remaining portion of this task: adding the per-variant display titles.

Change 441517 had a related patch set uploaded (by MSantos; owner: MSantos):
[mediawiki/services/mobileapps@master] Adding per-variant display_title property.

https://gerrit.wikimedia.org/r/441517

Mholloway triaged this task as Low priority. Jun 25 2018, 9:02 PM

@MSantos @Mholloway Just following up on comments from yesterday:

  1. Question: How is this being done before variants are supported in the RESTBase API? I assumed we were waiting on this until that work is complete.
  2. Normalizing responses was being talked about in several contexts, like Disambiguation pages in summaries. One fallout of this is using things like Summaries whenever possible. This seems like a good candidate for this idea… meaning that each sub-dictionary for each variant should probably be a summary if possible. Is there anything preventing this?

Mholloway added a subscriber: Jdforrester-WMF. Edited Jun 26 2018, 2:13 PM

@Fjalapeno Just following up on this from our weekly meeting yesterday. The patches in review only add a display title and variant-specific MediaWiki page URLs to the metadata response. REST API URLs are omitted since language variants will be specified via Accept-Language header:

"variants": {
  "en-foo": {
    "display_title": "Title",
    "content_urls": {
      "page": "https://en.wikipedia.org/wiki/Title?variant=en-foo",
      "talk": "https://en.wikipedia.org/wiki/Talk:Title?variant=en-foo"
    }
  },
  "en-bar": {
    "display_title": "Title",
    "content_urls": {
      "page": "https://en.wikipedia.org/wiki/Title?variant=en-bar",
      "talk": "https://en.wikipedia.org/wiki/Talk:Title?variant=en-bar"
    }
  }
}

Is this task now invalid? I know there is a patch in Gerrit (https://gerrit.wikimedia.org/r/#/c/mediawiki/services/mobileapps/+/439997/) to support accept-language headers when making internal MW API requests that I haven't had a chance to review, but will now.

@Fjalapeno Ha, you beat me to commenting (only barely).

@MSantos @Mholloway Just following up on comments from yesterday:

  1. Question: How is this being done before variants are supported in the RESTBase API? I assumed we were waiting on this until that work is complete.

REST API URLs aren't provided since the preferred variant is specified in a request header and the URL remains the same in all cases.

  2. Normalizing responses was being talked about in several contexts, like Disambiguation pages in summaries. One fallout of this is using things like Summaries whenever possible. This seems like a good candidate for this idea… meaning that each sub-dictionary for each variant should probably be a summary if possible. Is there anything preventing this?

As best I can recall, I think the problem was that page summaries have become very large, so including one or more summaries for each variant would add quite a bit of bulk to the metadata response. But yes, that could be done.

LGoto added a subscriber: LGoto. Jul 3 2018, 5:25 PM

Hi @MSantos Could you add some notes on what's blocking this, and who needs to do what in order to unblock it? Thanks!

Sure, @LGoto. Thanks for reminding me.

There is a patch for review, but @Fjalapeno raised the two questions above, which are under discussion with @Mholloway. The outcome could change the original specs by adding summaries for each variant.

Once it is solved we could either merge the latest patch or restart development with new specs.

Change 444748 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: Variant info groundwork

https://gerrit.wikimedia.org/r/444748

Change 444748 abandoned by Mholloway:
Metadata: Variant info groundwork

https://gerrit.wikimedia.org/r/444748

This ticket is now unblocked.

After discussing with the team, we concluded that:

  1. Adding summaries for each variant could lead to a very large API response, which is not desirable for these mobile endpoints.
  2. The content_urls property is not useful for the client. The app wouldn't have much to do with this data, and if needed it can easily be generated by the client once it has access to all the variant language codes, e.g., en-foo or en-bar.
  3. The display_title can still be useful.

The endpoint should now give responses like:

"variants": {
  "en-foo": {
    "display_title": "Title Foo"
  },
  "en-bar": {
    "display_title": "Title Bar"
  }
}
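
As a sketch of how a client might consume this shape, the Python snippet below turns the variants dictionary into a list of (code, display title) pairs for a variant selector, with the user's preferred variant first. The function name and the preferred-variant parameter are hypothetical; only the response shape comes from the example above.

```python
def variant_choices(metadata, preferred=None):
    """Return (variant code, display title) pairs, preferred variant first."""
    variants = metadata.get("variants", {})
    choices = [
        (code, info.get("display_title", ""))
        for code, info in variants.items()
    ]
    # Stable sort: the preferred code (key False) moves to the front,
    # and all other entries keep the order given in the response.
    choices.sort(key=lambda choice: choice[0] != preferred)
    return choices


metadata = {
    "variants": {
        "en-foo": {"display_title": "Title Foo"},
        "en-bar": {"display_title": "Title Bar"},
    }
}
for code, title in variant_choices(metadata, preferred="en-bar"):
    print(code, title)
```

The selected code would then be sent back in the Accept-Language header on subsequent requests.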

Change 440977 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: Variant info groundwork

https://gerrit.wikimedia.org/r/440977

Change 441517 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: Add per-variant display_title

https://gerrit.wikimedia.org/r/441517

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2018-07-11T20:16:33Z] <bsitzmann@deploy1001> Started deploy [mobileapps/deploy@03fa731]: Update mobileapps to b5e152d (T195325 T189830 T177619 T196523)

Mentioned in SAL (#wikimedia-operations) [2018-07-11T20:20:04Z] <bsitzmann@deploy1001> Finished deploy [mobileapps/deploy@03fa731]: Update mobileapps to b5e152d (T195325 T189830 T177619 T196523) (duration: 03m 30s)

Mentioned in SAL (#wikimedia-operations) [2018-07-11T22:18:16Z] <bsitzmann@deploy1001> Started deploy [mobileapps/deploy@03fa731]: Update mobileapps to b5e152d (T195325 T189830 T177619 T196523)

Mentioned in SAL (#wikimedia-operations) [2018-07-11T22:25:00Z] <bsitzmann@deploy1001> Finished deploy [mobileapps/deploy@03fa731]: Update mobileapps to b5e152d (T195325 T189830 T177619 T196523) (duration: 06m 44s)

Hey @cooltey. We are starting to add proper variant support for the wiki content on the REST services. I chatted with Charlotte and she mentioned you had dealt quite a bit with it, so I'd like for you to be aware of this.

For now, the metadata service for a title will return a variants key with the available variants for the page and their titles. With this, clients should be able to show some sort of UI, or merge this into the language selector, to let users switch between variants.

Can you have a look and see if this would fit your use case? Selecting a variant will be done by sending the language on the accept-language header with the request.