Page MenuHomePhabricator

Return common URLs in summary API so clients do not have to perform bug prone string manipulation
Closed, ResolvedPublic

Description

Originally discussed here:
T169761#3422369

and here:
T169761#3424801

This is pushing forward the notion of returning the data that clients need and remove the need for client side processing.

In particular, clients create a small number of URLs from titles. @phuedx proposed a dictionary like (I have made a few changes/additions):

content_urls: {
  page: "…",
  revisions: "..."
  editor: "…",
  talk_page: "...",
},
api_urls: {
  summary: "...", //The URL to this response
  mobile_sections: "...",  //return for now, but will be deprecated
  read_html: "...", //new PCS endpoint 
  content_html: "...", //new PCS end point
  metadata: "…", //new PCS end point
  references: "…",  //new PCS end point
  media: "…",  //new PCS end point
  revisions: "...",
  edit_html: "..." //Parsoid HTML url
  talk_page_html: "...", //Parsoid HTML url for the talk page
},

This seems like a good path forward. We need to bike shed on a few specifics, and should do that here. Some things to consider for the final structure:

  1. Clients should be able to differentiate between content URLs and API URLs easily
  2. We should decide on using mobile vs desktop URLs or both and clearly marking them up
  3. We need to ensure that clients will not be redirected using the URLs we provide.

Related Objects

StatusSubtypeAssignedTask
ResolvedNone
ResolvedNone
Resolved Jhernandez
Resolved Mholloway
ResolvedDereckson
ResolvedJdlrobson
Resolvedovasileva
Resolvedovasileva
ResolvedJdlrobson
DuplicateNone
DuplicateNone
Resolvedovasileva
DeclinedJdlrobson
ResolvedJdlrobson
Resolvedovasileva
Resolved Fjalapeno
Resolvedphuedx
Declinedpmiazga
DeclinedNone
Resolvedphuedx
DeclinedNone
Resolved Pchelolo
Resolved bearND
Resolved Mholloway
ResolvedMSantos
Resolved Mholloway

Event Timeline

This looks exactly like the concept of REST HATEOAS - the response here is the driver for the client to discover more APIs.

The general convention is to wrap all the links in the object under the _links key. But after that the conventions end. There's a ton of different conventions on how to structure the data under the _links key. I propose to just pick one and stick with it.

+1 to _links. We were considering to use that for REST API listings at some point too. Take a look at HAL - an attempt to standardise resource links in JSON responses.

I'm up for using HAL – I've used it before in side-projects with some success.

However, having just one HAL-compliant service might seem a little odd. To ease both the implementation of the Page Summary API and the upgrade of existing MCS APIs, should we adopt a library for generating HAL-compliant responses? There are plenty.

I'm up for using HAL – I've used it before in side-projects with some success.

However, having just one HAL-compliant service might seem a little odd. To ease both the implementation of the Page Summary API and the upgrade of existing MCS APIs, should we adopt a library for generating HAL-compliant responses?

How do you see other end points benefiting from HAL? Could you elaborate a bit?

How do you see other end points benefiting from HAL? Could you elaborate a bit?

I wouldn't say that there's any clear benefit as service authors. As a service client, however, I think that I'd benefit from having a family of services that respond in a consistent format. Both the _links part of the HAL spec and this task are dealing with the discovery of related resources by the client. If only one endpoint were to list related resources, then it's harder for the client to discover them as they have to fetch that resource first. That wouldn't be the case if all resources were to list related resources.

The above being said, neither the apps nor the mobile site are generic clients. They'll likely know which resources they want to request.

For that reason, I think consistency of responses should be a goal for all teams that create services – and that it's likely on the services team to coordinate that effort – but I don't think it blocks near-term service work (the Page Summary API and the move from MCS to PCS).

I spoke with @phuedx about this topic. Just to document here:

I am primarily concerned with limiting the string manipulation on the clients as much as possible. If conforming to such a spec makes this easier, that works for me. Consistency is always nice, though I am largely ambivalent to how it happens.

I do like grouping everything under "_links" - that seems very straightforward.

I am also not opposed to having something similar in other APIs that we are designing now. I will have some JSON structures we can bike shed over soonish and see if and how this fits in.

@phuedx we spoke briefly a while ago about how we were conflating API urls and content urls in the dictionary. I updated the description to divide these 2 types of urls into their own dictionaries. This provides some clarity I think. Of course this isn't HAL compliment, but like you said we aren't really doing this anywhere else.

I know the development is already in progress and close to completion, but do you think it would be ok to make this change before merging? (Or if not, can we omit the URLs dictionary until we resolve so we don't have to make a breaking change?)

@phuedx we spoke briefly a while ago about how we were conflating API urls and content urls in the dictionary. I updated the description to divide these 2 types of urls into their own dictionaries. This provides some clarity I think. Of course this isn't HAL compliment, but like you said we aren't really doing this anywhere else.

I know the development is already in progress and close to completion, but do you think it would be ok to make this change before merging? (Or if not, can we omit the URLs dictionary until we resolve so we don't have to make a breaking change?)

I assume we're talking about the Page Summary endpoint. If that's the case, then I think that we can make this change before switching out the current RESTBase endpoint with the new service.

@phuedx yep, talking about the summary (sorry missed this)

@Mholloway is also looking at this ticket as the android team just did a bunch of work to figure out variant URLs. So we will file a subtask to:

  1. Make sure these URLs are the right ones for pages that have/are variants
  2. Add an additional dictionary for variant URLs

cc @Dbrant @cooltey

Change 384738 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Add common URLs to summary API

https://gerrit.wikimedia.org/r/384738

Couple of follow-up questions on this one:

Is "revisions" in content_urls meant to be the article's revision history page (e.g., https://en.wikipedia.org/w/index.php?title=%CE%92-lactam_antibiotic&action=history)? If so, should we call it "history" for consistency's sake?

What is the "revisions" in api_urls meant to be? I couldn't find anything corresponding to it in the spec or the tech doc.

I've got a few questions around the need for content_urls. The api_urls seem more easy to grasp. I'm still a bit curious whether these are actually needed in the response. Surely, the client can easily work out these particular URLs and may choose to ignore them altogether by providing it's own experience?

Here's my questions around the current patch that may need some fleshing out:

  • Why are some URLs canonical but the others are not?

A few notes

e.g. https://en.m.wikipedia.org/wiki/Special:History/US_war_crimes

  • this in turn will redirect again when translated on a non-English wiki

e.g. https://fr.m.wikipedia.org/wiki/Sp%C3%A9cial:History/Atractus_collaris

  • On mobile the edit url often points to a very bad experience.

@Fjalapeno In addition to the above questions, did we want to change the gallery to media here as well?

@Fjalapeno In addition to the above questions, did we want to change the gallery to media here as well?

Yep changed the gallery to media…

Here's my questions around the current patch that may need some fleshing out:

  • Why are some URLs canonical but the others are not?

@Jdlrobson Thanks good point, @Mholloway has changed everything to be canonical now

looking into other things like history…

Surely, the client can easily work out these particular URLs and may choose to ignore them altogether by providing it's own experience?

@Jdlrobson This question goes to the title of the ticket "Return common URLs in summary API so clients do not have to perform bug prone string manipulation"

While URL construction can be worked out, we have shown time and time again that clients will create URLs incorrectly, double encoding titles or not knowing quite what the URL format is. Or as you have pointed out even on this ticket, not creating a canonical URL and causing unnecessary redirects.

The point of this ticket is to prevent those issues by just giving these URLs to the clients. The URLs are constructed once, and can be fixed in a single place. This goes for both content and api URLs.

And while a client may provide it's own experience, some may want to use a web version (like for editing or talk pages). This is also a general purpose API, and so we want to support other use cases (like for a website that may want to just send users to Wikipedia to contribute to a page).

Hope that explains the thinking here. If you have more questions feel free to hit me up.

From my perspective this would be handled better by a dedicated node library (similar to mediawiki-title) since you can construct those urls from information you already have. On the other hand doing it in the server will add a bloat to all responses and any bugs we do miss for special pages will be subject to caching.

The issue with resolving to canonical urls is going to be tricky on both server and client. Im not sure how to avoid multiple redirects from example in the case of the history link.

That said I don't feel too strongly about what you do here I just wanted to provide this opinion as I'm not 100% convinced. I guess if we decide to use a node library later on it is easy enough to remove fields.

From my perspective this would be handled better by a dedicated node library (similar to mediawiki-title) since you can construct those urls from information you already have.

Should we also be providing libraries for other common languages to generate these URLs? By making the server provide them, you'd get rid of the need for such dedicated libraries.

On the other hand doing it in the server will add a bloat to all responses and any bugs we do miss for special pages will be subject to caching.

We can definitely test this hypothesis. My guess would be that gzip would remove the majority of it.

Change 384738 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Add common URLs to summary API

https://gerrit.wikimedia.org/r/384738

From my perspective this would be handled better by a dedicated node library (similar to mediawiki-title) since you can construct those urls from information you already have. On the other hand doing it in the server will add a bloat to all responses…

@Jdlrobson I''m reading this as meaning:
"Including all versions of titles and URLs that a client would need in the summary response would cause clients to have to use significantly more data to get a summary."

to build on @phuedx's comment:
Gzip is very efficient at compressing information with similar content, like these URLs and Titles which largely include the same characters. This was discussed in a couple Services syncs and it was determined that the actual number of bytes add to support these features is nearly 0 due to compression. So in fact this should not really be a problem.

With this in mind, including the URLs may allow us to reduce bloat in the clients be removing their need to include extra JS libraries (including the mediawiki-title) in their source.

…and any bugs we do miss for special pages will be subject to caching.

The issue with resolving to canonical urls is going to be tricky on both server and client.

If we have a problem with special pages / canonical URLS, etc, then by only implementing this in one place, we only need to fix it one place (including 3rd party clients we don't know about) - so I think this is actually a win.

Im not sure how to avoid multiple redirects from example in the case of the history link.

Yeah we don't want to do this for sure… Michael is working on this specific issue. And we think we have solutions to avoid this (I'll also update the ticket with a requirement about having 0 redirects from URLs provided in the summary)

That said I don't feel too strongly about what you do here I just wanted to provide this opinion as I'm not 100% convinced. I guess if we decide to use a node library later on it is easy enough to remove fields.

Thanks really, and your points are noted… and if somehow Gzip can't compress these responses properly, then of course we can fall back to including node libraries in clients.

I'm not sure what the correct URL for mobile editing should be. What happens on mobile seems to depend on the user's logged-in state and, if logged in, the user's preference about Visual Editor usage. Does a pre-defined edit URL make sense on mobile?

Change 389584 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Summary 2.0: Add mobile content URLs

https://gerrit.wikimedia.org/r/389584

Mentioned in SAL (#wikimedia-operations) [2017-11-08T21:44:40Z] <bsitzmann@tin> Finished deploy [mobileapps/deploy@00e60b2]: Update mobileapps to 8e82983 (T178706 T178708 T178333 T170692) (duration: 07m 12s)

Change 390248 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/extensions/MobileFrontend@master] Add base mobile site to siteinfo

https://gerrit.wikimedia.org/r/390248

Change 390248 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Add base mobile site to siteinfo

https://gerrit.wikimedia.org/r/390248

Change 389584 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Summary 2.0: Add mobile content URLs

https://gerrit.wikimedia.org/r/389584