Return common URLs in summary API so clients do not have to perform bug prone string manipulation
Closed, ResolvedPublic
Actions

Description

Originally discussed here:
T169761#3422369

This is pushing forward the notion of returning the data that clients need and remove the need for client side processing.

In particular, clients create a small number of URLs from titles. @phuedx proposed a dictionary like (I have made a few changes/additions):

content_urls: {
  page: "…",
  revisions: "..."
  editor: "…",
  talk_page: "...",
},
api_urls: {
  summary: "...", //The URL to this response
  mobile_sections: "...",  //return for now, but will be deprecated
  read_html: "...", //new PCS endpoint 
  content_html: "...", //new PCS end point
  metadata: "…", //new PCS end point
  references: "…",  //new PCS end point
  media: "…",  //new PCS end point
  revisions: "...",
  edit_html: "..." //Parsoid HTML url
  talk_page_html: "...", //Parsoid HTML url for the talk page
},

This seems like a good path forward. We need to bike shed on a few specifics, and should do that here. Some things to consider for the final structure:

Clients should be able to differentiate between content URLs and API URLs easily
We should decide on using mobile vs desktop URLs or both and clearly marking them up
We need to ensure that clients will not be redirected using the URLs we provide.

Details

Subject	Repo	Branch	Lines +/-
Summary 2.0: Add mobile content URLs	mediawiki/services/mobileapps	master	+35 -15
Add base mobile site to siteinfo	mediawiki/extensions/MobileFrontend	master	+14 -0
Add common URLs to summary API	mediawiki/services/mobileapps	master	+179 -51

Customize query in gerrit

Related Objects
Search...

Status	Assigned	Task
Resolved	None	T169242 Develop Page Content Service for Reading Clients
Resolved	None	T177425 Develop General Layer of PCS
Resolved	• Jhernandez	T177426 Develop structured JSON APIs for general consumption
Resolved	• Mholloway	T177431 Develop a Summary JSON API
Resolved	Dereckson	T68374 Enable Hovercards on se.wikimedia.org (Swedish chapter wiki)
Resolved	Jdlrobson	T70860 [GOAL] Graduate Page Previews feature (Popups extension) out of Beta Feature
Resolved	ovasileva	T154635 [EPIC] Deploy page previews to English and German Wikipedia
Resolved	ovasileva	T192622 [EPIC] Page previews post-deploy cleanup
Resolved	Jdlrobson	T173952 Remove A/B testing instrumentation code
Duplicate	None	T167433 Switch all projects to the new (and yet to be built) summary-html endpoint for page previews
Duplicate	None	T167429 Make enwiki and dewiki fetch previews from the summary-html RESTBase endpoint
Resolved	ovasileva	T165018 Page previews can consume new summary-HTML endpoint
Declined	Jdlrobson	T111329 [GOAL] Page previews on mobileweb
Resolved	Jdlrobson	T164010 [EPIC] Strengthen the APIs we provide in reading web maintained extensions
Resolved	ovasileva	T113094 [EPIC] The Page Summary API needs to provide useful content for the majority of articles
Resolved	• Fjalapeno	T167022 Inventory requirements for summary 2.0 endpoint across Reading teams
Resolved	phuedx	T168400 [EPIC] Add notion of emptiness to page summary API
Declined	pmiazga	T168328 Extracts for Wikimedia List articles display partial previews
Declined	None	T168391 Page previews should display generic preview for disambiguation pages
Resolved	phuedx	T114418 Disable TextExtracts on file pages
Declined	None	T168418 Page Previews should accept the service's notion of emptiness
Resolved	• Pchelolo	T164291 Make title-related properties consistent
Resolved	• bearND	T169761 Review Summary 2.0 Spec
Resolved	• Mholloway	T170692 Return common URLs in summary API so clients do not have to perform bug prone string manipulation
Resolved	MSantos	T177619 Return variant URLs and titles in the metadata response
Resolved	• Mholloway	T178446 Expose display titles for a page in all available language variants through the action API

Event Timeline

• Fjalapeno created this task.Jul 14 2017, 4:29 PM

This looks exactly like the concept of REST HATEOAS - the response here is the driver for the client to discover more APIs.

The general convention is to wrap all the links in the object under the _links key. But after that the conventions end. There's a ton of different conventions on how to structure the data under the _links key. I propose to just pick one and stick with it.

• Fjalapeno removed • Fjalapeno as the assignee of this task.Jul 14 2017, 4:57 PM

• Fjalapeno edited projects, added Product-Infrastructure-Team-Backlog-Deprecated; removed Product-Infrastructure-Team-Backlog-Deprecated (Kanban).

• Fjalapeno moved this task from Needs triage to Tracking on the Product-Infrastructure-Team-Backlog-Deprecated board.

+1 to _links. We were considering to use that for REST API listings at some point too. Take a look at HAL - an attempt to standardise resource links in JSON responses.

@phuedx ping - just an FYI for your spec.

• Fjalapeno updated the task description. (Show Details)Jul 17 2017, 7:00 PM

I'm up for using HAL – I've used it before in side-projects with some success.

However, having just one HAL-compliant service might seem a little odd. To ease both the implementation of the Page Summary API and the upgrade of existing MCS APIs, should we adopt a library for generating HAL-compliant responses? There are plenty.

In T170692#3447252, @phuedx wrote:

I'm up for using HAL – I've used it before in side-projects with some success.

However, having just one HAL-compliant service might seem a little odd. To ease both the implementation of the Page Summary API and the upgrade of existing MCS APIs, should we adopt a library for generating HAL-compliant responses?

How do you see other end points benefiting from HAL? Could you elaborate a bit?

In T170692#3449304, @mobrovac wrote:

How do you see other end points benefiting from HAL? Could you elaborate a bit?

I wouldn't say that there's any clear benefit as service authors. As a service client, however, I think that I'd benefit from having a family of services that respond in a consistent format. Both the _links part of the HAL spec and this task are dealing with the discovery of related resources by the client. If only one endpoint were to list related resources, then it's harder for the client to discover them as they have to fetch that resource first. That wouldn't be the case if all resources were to list related resources.

The above being said, neither the apps nor the mobile site are generic clients. They'll likely know which resources they want to request.

For that reason, I think consistency of responses should be a goal for all teams that create services – and that it's likely on the services team to coordinate that effort – but I don't think it blocks near-term service work (the Page Summary API and the move from MCS to PCS).

I spoke with @phuedx about this topic. Just to document here:

I am primarily concerned with limiting the string manipulation on the clients as much as possible. If conforming to such a spec makes this easier, that works for me. Consistency is always nice, though I am largely ambivalent to how it happens.

I do like grouping everything under "_links" - that seems very straightforward.

I am also not opposed to having something similar in other APIs that we are designing now. I will have some JSON structures we can bike shed over soonish and see if and how this fits in.

• Fjalapeno updated the task description. (Show Details)Sep 18 2017, 4:09 PM

@phuedx we spoke briefly a while ago about how we were conflating API urls and content urls in the dictionary. I updated the description to divide these 2 types of urls into their own dictionaries. This provides some clarity I think. Of course this isn't HAL compliment, but like you said we aren't really doing this anywhere else.

I know the development is already in progress and close to completion, but do you think it would be ok to make this change before merging? (Or if not, can we omit the URLs dictionary until we resolve so we don't have to make a breaking change?)

• Fjalapeno updated the task description. (Show Details)Sep 18 2017, 4:14 PM

In T170692#3615178, @Fjalapeno wrote:

@phuedx we spoke briefly a while ago about how we were conflating API urls and content urls in the dictionary. I updated the description to divide these 2 types of urls into their own dictionaries. This provides some clarity I think. Of course this isn't HAL compliment, but like you said we aren't really doing this anywhere else.

I know the development is already in progress and close to completion, but do you think it would be ok to make this change before merging? (Or if not, can we omit the URLs dictionary until we resolve so we don't have to make a breaking change?)

I assume we're talking about the Page Summary endpoint. If that's the case, then I think that we can make this change before switching out the current RESTBase endpoint with the new service.

phuedx added a parent task: T113094: [EPIC] The Page Summary API needs to provide useful content for the majority of articles.Sep 26 2017, 3:05 PM

@phuedx yep, talking about the summary (sorry missed this)

@Mholloway is also looking at this ticket as the android team just did a bunch of work to figure out variant URLs. So we will file a subtask to:

Make sure these URLs are the right ones for pages that have/are variants
Add an additional dictionary for variant URLs

cc @Dbrant @cooltey

• Fjalapeno created subtask T177619: Return variant URLs and titles in the metadata response.Oct 6 2017, 4:02 PM

• Fjalapeno added a project: Page Content Service.

Change 384738 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Add common URLs to summary API

https://gerrit.wikimedia.org/r/384738

gerritbot added a project: Patch-For-Review.Oct 17 2017, 6:37 PM

• Mholloway claimed this task.Oct 17 2017, 6:38 PM

• Mholloway moved this task from Tracking to Kanban on the Product-Infrastructure-Team-Backlog-Deprecated board.

• Mholloway edited projects, added Product-Infrastructure-Team-Backlog-Deprecated (Kanban); removed Product-Infrastructure-Team-Backlog-Deprecated.

• Mholloway moved this task from To Do to Doing on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.

• Mholloway moved this task from Doing to Code Review on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.Oct 17 2017, 6:50 PM

Couple of follow-up questions on this one:

Is "revisions" in content_urls meant to be the article's revision history page (e.g., https://en.wikipedia.org/w/index.php?title=%CE%92-lactam_antibiotic&action=history)? If so, should we call it "history" for consistency's sake?

What is the "revisions" in api_urls meant to be? I couldn't find anything corresponding to it in the spec or the tech doc.

I've got a few questions around the need for content_urls. The api_urls seem more easy to grasp. I'm still a bit curious whether these are actually needed in the response. Surely, the client can easily work out these particular URLs and may choose to ignore them altogether by providing it's own experience?

Here's my questions around the current patch that may need some fleshing out:

Why are some URLs canonical but the others are not?

A few notes

/wiki path is configurable
/w/index.php URLS are not canonical. https://en.wikipedia.org/wiki/US_war_crimes?action=history for example.
On mobile the history url will redirect to Special:History.

e.g. https://en.m.wikipedia.org/wiki/Special:History/US_war_crimes

this in turn will redirect again when translated on a non-English wiki

e.g. https://fr.m.wikipedia.org/wiki/Sp%C3%A9cial:History/Atractus_collaris

On mobile the edit url often points to a very bad experience.

@Fjalapeno In addition to the above questions, did we want to change the gallery to media here as well?

In T170692#3724643, @bearND wrote:

@Fjalapeno In addition to the above questions, did we want to change the gallery to media here as well?

Yep changed the gallery to media…

In T170692#3694693, @Jdlrobson wrote:

Here's my questions around the current patch that may need some fleshing out:

Why are some URLs canonical but the others are not?

@Jdlrobson Thanks good point, @Mholloway has changed everything to be canonical now

looking into other things like history…

In T170692#3694693, @Jdlrobson wrote:

Surely, the client can easily work out these particular URLs and may choose to ignore them altogether by providing it's own experience?

@Jdlrobson This question goes to the title of the ticket "Return common URLs in summary API so clients do not have to perform bug prone string manipulation"

While URL construction can be worked out, we have shown time and time again that clients will create URLs incorrectly, double encoding titles or not knowing quite what the URL format is. Or as you have pointed out even on this ticket, not creating a canonical URL and causing unnecessary redirects.

The point of this ticket is to prevent those issues by just giving these URLs to the clients. The URLs are constructed once, and can be fixed in a single place. This goes for both content and api URLs.

And while a client may provide it's own experience, some may want to use a web version (like for editing or talk pages). This is also a general purpose API, and so we want to support other use cases (like for a website that may want to just send users to Wikipedia to contribute to a page).

Hope that explains the thinking here. If you have more questions feel free to hit me up.

From my perspective this would be handled better by a dedicated node library (similar to mediawiki-title) since you can construct those urls from information you already have. On the other hand doing it in the server will add a bloat to all responses and any bugs we do miss for special pages will be subject to caching.

The issue with resolving to canonical urls is going to be tricky on both server and client. Im not sure how to avoid multiple redirects from example in the case of the history link.

That said I don't feel too strongly about what you do here I just wanted to provide this opinion as I'm not 100% convinced. I guess if we decide to use a node library later on it is easy enough to remove fields.

In T170692#3737565, @Jdlrobson wrote:

From my perspective this would be handled better by a dedicated node library (similar to mediawiki-title) since you can construct those urls from information you already have.

Should we also be providing libraries for other common languages to generate these URLs? By making the server provide them, you'd get rid of the need for such dedicated libraries.

On the other hand doing it in the server will add a bloat to all responses and any bugs we do miss for special pages will be subject to caching.

We can definitely test this hypothesis. My guess would be that gzip would remove the majority of it.

Change 384738 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Add common URLs to summary API

https://gerrit.wikimedia.org/r/384738

In T170692#3737565, @Jdlrobson wrote:

From my perspective this would be handled better by a dedicated node library (similar to mediawiki-title) since you can construct those urls from information you already have. On the other hand doing it in the server will add a bloat to all responses…

@Jdlrobson I''m reading this as meaning:
"Including all versions of titles and URLs that a client would need in the summary response would cause clients to have to use significantly more data to get a summary."

to build on @phuedx's comment:
Gzip is very efficient at compressing information with similar content, like these URLs and Titles which largely include the same characters. This was discussed in a couple Services syncs and it was determined that the actual number of bytes add to support these features is nearly 0 due to compression. So in fact this should not really be a problem.

With this in mind, including the URLs may allow us to reduce bloat in the clients be removing their need to include extra JS libraries (including the mediawiki-title) in their source.

…and any bugs we do miss for special pages will be subject to caching.

The issue with resolving to canonical urls is going to be tricky on both server and client.

If we have a problem with special pages / canonical URLS, etc, then by only implementing this in one place, we only need to fix it one place (including 3rd party clients we don't know about) - so I think this is actually a win.

Im not sure how to avoid multiple redirects from example in the case of the history link.

Yeah we don't want to do this for sure… Michael is working on this specific issue. And we think we have solutions to avoid this (I'll also update the ticket with a requirement about having 0 redirects from URLs provided in the summary)

That said I don't feel too strongly about what you do here I just wanted to provide this opinion as I'm not 100% convinced. I guess if we decide to use a node library later on it is easy enough to remove fields.

Thanks really, and your points are noted… and if somehow Gzip can't compress these responses properly, then of course we can fall back to including node libraries in clients.

• Fjalapeno updated the task description. (Show Details)Nov 6 2017, 8:52 PM

• bearND moved this task from Code Review to To Deploy on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.Nov 6 2017, 8:56 PM

I'm not sure what the correct URL for mobile editing should be. What happens on mobile seems to depend on the user's logged-in state and, if logged in, the user's preference about Visual Editor usage. Does a pre-defined edit URL make sense on mobile?

Change 389584 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Summary 2.0: Add mobile content URLs

https://gerrit.wikimedia.org/r/389584

Mentioned in SAL (#wikimedia-operations) [2017-11-08T21:37:28Z] <bsitzmann@tin> Started deploy [mobileapps/deploy@00e60b2]: Update mobileapps to 8e82983 (T178706 T178708 T178333 T170692)

Mentioned in SAL (#wikimedia-operations) [2017-11-08T21:44:40Z] <bsitzmann@tin> Finished deploy [mobileapps/deploy@00e60b2]: Update mobileapps to 8e82983 (T178706 T178708 T178333 T170692) (duration: 07m 12s)

• bearND moved this task from To Deploy to Sign off on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.Nov 8 2017, 9:48 PM

• bearND closed this task as Resolved.Nov 9 2017, 3:15 AM

Change 390248 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/extensions/MobileFrontend@master] Add base mobile site to siteinfo

https://gerrit.wikimedia.org/r/390248

Change 390248 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Add base mobile site to siteinfo

https://gerrit.wikimedia.org/r/390248

ReleaseTaggerBot added a project: MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)).Nov 9 2017, 11:00 PM

Change 389584 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Summary 2.0: Add mobile content URLs

https://gerrit.wikimedia.org/r/389584

• Jhernandez closed subtask T177619: Return variant URLs and titles in the metadata response as Resolved.Jul 12 2018, 10:47 AM

Return common URLs in summary API so clients do not have to perform bug prone string manipulation Closed, ResolvedPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Return common URLs in summary API so clients do not have to perform bug prone string manipulation
Closed, ResolvedPublic
Actions

Related Objects
Search...