Develop a Summary JSON API
Closed, Resolved (Public)

Description

Note: this task is being implemented by the web team, and its scope is fully defined in the subtask. This task serves as a marker for how the work is integrated into the PCS as a whole.

At a high level, this is tracking the work being performed to create the next version of the RESTBase Summary API here: https://en.wikipedia.org/api/rest_v1/#!/Page_content/get_page_summary_title

The primary new features of the endpoint include:

  1. Normalization of title properties
  2. URLs for commonly needed content
  3. An improved extract of the page
  4. Handling of summaries for "not normal" pages (e.g. disambiguation pages)

Current example output:

{
  "type": "standard",
  "title": "Laredo, Texas",
  "displaytitle": "Laredo, Texas",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "titles": {
    "canonical": "Laredo,_Texas",
    "normalized": "Laredo, Texas",
    "display": "Laredo, Texas"
  },
  "pageid": 136773,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/Webb_County_Laredo.svg/320px-Webb_County_Laredo.svg.png",
    "width": 320,
    "height": 238
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/9/91/Webb_County_Laredo.svg",
    "width": 982,
    "height": 730
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "808369737",
  "tid": "808ed2b1-c464-11e7-84fe-e94e48c317f5",
  "timestamp": "2017-11-02T13:26:55Z",
  "description": "border city in Texas, USA",
  "coordinates": {
    "lat": 27.524445,
    "lon": -99.490593
  },
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Laredo,_Texas",
      "revisions": "https://en.wikipedia.org/wiki/Laredo,_Texas?action=history",
      "edit": "https://en.wikipedia.org/wiki/Laredo,_Texas?action=edit",
      "talk": "https://en.wikipedia.org/wiki/Talk:Laredo,_Texas"
    },
    "mobile": {
      "page": "https://en.m.wikipedia.org/wiki/Laredo,_Texas",
      "revisions": "https://en.m.wikipedia.org/wiki/Special:History/Laredo,_Texas",
      "edit": "https://en.m.wikipedia.org/wiki/Laredo,_Texas?action=edit",
      "talk": "https://en.m.wikipedia.org/wiki/Talk:Laredo,_Texas"
    }
  },
  "api_urls": {
    "summary": "https://en.wikipedia.org/api/rest_v1/page/summary/Laredo,_Texas",
    "mobile_sections": "https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Laredo,_Texas",
    "edit_html": "https://en.wikipedia.org/api/rest_v1/page/html/Laredo,_Texas",
    "talk_page_html": "https://en.wikipedia.org/api/rest_v1/page/html/Talk:Laredo,_Texas"
  },
  "extract": "Laredo is the county seat of Webb County, Texas, United States, located on the north bank of the Rio Grande in South Texas, across from Nuevo Laredo, Tamaulipas, Mexico. According to the 2010 census, the city population was 236,091, making it the tenth-most populous city in the state of Texas and third-most populated on the Mexico–United States border, after San Diego, California, and El Paso, Texas. Its metropolitan area is the 178th-largest in the \nU.S. and includes all of Webb County, with a population of 250,304. Laredo is also part of the cross-border Laredo-Nuevo Laredo Metropolitan Area with an estimated population of 636,516.",
  "extract_html": "<p><b>Laredo</b> is the <span>county seat</span> of <span>Webb County, Texas</span>, United States, located on the north bank of the <span>Rio Grande</span> in <span>South Texas</span>, across from <span>Nuevo Laredo</span>, <span>Tamaulipas</span>, <span>Mexico</span>. According to the <span>2010 census</span>, the city population was 236,091, making it the <span>tenth-most populous</span> city in the <span>state</span> of <span>Texas</span> and third-most populated on the <span>Mexico–United States border</span>, after <span class=\"mw-redirect\">San Diego, California</span>, and <span>El Paso, Texas</span>. Its <span>metropolitan area</span> is the <span class=\"mw-redirect\">178th-largest in the \nU.S.</span> and includes all of Webb County, with a population of 250,304. Laredo is also part of the cross-border <span class=\"mw-redirect\">Laredo-Nuevo Laredo Metropolitan Area</span> with an estimated population of 636,516.</p>"
}
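
A minimal sketch of how a client might read a few of the fields shown in the example above. `PageSummary` and `parse_summary` are hypothetical names, not part of any Wikimedia library. The field names come from the example output; treating `description`, `thumbnail`, and `extract` as optional is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSummary:
    """Hypothetical container for a few fields of the summary response."""
    type: str
    canonical_title: str
    display_title: str
    extract: str
    description: Optional[str]
    thumbnail_url: Optional[str]


def parse_summary(raw: str) -> PageSummary:
    """Parse a summary response body (a JSON string) into a PageSummary."""
    data = json.loads(raw)
    titles = data["titles"]
    # Assumption: pages without a lead image omit the "thumbnail" object.
    thumbnail = data.get("thumbnail") or {}
    return PageSummary(
        type=data["type"],
        canonical_title=titles["canonical"],
        display_title=titles["display"],
        extract=data.get("extract", ""),
        description=data.get("description"),
        thumbnail_url=thumbnail.get("source"),
    )
```

The title values are read from the `titles` object rather than the top-level `title` and `displaytitle` keys, since normalized title properties are one of the listed features of the new endpoint.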

Event Timeline

So the things I see that we need to do (if the response is up to date)

  1. Some URLs need to be hidden until they are deployed:
     "read_html": "https://en.wikipedia.org/api/rest_v1/page/read-html/Laredo,_Texas",
     "content_html": "https://en.wikipedia.org/api/rest_v1/page/content-html/Laredo,_Texas",
     "metadata": "https://en.wikipedia.org/api/rest_v1/page/metadata/Laredo,_Texas",
     "references": "https://en.wikipedia.org/api/rest_v1/page/references/Laredo,_Texas",
     "media": "https://en.wikipedia.org/api/rest_v1/page/media/Laredo,_Texas",
  2. We need the mobile and desktop URLs in content_urls
  3. We need to decide what to do with the namespace properties (inside or outside the titles dictionary?)
  4. We need the time UUID in the response
  5. Bikeshed: should we just remove the trailing "name" from "namespace_name"? It seems redundant.

I think that is mostly it… there is one other topic I'll bring up in a separate comment around the "type" property… just want to get some web team input there. But I'm not sure it is a blocker since it is additive…

@Mholloway can you confirm the comment I just made? Does that seem right as a list of things to do?

@phuedx Hey looking at the spec for type… we have the following:

"disambiguation", "wikidata", or "standard"

I'm thinking we should expand that a bit further after a few recent talks about how we handle other types of pages. Specifically things like "File" pages…

"talk", "file", "category", "disambiguation", "wikidata", or "standard"

This will let clients in the future decide to handle these pages differently… OR are we just recreating the namespace property here?

I guess something bothers me about calling a "category" page "standard".

Thoughts?

@Mholloway @bearND One more bikeshed…

So Talk pages will have their own summary… what does that look like?

I'm assuming it will have the URLs for the HTML and the APIs, so I am wondering if we need to include those URLs here as well. Maybe we just need a link to the Talk page summary so clients can get all the information they need about the talk page if they want it?

Thoughts?

@phuedx Hey looking at the spec for type… we have the following:

"disambiguation", "wikidata", or "standard"

I'm thinking we should expand that a bit further after a few recent talks about how we handle other types of pages. Specifically things like "File" pages…

"talk", "file", "category", "disambiguation", "wikidata", or "standard"

This will let clients in the future decide to handle these pages differently… OR are we just recreating the [namespace] property here?

I think this goes to the principle behind adding content and API URLs to PCS responses: the server is providing important meta information about the page and/or project and the client is free to do with it what it will. In this case, the server is signalling to the client that it's treated the page differently while generating the preview.

I'd recommend adding more types as and when we need to (i.e. when Design and/or Product call for it).
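
A minimal sketch of how a client could act on the "type" signal while staying tolerant of types added later. The handler functions and the `HANDLERS` table are hypothetical, and falling back to the standard rendering for an unrecognized type is an assumption, not something the spec states.

```python
from typing import Callable, Dict


def render_standard(summary: dict) -> str:
    """Default preview: just show the plain-text extract."""
    return summary.get("extract", "")


def render_disambiguation(summary: dict) -> str:
    """Illustrative alternative preview for disambiguation pages."""
    return f"{summary['title']} may refer to several pages."


# Only the types a client cares about need an entry here.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "standard": render_standard,
    "disambiguation": render_disambiguation,
}


def render_preview(summary: dict) -> str:
    """Dispatch on "type"; unknown or missing types use the standard preview."""
    handler = HANDLERS.get(summary.get("type", "standard"), render_standard)
    return handler(summary)
```

With this shape, the server can introduce a new type such as "file" without breaking existing clients, which is what makes the change additive.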

Change 390252 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Suppress not-yet-deployed REST API URLs

https://gerrit.wikimedia.org/r/390252

So the things I see that we need to do (if the response is up to date)

  1. Some URLs need to be hidden until they are deployed:
"read_html": "https://en.wikipedia.org/api/rest_v1/page/read-html/Laredo,_Texas",
"content_html": "https://en.wikipedia.org/api/rest_v1/page/content-html/Laredo,_Texas",
"metadata": "https://en.wikipedia.org/api/rest_v1/page/metadata/Laredo,_Texas",
"references": "https://en.wikipedia.org/api/rest_v1/page/references/Laredo,_Texas",
"media": "https://en.wikipedia.org/api/rest_v1/page/media/Laredo,_Texas",

https://gerrit.wikimedia.org/r/390252

  1. We need the mobile and desktop URLs in content_urls

https://gerrit.wikimedia.org/r/#/c/389584/ (blocked on an addition to siteinfo via MobileFrontend: https://gerrit.wikimedia.org/r/#/c/390248/)

  1. We need to decide what to do with the namespace properties (inside or outside the titles dictionary?)

I vote outside.

We also need to decide once and for all on what to call them (I don't know that we ever got to that yesterday). The already-deployed mobile-sections lead response uses "ns" for numeric namespace. Do we want to follow that for consistency, or use the longer names above in the summary endpoint for clarity, at the price of inconsistency?

  1. We need the time UUID in the response

Will do this a.m.

  1. Bikeshed: should we just remove the trailing "name" from "namespace_name"? It seems redundant.

Yeah, I'm not a big fan of "namespace_name," and yet just "namespace" is ambiguous. Maybe "namespace_text" is more meaningful?

I think that is mostly it… there is one other topic I'll bring up in a separate comment around the "type" property… just want to get some web team input there. But I'm not sure it is a blocker since it is additive…

@Mholloway can you confirm the comment I just made? Does that seem right as a list of things to do?

Yep, I think that covers it.

@Mholloway @bearND One more bikeshed…

So Talk pages will have their own summary… what does that look like?

Per the spec, the summary endpoint is only serving content for titles in "content namespaces," implemented as a whitelist currently containing only the main namespace. For Talk and all other namespaces we're responding with 204s. The last decision on the matter was to postpone work on other namespaces until someone has an actual use case.

Discussion was here, for posterity: T178420: What is a "content namespace" for purposes of the summary 2.0 endpoint?
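
A minimal sketch of how a client might interpret the responses described above, where titles outside the content namespaces get a 204. `summary_from_response` is a hypothetical helper. It takes the status code and body as plain values so that it does not depend on any particular HTTP library, and treating every other non-200 status as an error is an assumption.

```python
import json
from typing import Optional


def summary_from_response(status: int, body: bytes) -> Optional[dict]:
    """Return the parsed summary, or None when the server sent 204 No Content."""
    if status == 204:
        # No summary is served for this title (e.g. a Talk page).
        return None
    if status != 200:
        raise ValueError(f"unexpected status {status}")
    return json.loads(body)
```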

By the way, did we decide for sure on using .../wiki/Title?action=edit#/editor/0 as the mobile edit URL, even though it refers specifically to the lead section?

Change 390252 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Suppress not-yet-deployed REST API URLs

https://gerrit.wikimedia.org/r/390252

@Mholloway can you confirm the comment I just made? Does that seem right as a list of things to do?

Should separate mobile-sections-lead and mobile-sections-remaining API URLs be added? On the one hand, it would make the list more complete. On the other, I am not sure that clients other than the Android app use those endpoints, and the Android app won't be making use of these fully formed URLs because of how its API interfaces are implemented.

I'm leaning towards removing the reference to mobile-sections since we will consider this a deprecated API once PCS is functional. No need to proliferate usage at this point IMO.

Yeah, the section number in the edit URI is a tricky subject. I'm not aware of any agreement on this or a really satisfying solution for this.
Assuming we keep the URI as is since mobile clients prefer to edit only single sections, I see two main options but they have their drawbacks:

  1. Leave the URI as is, pointing to section 0 initially, and let the client substitute the '0' with whatever section number the client wants to edit.
  2. Use URI template syntax, like {section_id}. There's an RFC for URI templates.

The reason why I don't really like either of them is that in both cases the client needs to be aware of the string substitution. Having said that, I guess #2 is acceptable since we're not really doing HATEOAS for the URIs anyways and the client needs to know which URI key from the dictionary it wants already.
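
The two options above can be sketched as follows. Both function names are hypothetical and the URLs in the usage note are illustrative. Option 2 here handles only a single `{section_id}` placeholder by plain string replacement; it is not a full implementation of the URI template RFC.

```python
import re

_SECTION_SUFFIX = re.compile(r"(#/editor/)\d+$")


def edit_url_by_replacing_section(url: str, section: int) -> str:
    """Option 1: swap the trailing section number in '#/editor/<n>'."""
    new_url, count = _SECTION_SUFFIX.subn(
        lambda m: f"{m.group(1)}{section}", url
    )
    if count != 1:
        raise ValueError("URL does not end in '#/editor/<number>'")
    return new_url


def edit_url_from_template(template: str, section: int) -> str:
    """Option 2: fill in a '{section_id}' placeholder supplied by the server."""
    placeholder = "{section_id}"
    if placeholder not in template:
        raise ValueError("template has no {section_id} placeholder")
    return template.replace(placeholder, str(section))
```

For example, `edit_url_by_replacing_section("https://en.m.wikipedia.org/wiki/Title?action=edit#/editor/0", 3)` and `edit_url_from_template("https://en.m.wikipedia.org/wiki/Title?action=edit#/editor/{section_id}", 3)` both produce a URL ending in `#/editor/3`. In both cases the client has to know about the substitution, which is the drawback noted above.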

Change 390282 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Use the underlying page HTML tid in page content endpoint etags

https://gerrit.wikimedia.org/r/390282

Change 390283 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Expose tid in the summary 2.0 output

https://gerrit.wikimedia.org/r/390283

  1. We need the mobile and desktop URLs in content_urls

https://gerrit.wikimedia.org/r/#/c/389584/ (blocked on an addition to siteinfo via MobileFrontend: https://gerrit.wikimedia.org/r/#/c/390248/)

@Mholloway Question… how many things are we using MobileFrontend for in MCS/PCS?

  1. We need to decide what to do with the namespace properties (inside or outside the titles dictionary?)

I vote outside.

  1. Bikeshed: should we just remove the trailing "name" from "namespace_name"? It seems redundant.

Yeah, I'm not a big fan of "namespace_name," and yet just "namespace" is ambiguous. Maybe "namespace_text" is more meaningful?

Should this be a hash with two values, like "namespace": { "text": "", "id": 0 }?

We also need to decide once and for all on what to call them (I don't know that we ever got to that yesterday). The already-deployed mobile-sections lead response uses "ns" for numeric namespace. Do we want to follow that for consistency, or use the longer names above in the summary endpoint for clarity, at the price of inconsistency?

clarity…
Also remember mobile sections is being deprecated and going away… so we should keep consistency between the new endpoints, but don't worry about ones that are being replaced.

Should separate mobile-sections-lead and mobile-sections-remaining API URLs be added? On the one hand, it would make the list more complete. On the other, I am not sure that clients other than the Android app use those endpoints, and the Android app won't be making use of these fully formed URLs because of how its API interfaces are implemented.

Let's just add the main one for now… we can look into adding the 2 others later.

We do want to get the clients to stop making URLs, though… that's the point. If we add these URLs and the clients ignore them, then that isn't much use. But maybe we can wait until the PCS is complete before making those changes on the clients.

By the way, did we decide for sure on using .../wiki/Title?action=edit#/editor/0 as the mobile edit URL, even though it refers specifically to the lead section?

Wait… do we need to pass the "0"?

I'm leaning towards removing the reference to mobile-sections since we will consider this a deprecated API once PCS is functional. No need to proliferate usage at this point IMO.

Actually, this may be the way to go… I'm good with not adding it if you two are good.

Wait… do we need to pass the "0"?

Well, JavaScript-enabled web mobile clients need to pass in the section number, or the section edit feature would not be used. Instead it would fall back to the old way and try to edit the whole page.

Change 390295 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Remove mobile-sections URL from summary API URLs

https://gerrit.wikimedia.org/r/390295

  1. Bikeshed: should we just remove the trailing "name" from "namespace_name"? It seems redundant.

Yeah, I'm not a big fan of "namespace_name," and yet just "namespace" is ambiguous. Maybe "namespace_text" is more meaningful?

Should this be a hash with two values, like "namespace": { "text": "", "id": 0 }?

I personally like a flatter structure (and think that leaves less opportunity for client parsing errors), but don't feel strongly about it.

We also need to decide once and for all on what to call them (I don't know that we ever got to that yesterday). The already-deployed mobile-sections lead response uses "ns" for numeric namespace. Do we want to follow that for consistency, or use the longer names above in the summary endpoint for clarity, at the price of inconsistency?

clarity…

I imagine we should also change these names in the formatted and formatted-lead (next-gen mobile-sections) endpoints, too, then?

@Fjalapeno, @bearND: Re T177431#3748630 and T177431#3748749: It might be worth defaulting to the "base" editing experience (no hash fragment in the URL) for now while we think about how we should approach linking to the mobile editing experience. Right now, my suggestion of the base URL plus the hash fragment seems more of a fortunate hack rather than an actual solution.

Change 390298 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Move namespace_id and namespace_text to top level

https://gerrit.wikimedia.org/r/390298

I imagine we should also change these names in the formatted and formatted-lead (next-gen mobile-sections) endpoints, too, then?

Those endpoints are superseded by the PCS endpoints…
The work from them has already largely been incorporated into the Read HTML endpoints (@bearND I think you have most of those improvements already, right?)

We may need to do something here with a lead section again (potentially Marvin needs this if they want to support not downloading the entire article). But we are waiting for the product requirements to be finalized before building an endpoint.

@Fjalapeno, @bearND: Re T177431#3748630 and T177431#3748749: It might be worth defaulting to the "base" editing experience (no hash fragment in the URL) for now while we think about how we should approach linking to the mobile editing experience. Right now, my suggestion of the base URL plus the hash fragment seems more of a fortunate hack rather than an actual solution.

This seems pragmatic to me as well.

@Mholloway something I just noticed… sorry for all the last minute things…

"titles": {
  "title": "Laredo,_Texas",
  "normalized_title": "Laredo, Texas",
  "display_title": "Laredo, Texas"
},

Let's get rid of the duplication of "title":

"titles": {
  "canonical": "Laredo,_Texas",
  "normalized": "Laredo, Texas",
  "display": "Laredo, Texas"
},

Change 390309 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Clean up titles object keys

https://gerrit.wikimedia.org/r/390309

Change 390298 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Move namespace_id and namespace_text to top level

https://gerrit.wikimedia.org/r/390298

Change 390309 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Clean up titles object keys

https://gerrit.wikimedia.org/r/390309

Change 390282 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Use the underlying page HTML tid in page content endpoint etags

https://gerrit.wikimedia.org/r/390282

Change 390283 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Expose tid in the summary 2.0 output

https://gerrit.wikimedia.org/r/390283

Change 390295 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Remove mobile-sections URL from summary API URLs

https://gerrit.wikimedia.org/r/390295

  1. We need the mobile and desktop URLs in content_urls

https://gerrit.wikimedia.org/r/#/c/389584/ (blocked on an addition to siteinfo via MobileFrontend: https://gerrit.wikimedia.org/r/#/c/390248/)

@Mholloway Question… how many things are we using MobileFrontend for in MCS/PCS?

At this point just this, I think.

I should point out that this isn't a hard dependency; if the siteinfo property now added by MobileFrontend is missing, the mobile content URLs will simply be omitted from the response.
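
Because the mobile content URLs can be omitted, a client reading `content_urls` should not assume the "mobile" key is present. A minimal sketch, where `page_url` is a hypothetical helper and falling back to the desktop URL is an assumption about what a client would want:

```python
from typing import Optional


def page_url(summary: dict, prefer_mobile: bool = True) -> Optional[str]:
    """Pick a page URL from content_urls, tolerating a missing platform."""
    content_urls = summary.get("content_urls", {})
    order = ["mobile", "desktop"] if prefer_mobile else ["desktop", "mobile"]
    for platform in order:
        url = content_urls.get(platform, {}).get("page")
        if url:
            return url
    # Neither platform provided a page URL.
    return None
```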

Change 391066 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Update namespace output to { id, title } object

https://gerrit.wikimedia.org/r/391066

Change 391066 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Update namespace output to { id, title } object

https://gerrit.wikimedia.org/r/391066

Change 392533 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Get mobile base URL from siteinfo and re-enable checks for mobile URLs

https://gerrit.wikimedia.org/r/392533

Change 392533 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Get mobile base URL from siteinfo and re-enable checks for mobile URLs

https://gerrit.wikimedia.org/r/392533

Change 397622 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Don't return 204 for main pages

https://gerrit.wikimedia.org/r/397622

Change 397622 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Summary: Send empty extracts, not 204, for main pages

https://gerrit.wikimedia.org/r/397622

Mholloway closed subtask Restricted Task as Resolved. Apr 9 2018, 5:39 PM

Change 425904 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Expose media, references, and metadata API URLs in the summary response

https://gerrit.wikimedia.org/r/425904

Change 425904 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Expose media, references, and metadata API URLs in the summary response

https://gerrit.wikimedia.org/r/425904