Develop a Summary JSON API
Closed, Resolved, Public

Description

Note: this task is being implemented by the web team, and its scope is fully defined in the subtask. This task serves as a marker for how that work fits into the PCS as a whole.

At a high level, this task tracks the work to create the next version of the RESTBase Summary API: https://en.wikipedia.org/api/rest_v1/#!/Page_content/get_page_summary_title

The primary new features of the endpoint include:

  1. Normalization of title properties
  2. URLs for commonly needed content
  3. An improved extract of the page
  4. Handling of summaries for "not normal" pages (e.g. disambiguation pages)

Current example output:

{
  "type": "standard",
  "title": "Laredo, Texas",
  "displaytitle": "Laredo, Texas",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "titles": {
    "canonical": "Laredo,_Texas",
    "normalized": "Laredo, Texas",
    "display": "Laredo, Texas"
  },
  "pageid": 136773,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/Webb_County_Laredo.svg/320px-Webb_County_Laredo.svg.png",
    "width": 320,
    "height": 238
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/9/91/Webb_County_Laredo.svg",
    "width": 982,
    "height": 730
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "808369737",
  "tid": "808ed2b1-c464-11e7-84fe-e94e48c317f5",
  "timestamp": "2017-11-02T13:26:55Z",
  "description": "border city in Texas, USA",
  "coordinates": {
    "lat": 27.524445,
    "lon": -99.490593
  },
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Laredo,_Texas",
      "revisions": "https://en.wikipedia.org/wiki/Laredo,_Texas?action=history",
      "edit": "https://en.wikipedia.org/wiki/Laredo,_Texas?action=edit",
      "talk": "https://en.wikipedia.org/wiki/Talk:Laredo,_Texas"
    },
    "mobile": {
      "page": "https://en.m.wikipedia.org/wiki/Laredo,_Texas",
      "revisions": "https://en.m.wikipedia.org/wiki/Special:History/Laredo,_Texas",
      "edit": "https://en.m.wikipedia.org/wiki/Laredo,_Texas?action=edit",
      "talk": "https://en.m.wikipedia.org/wiki/Talk:Laredo,_Texas"
    }
  },
  "api_urls": {
    "summary": "https://en.wikipedia.org/api/rest_v1/page/summary/Laredo,_Texas",
    "mobile_sections": "https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Laredo,_Texas",
    "edit_html": "https://en.wikipedia.org/api/rest_v1/page/html/Laredo,_Texas",
    "talk_page_html": "https://en.wikipedia.org/api/rest_v1/page/html/Talk:Laredo,_Texas"
  },
  "extract": "Laredo is the county seat of Webb County, Texas, United States, located on the north bank of the Rio Grande in South Texas, across from Nuevo Laredo, Tamaulipas, Mexico. According to the 2010 census, the city population was 236,091, making it the tenth-most populous city in the state of Texas and third-most populated on the Mexico–United States border, after San Diego, California, and El Paso, Texas. Its metropolitan area is the 178th-largest in the \nU.S. and includes all of Webb County, with a population of 250,304. Laredo is also part of the cross-border Laredo-Nuevo Laredo Metropolitan Area with an estimated population of 636,516.",
  "extract_html": "<p><b>Laredo</b> is the <span>county seat</span> of <span>Webb County, Texas</span>, United States, located on the north bank of the <span>Rio Grande</span> in <span>South Texas</span>, across from <span>Nuevo Laredo</span>, <span>Tamaulipas</span>, <span>Mexico</span>. According to the <span>2010 census</span>, the city population was 236,091, making it the <span>tenth-most populous</span> city in the <span>state</span> of <span>Texas</span> and third-most populated on the <span>Mexico–United States border</span>, after <span class=\"mw-redirect\">San Diego, California</span>, and <span>El Paso, Texas</span>. Its <span>metropolitan area</span> is the <span class=\"mw-redirect\">178th-largest in the \nU.S.</span> and includes all of Webb County, with a population of 250,304. Laredo is also part of the cross-border <span class=\"mw-redirect\">Laredo-Nuevo Laredo Metropolitan Area</span> with an estimated population of 636,516.</p>"
}
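For a client, the example above maps onto a small typed model. The following Python sketch is not code from the mobileapps service; the class and function names are illustrative. It parses the fields a link preview would typically need and treats the mobile URLs as optional, which matches the later note in this task that they are omitted when the wiki does not advertise a mobile base URL.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Titles:
    canonical: str   # underscore form used in URLs, e.g. "Laredo,_Texas"
    normalized: str  # human-readable form with spaces, e.g. "Laredo, Texas"
    display: str     # display form of the title


@dataclass
class Summary:
    type: str
    namespace_id: int
    titles: Titles
    extract: str
    desktop_page_url: str
    mobile_page_url: Optional[str]  # None if the response has no "mobile" block


def parse_summary(raw: str) -> Summary:
    """Extract the fields a link preview typically needs from a summary body."""
    data = json.loads(raw)
    urls = data["content_urls"]
    mobile = urls.get("mobile")
    return Summary(
        type=data["type"],
        namespace_id=data["namespace"]["id"],
        titles=Titles(**data["titles"]),
        extract=data.get("extract", ""),
        desktop_page_url=urls["desktop"]["page"],
        mobile_page_url=mobile["page"] if mobile else None,
    )
```

For a response shaped like the one above, `parse_summary(body).titles.canonical` would be `"Laredo,_Texas"`.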

Details

Related Gerrit Patches:
mediawiki/services/mobileapps (master): Expose media, references, and metadata API URLs in the summary response
mediawiki/services/mobileapps (master): Summary: Send empty extracts, not 204, for main pages
mediawiki/services/mobileapps (master): Get mobile base URL from siteinfo and re-enable checks for mobile URLs
mediawiki/services/mobileapps (master): Update namespace output to { id, title } object
mediawiki/services/mobileapps (master): Remove mobile-sections URL from summary API URLs
mediawiki/services/mobileapps (master): Expose tid in the summary 2.0 output
mediawiki/services/mobileapps (master): Use the underlying page HTML tid in page content endpoint etags
mediawiki/services/mobileapps (master): Clean up titles object keys
mediawiki/services/mobileapps (master): Move namespace_id and namespace_text to top level
mediawiki/services/mobileapps (master): Suppress not-yet-deployed REST API URLs

Related Objects

Status | Assigned
Open | None
Open | None
Resolved | Jhernandez
Resolved | Mholloway
Resolved | ovasileva
Resolved | phuedx
Resolved | phuedx
Duplicate | None
Resolved | Jdlrobson
Resolved | Jdlrobson
Duplicate | None
Duplicate | ovasileva
Resolved | ovasileva
Duplicate | None
Declined | None
Duplicate | Jdlrobson
Resolved | Mhurd
Declined | JMinor
Resolved | phuedx
Resolved | Pchelolo
Resolved | Jdlrobson
Declined | Pchelolo
Resolved | phuedx
Declined | Jdlrobson
Duplicate | None
Resolved | Fjalapeno
Resolved | phuedx
Declined | pmiazga
Declined | None
Resolved | phuedx
Declined | None
Resolved | Pchelolo
Resolved | bearND
Resolved | Mholloway
Resolved | MSantos
Resolved | Mholloway
Invalid | None
Resolved | Jdlrobson
Invalid | None
Duplicate | None
Resolved | Jdlrobson
Resolved | Jdlrobson
Resolved | Jdlrobson
Resolved | Jdlrobson
Resolved | phuedx
Resolved | bearND
Resolved | Mholloway
Duplicate | None
Resolved | Jdlrobson
Resolved | Jdlrobson
Resolved | phuedx
Resolved | Jdlrobson
Resolved | Jdlrobson
Resolved | bearND
Resolved | Jdlrobson
Resolved | Mholloway
Resolved | Mholloway
Resolved | Jdlrobson
Resolved | Jdlrobson
Resolved | bearND
Resolved | mobrovac
Resolved | ABorbaWMF
Resolved | ABorbaWMF
Resolved | Mholloway
Resolved | Mholloway
Resolved | Mholloway
Resolved | Mholloway
Declined | None
Resolved | Mholloway
Resolved | Mholloway
Declined | None
Resolved | Pchelolo
Resolved | Mholloway

Event Timeline

Fjalapeno updated the task description. (Oct 4 2017, 6:08 PM)
Fjalapeno updated the task description.
This comment was removed by Mholloway.
Mholloway updated the task description. (Nov 6 2017, 9:12 PM)
phuedx added a subscriber: phuedx. (Nov 7 2017, 4:18 PM)
bearND added a subscriber: bearND. (Nov 9 2017, 3:20 AM)

@Fjalapeno Is this still blocked? If so, on what?

So the things I see that we need to do (if the response is up to date):

  1. Some URLs need to be hidden until they are deployed:
     "read_html": "https://en.wikipedia.org/api/rest_v1/page/read-html/Laredo,_Texas",
     "content_html": "https://en.wikipedia.org/api/rest_v1/page/content-html/Laredo,_Texas",
     "metadata": "https://en.wikipedia.org/api/rest_v1/page/metadata/Laredo,_Texas",
     "references": "https://en.wikipedia.org/api/rest_v1/page/references/Laredo,_Texas",
     "media": "https://en.wikipedia.org/api/rest_v1/page/media/Laredo,_Texas",
  2. We need the mobile and desktop URLs in content_urls.
  3. We need to decide what to do with the namespace properties (inside or outside the titles dictionary?)
  4. We need the time UUID in the response.
  5. Bikeshed: should we just remove the trailing "name" from "namespace_name"? It seems redundant.
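The first item amounts to filtering the api_urls dictionary against whatever is actually live. A minimal Python sketch follows; the `DEPLOYED` set and the function name are assumptions for illustration, and the path for each key mirrors the URLs listed in this task rather than the service's actual implementation.

```python
from typing import Dict, Set

# Hypothetical: the endpoints assumed to be live at the time of the comment.
DEPLOYED: Set[str] = {"summary", "edit_html", "talk_page_html"}


def build_api_urls(rest_base: str, title: str, deployed: Set[str]) -> Dict[str, str]:
    """Return api_urls for `title`, dropping endpoints that are not deployed yet."""
    candidates = {
        "summary": f"{rest_base}/page/summary/{title}",
        "read_html": f"{rest_base}/page/read-html/{title}",
        "content_html": f"{rest_base}/page/content-html/{title}",
        "metadata": f"{rest_base}/page/metadata/{title}",
        "references": f"{rest_base}/page/references/{title}",
        "media": f"{rest_base}/page/media/{title}",
        "edit_html": f"{rest_base}/page/html/{title}",
        # The "Talk:" prefix is only correct for main-namespace titles.
        "talk_page_html": f"{rest_base}/page/html/Talk:{title}",
    }
    return {key: url for key, url in candidates.items() if key in deployed}
```

Once an endpoint ships, adding its key to the set exposes its URL without touching the URL-building code.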

I think that is mostly it… there is one other topic I'll bring up in a separate comment around the "type" property… just want to get some web team input there. But I'm not sure it is a blocker since it is additive…

@Mholloway can you confirm the comment I just made? Does that seem right as a list of things to do?

@phuedx Hey looking at the spec for type… we have the following:

"disambiguation", "wikidata", or "standard"

I'm thinking we should expand that a bit further after a few recent talks about how we handle other types of pages. Specifically things like "File" pages…

"talk", "file", "category", "disambiguation", "wikidata", or "standard"

This will let clients in the future decide to handle these pages differently… OR are we just recreating the namespace property here?

I guess something bothers me about calling a "category" page "standard".

Thoughts?

@Mholloway @bearND One more bikeshed…

So Talk pages will have their own summary… what does that look like?

I'm assuming it will have the URLs for the HTML and the APIs, so I am wondering whether we need to include those URLs here as well. Maybe we just need a link to the Talk page summary, so clients can get all the information they need about the talk page if they want it?

Thoughts?

phuedx added a comment. (Edited; Nov 9 2017, 10:32 AM)

> @phuedx Hey looking at the spec for type… we have the following:
>
> "disambiguation", "wikidata", or "standard"
>
> I'm thinking we should expand that a bit further after a few recent talks about how we handle other types of pages. Specifically things like "File" pages…
>
> "talk", "file", "category", "disambiguation", "wikidata", or "standard"
>
> This will let clients in the future decide to handle these pages differently… OR are we just recreating the [namespace] property here?

I think this goes to the principle behind adding content and API URLs to PCS responses: the server is providing important meta information about the page and/or project and the client is free to do with it what it will. In this case, the server is signalling to the client that it's treated the page differently while generating the preview.

I'd recommend adding more types as and when we need to (i.e. when Design and/or Product call for it).
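Because new `type` values are additive, a client that switches on `type` can stay forward compatible by treating any value it does not recognise as "standard". The sketch below illustrates that pattern in Python; the renderer functions and the disambiguation wording are hypothetical, not taken from any client.

```python
from typing import Callable, Dict


def render_standard(summary: dict) -> str:
    return summary.get("extract", "")


def render_disambiguation(summary: dict) -> str:
    # Hypothetical wording; real copy would come from Design/Product.
    return f"{summary['titles']['normalized']} may refer to several pages."


RENDERERS: Dict[str, Callable[[dict], str]] = {
    "standard": render_standard,
    "disambiguation": render_disambiguation,
}


def render_preview(summary: dict) -> str:
    # Unknown types (e.g. values added later, such as "category") fall back
    # to the standard renderer, so adding types does not break this client.
    renderer = RENDERERS.get(summary.get("type", "standard"), render_standard)
    return renderer(summary)
```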

Change 390252 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Suppress not-yet-deployed REST API URLs

https://gerrit.wikimedia.org/r/390252

> So the things I see that we need to do (if the response is up to date)
>
>   1. Some URLs need to be hidden until they are deployed:
> "read_html": "https://en.wikipedia.org/api/rest_v1/page/read-html/Laredo,_Texas",
> "content_html": "https://en.wikipedia.org/api/rest_v1/page/content-html/Laredo,_Texas",
> "metadata": "https://en.wikipedia.org/api/rest_v1/page/metadata/Laredo,_Texas",
> "references": "https://en.wikipedia.org/api/rest_v1/page/references/Laredo,_Texas",
> "media": "https://en.wikipedia.org/api/rest_v1/page/media/Laredo,_Texas",

https://gerrit.wikimedia.org/r/390252

>   2. We need the mobile and desktop URLs in content_urls

https://gerrit.wikimedia.org/r/#/c/389584/ (blocked on an addition to siteinfo via MobileFrontend: https://gerrit.wikimedia.org/r/#/c/390248/)

>   3. We need to decide what to do with the namespace properties (inside or outside the titles dictionary?)

I vote outside.

We also need to decide once and for all on what to call them (I don't know that we ever got to that yesterday). The already-deployed mobile-sections lead response uses "ns" for numeric namespace. Do we want to follow that for consistency, or use the longer names above in the summary endpoint for clarity, at the price of inconsistency?

>   4. We need the time UUID in the response

Will do this a.m.

>   5. Bikeshed: should we just remove the trailing "name" from "namespace_name"? It seems redundant.

Yeah, I'm not a big fan of "namespace_name," and yet just "namespace" is ambiguous. Maybe "namespace_text" is more meaningful?

> I think that is mostly it… there is one other topic I'll bring up in a separate comment around the "type" property… just want to get some web team input there. But I'm not sure it is a blocker since it is additive…
>
> @Mholloway can you confirm the comment I just made? Does that seem right as a list of things to do?

Yep, I think that covers it.

> @Mholloway @bearND One more bikeshed…
> So Talk pages will have their own summary… what does that look like?

Per the spec, the summary endpoint is only serving content for titles in "content namespaces," implemented as a whitelist currently containing only the main namespace. For Talk and all other namespaces we're responding with 204s. The last decision on the matter was to postpone work on other namespaces until someone has an actual use case.

Discussion was here, for posterity: T178420: What is a "content namespace" for purposes of the summary 2.0 endpoint?
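The policy described above reduces to a whitelist lookup on the namespace ID. A minimal Python sketch, assuming the whitelist is a set of integer namespace IDs; the function name is illustrative and not the service's code.

```python
from typing import FrozenSet

# Per the comment above, only the main namespace (ID 0) is whitelisted for now.
CONTENT_NAMESPACES: FrozenSet[int] = frozenset({0})


def summary_status(namespace_id: int,
                   content_namespaces: FrozenSet[int] = CONTENT_NAMESPACES) -> int:
    """HTTP status for a summary request: 200 for content namespaces, else 204."""
    return 200 if namespace_id in content_namespaces else 204
```

Supporting another namespace once a use case appears would then only mean adding its ID to the set.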

By the way, did we decide for sure on using .../wiki/Title?action=edit#/editor/0 as the mobile edit URL, even though it refers specifically to the lead section?

Change 390252 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Suppress not-yet-deployed REST API URLs

https://gerrit.wikimedia.org/r/390252

Mholloway updated the task description. (Nov 9 2017, 3:38 PM)
Mholloway added a comment. (Edited; Nov 9 2017, 3:48 PM)

> @Mholloway can you confirm the comment I just made? Does that seem right as a list of things to do?

Should separate mobile-sections-lead and mobile-sections-remaining API URLs be added? On the one hand, it would make the list more complete. On the other, I am not sure that clients other than the Android app use those endpoints, and the Android app won't be making use of these fully formed URLs because of how its API interfaces are implemented.

bearND added a comment. (Nov 9 2017, 4:46 PM)

I'm leaning towards removing the reference to mobile-sections since we will consider this a deprecated API once PCS is functional. No need to proliferate usage at this point IMO.

Yeah, the section number in the edit URI is a tricky subject. I'm not aware of any agreement on it, or of a really satisfying solution.
Assuming we keep the URI as is, since mobile clients prefer to edit single sections only, I see two main options, each with drawbacks:

  1. Leave the URI as is, pointing to section 0 initially, and let the client substitute the '0' with whatever section number the client wants to edit.
  2. Use URI template syntax, like {section_id}. URI templates are specified in RFC 6570.

The reason I don't really like either of them is that in both cases the client needs to be aware of the string substitution. Having said that, I guess #2 is acceptable, since we're not really doing HATEOAS for the URIs anyway, and the client already needs to know which URI key from the dictionary it wants.
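Both options put the substitution on the client. A small Python sketch of each, using the mobile edit URL discussed above; the function names and the `{section_id}` variable name are illustrative. The second function expands only this one variable by plain string replacement, which suffices for an integer section number but is not a general RFC 6570 implementation.

```python
import re

EDIT_URL = "https://en.m.wikipedia.org/wiki/Laredo,_Texas?action=edit#/editor/0"
EDIT_URL_TEMPLATE = (
    "https://en.m.wikipedia.org/wiki/Laredo,_Texas?action=edit#/editor/{section_id}"
)


def substitute_section(url: str, section: int) -> str:
    """Option 1: replace the trailing section number in a concrete URL."""
    return re.sub(r"#/editor/\d+$", f"#/editor/{section}", url)


def expand_template(template: str, section: int) -> str:
    """Option 2: expand a {section_id} URI-template variable."""
    return template.replace("{section_id}", str(section))
```

Either way the client must know the convention, which is the drawback noted above; option 2 at least makes the variable part explicit in the response.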

Change 390282 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Use the underlying page HTML tid in page content endpoint etags

https://gerrit.wikimedia.org/r/390282

Change 390283 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Expose tid in the summary 2.0 output

https://gerrit.wikimedia.org/r/390283
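The `tid` exposed by this change is a time-based (version 1) UUID, so a client can recover a timestamp from it without a separate field. A Python sketch using the standard `uuid` module; the function name is illustrative. Decoding the `tid` from the example response gives a moment on 8 November 2017, several days after the revision `timestamp` of 2 November, which suggests the `tid` marks when the stored content was generated rather than when the revision was saved.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the version-1 UUID clock (RFC 4122): 1582-10-15 00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def tid_timestamp(tid: str) -> datetime:
    """Return the UTC datetime encoded in a version-1 (time-based) UUID."""
    u = uuid.UUID(tid)
    if u.version != 1:
        raise ValueError(f"{tid} is not a time-based (version 1) UUID")
    # u.time counts 100-nanosecond intervals since the Gregorian epoch.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
```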

Mholloway updated the task description. (Nov 9 2017, 6:10 PM)
>   2. We need the mobile and desktop URLs in content_urls
>
> https://gerrit.wikimedia.org/r/#/c/389584/ (blocked on an addition to siteinfo via MobileFrontend: https://gerrit.wikimedia.org/r/#/c/390248/)

@Mholloway Question… how many things are we using MobileFrontend for in MCS/PCS?

>   3. We need to decide what to do with the namespace properties (inside or outside the titles dictionary?)
>
> I vote outside.

>   5. Bikeshed: should we just remove the trailing "name" from "namespace_name"? It seems redundant.
>
> Yeah, I'm not a big fan of "namespace_name," and yet just "namespace" is ambiguous. Maybe "namespace_text" is more meaningful?

Should this be a hash with two values, like namespace: { text: "", id: 0 }?

> We also need to decide once and for all on what to call them (I don't know that we ever got to that yesterday). The already-deployed mobile-sections lead response uses "ns" for numeric namespace. Do we want to follow that for consistency, or use the longer names above in the summary endpoint for clarity, at the price of inconsistency?

clarity…
Also remember that mobile-sections is being deprecated and going away… so we should keep the new endpoints consistent with each other, but not worry about the ones being replaced.

> Should separate mobile-sections-lead and mobile-sections-remaining API URLs be added? On the one hand, it would make the list more complete. On the other, I am not sure that clients other than the Android app use those endpoints, and the Android app won't be making use of these fully formed URLs because of how its API interfaces are implemented.

Let's just add the main one for now… we can look into adding the other two later.

We do want to get the clients to stop constructing URLs, though… that's the point. If we add these URLs and the clients ignore them, that isn't much use. But maybe we can wait until the PCS is complete before making those changes on the clients.

> By the way, did we decide for sure on using .../wiki/Title?action=edit#/editor/0 as the mobile edit URL, even though it refers specifically to the lead section?

Wait… do we need to pass the "0"?

> I'm leaning towards removing the reference to mobile-sections since we will consider this a deprecated API once PCS is functional. No need to proliferate usage at this point IMO.

Actually, this may be the way to go… I'm good with not adding it if you two are good.

bearND added a comment. (Nov 9 2017, 6:49 PM)

> Wait… do we need to pass the "0"?

Well, JavaScript-enabled web mobile clients need to pass in the section number, or the section edit feature would not be used. Instead it would fall back to the old way and try to edit the whole page.

Change 390295 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Remove mobile-sections URL from summary API URLs

https://gerrit.wikimedia.org/r/390295

>   5. Bikeshed: should we just remove the trailing "name" from "namespace_name"? It seems redundant.
>
> Yeah, I'm not a big fan of "namespace_name," and yet just "namespace" is ambiguous. Maybe "namespace_text" is more meaningful?
>
> Should this be a hash? with 2 values? like namespace {text:"", id:0 }?

I personally like a flatter structure (and think that leaves less opportunity for client parsing errors), but don't feel strongly about it.

> We also need to decide once and for all on what to call them (I don't know that we ever got to that yesterday). The already-deployed mobile-sections lead response uses "ns" for numeric namespace. Do we want to follow that for consistency, or use the longer names above in the summary endpoint for clarity, at the price of inconsistency?
>
> clarity…

I imagine we should also change these names in the formatted and formatted-lead (next-gen mobile-sections) endpoints, too, then?

phuedx added a comment. (Edited; Nov 9 2017, 7:40 PM)

@Fjalapeno, @bearND: Re T177431#3748630 and T177431#3748749: It might be worth defaulting to the "base" editing experience (no hash fragment in the URL) for now while we think about how we should approach linking to the mobile editing experience. Right now, my suggestion of the base URL plus the hash fragment seems more of a fortunate hack rather than an actual solution.

Change 390298 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Move namespace_id and namespace_text to top level

https://gerrit.wikimedia.org/r/390298

Mholloway updated the task description. (Nov 9 2017, 7:45 PM)

> I imagine we should also change these names in the formatted and formatted-lead (next-gen mobile-sections) endpoints, too, then?

Those endpoints are superseded by the PCS endpoints…
The work from them has already largely been incorporated into the Read HTML endpoints (@bearND I think you have most of those improvements already, right?)

We may need to do something here with a lead section again (potentially Marvin needs this if they want to support not downloading the entire article). But we are waiting for the product requirements to be finalized before building an endpoint.

> @Fjalapeno, @bearND: Re T177431#3748630 and T177431#3748749: It might be worth defaulting to the "base" editing experience (no hash fragment in the URL) for now while we think about how we should approach linking to the mobile editing experience. Right now, my suggestion of the base URL plus the hash fragment seems more of a fortunate hack rather than an actual solution.

This seems pragmatic to me as well

@Mholloway something I just noticed… sorry for all the last minute things…

"titles": {
  "title": "Laredo,_Texas",
  "normalized_title": "Laredo, Texas",
  "display_title": "Laredo, Texas"
},

Let's get rid of the duplicated "title":

"titles": {
  "canonical": "Laredo,_Texas",
  "normalized": "Laredo, Texas",
  "display": "Laredo, Texas"
},
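The three keys differ only in presentation: in the example, `canonical` is the underscore form used in URLs and `normalized` is the same title with spaces. A deliberately simplified Python sketch of building the object follows; the function name is illustrative. It covers only that space/underscore swap, whereas MediaWiki's real title normalization does more (for example, first-letter capitalization on most wikis), and `display` is passed through as supplied.

```python
from typing import Dict, Optional


def build_titles(normalized: str, display: Optional[str] = None) -> Dict[str, str]:
    """Build a titles object from a normalized title (simplified sketch)."""
    return {
        "canonical": normalized.replace(" ", "_"),
        "normalized": normalized,
        # Fall back to the normalized form when no display title is known.
        "display": display if display is not None else normalized,
    }
```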
Mholloway updated the task description. (Nov 9 2017, 8:47 PM)

Change 390309 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Clean up titles object keys

https://gerrit.wikimedia.org/r/390309

Mholloway updated the task description. (Nov 9 2017, 8:56 PM)

Change 390298 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Move namespace_id and namespace_text to top level

https://gerrit.wikimedia.org/r/390298

Change 390309 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Clean up titles object keys

https://gerrit.wikimedia.org/r/390309

Change 390282 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Use the underlying page HTML tid in page content endpoint etags

https://gerrit.wikimedia.org/r/390282

Change 390283 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Expose tid in the summary 2.0 output

https://gerrit.wikimedia.org/r/390283

Change 390295 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Remove mobile-sections URL from summary API URLs

https://gerrit.wikimedia.org/r/390295

>   2. We need the mobile and desktop URLs in content_urls
>
> https://gerrit.wikimedia.org/r/#/c/389584/ (blocked on an addition to siteinfo via MobileFrontend: https://gerrit.wikimedia.org/r/#/c/390248/)
>
> @Mholloway Question… how many things are we using MobileFrontend for in MCS/PCS?

At this point just this, I think.

I should point out that this isn't a hard dependency; if the siteinfo property now added by MobileFrontend is missing, the mobile content URLs will simply be omitted from the response.
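That soft dependency can be expressed by building the mobile block only when a mobile base URL is known. A Python sketch follows; the URL patterns are copied from the example response at the top of this task, while the function name and the way the mobile base is obtained (shown simply as an optional argument standing in for the siteinfo property) are assumptions. Titles are assumed to arrive already in canonical form.

```python
from typing import Dict, Optional


def build_content_urls(desktop_base: str, mobile_base: Optional[str],
                       title: str) -> Dict[str, Dict[str, str]]:
    """Build content_urls; omit "mobile" when no mobile base URL is available."""
    urls = {
        "desktop": {
            "page": f"{desktop_base}/wiki/{title}",
            "revisions": f"{desktop_base}/wiki/{title}?action=history",
            "edit": f"{desktop_base}/wiki/{title}?action=edit",
            "talk": f"{desktop_base}/wiki/Talk:{title}",
        }
    }
    if mobile_base is not None:
        urls["mobile"] = {
            "page": f"{mobile_base}/wiki/{title}",
            "revisions": f"{mobile_base}/wiki/Special:History/{title}",
            "edit": f"{mobile_base}/wiki/{title}?action=edit",
            "talk": f"{mobile_base}/wiki/Talk:{title}",
        }
    return urls
```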

Change 391066 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Update namespace output to { id, title } object

https://gerrit.wikimedia.org/r/391066
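The example output at the top of this task shows the resulting shape as `"namespace": { "id": 0, "text": "" }` (the patch title says `title`, but the example uses `text`). One way to derive it from a title's prefix is sketched below in Python; the namespace table is a hypothetical, hard-coded subset, whereas a real service would read the wiki's configured namespaces (for instance from siteinfo).

```python
from typing import Dict, Union

# Hypothetical subset of a wiki's namespace table: ID -> local name.
NAMESPACES: Dict[int, str] = {0: "", 1: "Talk", 14: "Category"}


def namespace_object(title: str,
                     namespaces: Dict[int, str] = NAMESPACES) -> Dict[str, Union[int, str]]:
    """Return {"id", "text"} for a title, defaulting to the main namespace."""
    prefix, sep, _ = title.partition(":")
    if sep:
        for ns_id, name in namespaces.items():
            if name and name == prefix:
                return {"id": ns_id, "text": name}
    # No colon, or a colon that is not a namespace prefix: main namespace.
    return {"id": 0, "text": ""}
```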

Mholloway updated the task description. (Nov 13 2017, 7:27 PM)

Change 391066 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Update namespace output to { id, title } object

https://gerrit.wikimedia.org/r/391066

Change 392533 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Get mobile base URL from siteinfo and re-enable checks for mobile URLs

https://gerrit.wikimedia.org/r/392533

Change 392533 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Get mobile base URL from siteinfo and re-enable checks for mobile URLs

https://gerrit.wikimedia.org/r/392533

Change 397622 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Don't return 204 for main pages

https://gerrit.wikimedia.org/r/397622

Change 397622 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Summary: Send empty extracts, not 204, for main pages

https://gerrit.wikimedia.org/r/397622
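After this change a main page gets a normal 200 summary whose extract fields are empty, instead of a 204. A Python sketch of that branch; the `is_main_page` flag and the function name are hypothetical, and how the service detects the main page is not described in this task.

```python
from typing import Dict, Tuple


def summary_extract_fields(extract: str, extract_html: str,
                           is_main_page: bool) -> Tuple[int, Dict[str, str]]:
    """Return (status, extract fields); main pages get empty extracts, not 204."""
    if is_main_page:
        return 200, {"extract": "", "extract_html": ""}
    return 200, {"extract": extract, "extract_html": extract_html}
```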

Mholloway added a subtask: Restricted Task. (Jan 10 2018, 3:51 PM)
Mholloway closed this task as Resolved. (Feb 21 2018, 8:03 PM)
Mholloway closed subtask Restricted Task as Resolved. (Apr 9 2018, 5:39 PM)

Change 425904 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Expose media, references, and metadata API URLs in the summary response

https://gerrit.wikimedia.org/r/425904

Change 425904 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Expose media, references, and metadata API URLs in the summary response

https://gerrit.wikimedia.org/r/425904