Page MenuHomePhabricator

Implement GET page history
Closed, ResolvedPublic5 Estimated Story Points

Description

Acceptance Criteria:

  • Define route for /page/{title}/history?{older_than|newer_than={id}}
  • Define route handler for page history
  • Request:
    • Must support HTTP GET only
    • Must support older_than or newer_than param, these parameters provide a revision id
      • Only one of older_than or newer_than are supported for a request
        • If both are provided the endpoint must return a meaningful error
    • Must support title param, this provides the title of the requested page
    • Request body must be empty
  • Response
    • Response must return JSON
    • Responses JSON must have structure:
{
    "older": "https://api.example/page/France/history?older_than=12345",
    "newer" "https://api.example/page/France/history?newer_than=67890"
    "latest": "https://api.full.url/page/some title/history",
    "revisions":[
      {
        "id": 889268681,
        "comment": "Rm unnecessary space in comment; all the others have 8 at the beginning and end, this had 9",
        "timestamp": "2019-03-24T16:37:55Z",
        "delta": -1,
        "size": 441,
        "user": {
         "name":  "The Blade of the Northern Lights",
         "id": 12345
         }
      }
    ]
}
  • older must contain full link to API endpoint for the next older segment of results (usually the same endpoint plus "older_than" with the last revision in this segment). May be excluded if there are no known older revisions.
  • newer must contain full link to API endpoint for the next newer segment of results (usually the same endpoint plus "newer_than" parameter for first revision in this segment). May be excluded if there are no known older revisions.
  • latest must contain full link to API endpoint for the latest values, usually just this endpoint with no parameters
  • older, newer, and latest must contain a fully qualified URL
  • Number of revisions returned must be limited to 20
  • Delta must contain +/- count of characters changed from previous revision

Event Timeline

WDoranWMF updated the task description. (Show Details)
WDoranWMF updated the task description. (Show Details)

This looks great.

"older" and "newer" should be links to history segments, not to revisions. so:

"older": "https://api.example/page/France/history?older_than=12345",
"newer" "https://api.example/page/France/history?newer_than=67890"

The goal here is to do the work for clients so they can scroll back through the segments without having to figure out how to do it themselves. Clients should be able to write code like this pseudocode:

url = latest
while url
  segment = fetch(url)
  process(segment.revisions)
  url = segment.older

Otherwise, this rocks.

Worth noting that the older_than and newer_than are parameters are revision IDs, not timestamps. The revision IDs are not monotonically increasing, so a revision may be older than another but have a higher revision ID.

WDoranWMF updated the task description. (Show Details)
WDoranWMF set the point value for this task to 5.

One thing about older: I am assuming that the query to get revisions would get one more than the page size, so that we know if there is at least one more revision not in the page.

Namespacing, and the details of how it will impact this endpoint, are being discussed in task T232485.

@eprodromou

I assume that revisions in the response should normally be ordered from newest to oldest (which facilitates looping through them via the paging pseudocode above, which starts at the newest revision and progresses through older revisions). But if newer_than is specified, they should be ordered oldest to newest (which facilitates the inverse direction of looping starting at an older revision and progressing through newer revisions. Am I correct in that assumption?

Must include a deleted flag array, identifying deleted fields.

You're referring to deleted fields represented by the "DELETED_" constants near the top of RevisionRecord.php, right?

https://github.com/wikimedia/mediawiki/blob/master/includes/Revision/RevisionRecord.php

Do you have an example of the desired output? For now, as a placeholder, I'm including something like this for any deleted fields

"flags":[
    "texthidden",
    "commenthidden",
    "userhidden",
    "restricted"
]

Finally, are you sure you want "older", "newer" and "latest" to include full links? I can do it (the code I'm about to upload includes it). But it is a bit more work for the server. And the client apparently already knows how to construct a functional API url, or else it wouldn't have managed to make an API call in the first place. Alternatively, I could just return "older" and "newer" rev_ids and let the client do the string manipulation.

Change 538133 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Core REST API handler for GET page history

https://gerrit.wikimedia.org/r/538133

@eprodromou

I assume that revisions in the response should normally be ordered from newest to oldest (which facilitates looping through them via the paging pseudocode above, which starts at the newest revision and progresses through older revisions). But if newer_than is specified, they should be ordered oldest to newest (which facilitates the inverse direction of looping starting at an older revision and progressing through newer revisions. Am I correct in that assumption?

No, they should still be ordered newest to oldest! The idea is that if you've got a bunch of revisions with IDs like:

75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33.

Then the first segment (with no parameters) would return revisions 75 through 56 in descending order in the revisions field; its older link would have older_than equal to 56, and it would not have a newer link (there are no newer items).

If you use the older link, with older_than=56, the segment would include revisions 55 through 36 in the revisions field, descending order; the older link would have an older_than=36 parameter, and the newer link would have newer_than=55.

Let's say, while you're scrolling around, a few people have edited the page (!), so now there's also revisions 81, 80, 79, 78, 77, 76. If the client uses the newer link with newer_than=55, it should return revisions 75 through 56 in descending order in the revisions field; its older link would have older_than=56, and its newer link would have newer_than=75 (since there are now newer items).

If you used the older link with older_than=36, you'd get revisions 35 through 33 descending, no older link (there are no older revisions), and newer link is newer_than=35.

Must include a deleted flag array, identifying deleted fields.

You're referring to deleted fields represented by the "DELETED_" constants near the top of RevisionRecord.php, right?

https://github.com/wikimedia/mediawiki/blob/master/includes/Revision/RevisionRecord.php

Do you have an example of the desired output? For now, as a placeholder, I'm including something like this for any deleted fields

Honestly, I'm not sure we actually want revisions with deleted parts at all. I thought Brad's query ignored them?

Finally, are you sure you want "older", "newer" and "latest" to include full links? I can do it (the code I'm about to upload includes it). But it is a bit more work for the server. And the client apparently already knows how to construct a functional API url, or else it wouldn't have managed to make an API call in the first place. Alternatively, I could just return "older" and "newer" rev_ids and let the client do the string manipulation.

Full links are great. Let's save the client a few steps, and make it easy to navigate through the segments without off-by-one errors!

Sorry, I might be late to the party, but I do have some terminology concerns:

  1. older and newer are for pagination. This seems to work for revisions (although it's a bit confusing whether we're talking about time or revisions), but for different API endpoints, where we would also want pagination (I could imagine a list of page titles) these words mean nothing. Action API has a unified pagination mechanism for all endpoints, and it seems to be quite a standard practice in REST APIs, so that the server and the client would be able to share the pagination code. What do you think about selecting some more universal words for that? Like next and previous as names for the links? I don't quite care what the actual parameter name is, having links for pagination clients will unlikely construct those manually.
  2. Having these links as root properties of the response is a bit strange. In HATEOAS all links are wrapped into _links property and follow a standard, facebook has paging property etc. In general, having them as root object properties seems to give too much attention to them.
  3. Are we standardizing on the names of the properties?
    • The edit comment is a called a comment in action API. It English with UI it is indeed called the summary, but in Russian UI it's called a 'comment' (in Russian). Summary could be confusing with article summary. Also, we need to think if we want to have parsed comment in the response as well. I would imagine that's what an app would wanna show to the user as it includes markup. If yes - why don't we make the comment and object with text and html property. And standardize that it would be a representation of the edit comment throughout the API.
  4. delta and size don't take MCR into account. Are these about main slot?
  5. flags is used in the action API in a very different meaning - they indicate minor revision for example. Also, why would we need to expose if the comment is suppressed for the revision - just not show it?

@Pchelolo

  1. I think the question came down to the insanely confusing nature of reverse-chronological segmented lists. "next" means "older" and "prev" means "newer". I decided that the advantage of having standard next/prev link relations was offset by having the reverse-chron confusion.
  1. This object is a list segment; I think that links to its neighbors and its contents are top-level properties.
  1. At this point, we'd need to have "comment" as a duplicate of "summary".
  1. Yes, main slot.
  1. I agree, we should not show properties that have been deleted. Ideally, we'd just hide the whole revision.
  1. I think the question came down to the insanely confusing nature of reverse-chronological segmented lists. "next" means "older" and "prev" means "newer". I decided that the advantage of having standard next/prev link relations was offset by having the reverse-chron confusion.

Next mean go further in the iterator I'm going on, if we standadise on how pagination works for all endpoints. Not more confusing then having to reimplement pagination logic for every endpoint..

  1. At this point, we'd need to have "comment" as a duplicate of "summary".

At which point? Nothing released can possibly be using this yet.

  1. Yes, main slot.

Can we expand the response format to be able to add more slots infos right away without creating a breaking change later on?

  1. Right, but "next" also means "chronologically later" in English. People screw it up all the time.
  1. I could live with it; I'd just want to make sure we changed it everywhere.
  1. Since 99.9999% of our revisions are to main slot, is it reasonable to have a representation that focuses primarily on the main slot, but also includes a field "slots" for additional slots? Like:
{
  "id": 12345,
  "delta": -1,
  "size": 441,
  "slots": {
    "mediainfo": {
      "delta": 0,
      "size": 1024
    }
  }
}
eprodromou added a subscriber: tstarling.

I changed "summary" to "comment", took out the caching headers per discussion with @tstarling , and made "user" an object with ID and name, also per a principle @tstarling recommended of bundling related properties.

One last thing: most of our Action API responses have a "response" property, and then the object. That's very RPC-ish, and not very REST-ish. I think @WDoranWMF copied these output values from Action API, and I'd prefer that we don't have "response" in the outputs. Just let the thing be the thing.

@eprodromou quick question on this one - based on the comments in https://phabricator.wikimedia.org/T231580#5522820 it sounds like the compare endpoint will throw an error if we pass in revision IDs that cross title changes / page moves. My question is will the history endpoint show revisions across page moves or does the history start over once the title changes? If we will see revisions from an older page title I was hoping we'd have a way to know that, so we can prevent the user from choosing revisions across this line and entering the diff screen knowing they will only see an error.

@eprodromou quick question on this one - based on the comments in https://phabricator.wikimedia.org/T231580#5522820 it sounds like the compare endpoint will throw an error if we pass in revision IDs that cross title changes / page moves.

My question is will the history endpoint show revisions across page moves or does the history start over once the title changes? If we will see revisions from an older page title I was hoping we'd have a way to know that, so we can prevent the user from choosing revisions across this line and entering the diff screen knowing they will only see an error.

I think all the revisions that are returned from the history endpoint should be history for the same page, and should be OK to send to the comparison endpoint without an error.

I think in MW core if you make 3 revisions to the page with title "A" (revision ID 10, 20, 30) and then move the page to title "B" and make 3 more revisions (revision ID 40, 50, and 60) then revision 50 and revision 20 are both considered revisions of the same page. All of the revisions 10, 20, 30, 40, 50, 60 should show up in the history of page "B", and you should be able to compare them.

So the history will look something like this: https://www.mediawiki.org/w/index.php?title=User:EProdromou_(WMF)/Bar&action=history

I think after the move getting the history of the page with title "A" will give a single revision (creating the redirect to "B")... unless there are other changes, obviously. So, like this: https://www.mediawiki.org/w/index.php?title=User:EProdromou_(WMF)/Foo&action=history

Does that sound like the right behaviour to you?

The question I was discussing is this: it's technically possible with the Action API and I think even our Web UI to compare two revisions of two totally different pages without any shared history (like comparing Barack Obama and France at https://en.wikipedia.org/w/index.php?title=France&type=revision&diff=920281468&oldid=917159623).

I think there's not any user interface to compare revisions across pages in the Web UI (I had to copy-and-paste revision IDs across URLs to get that Obama/France URL), and my guess is that you won't be building cross-page diffs in the iOS app... I hope? 🙏

If the API compares across different pages, there's some value in having more metadata about the page in the output, and Tim was worried about caching of that metadata. I don't think it's as important if it's only comparing revisions of the same page.

Let me know if that all makes sense.

Another thing: I take /page/{title}/history to mean that you're getting the history of the page that currently has that title, not the history of all pages that have had that title. I think that's in line with how the Action API and the Web UI work.

Change 538133 merged by jenkins-bot:
[mediawiki/core@master] Core REST API handler for GET page history

https://gerrit.wikimedia.org/r/538133

I reviewed this; the pagination works correctly and the output is correct. Nice job, thanks for the hard work.

DannyS712 subscribed.

[batch] remove patch for review tag from resolved tasks