
API: Multiple page histories
Closed, Declined, Public

Description

Author: matthew.britton

Description:
I have a particular use case for the API that could be streamlined a little.

It would be useful to have basic info for the last few revisions to a page (say, 10 or so), for RC patrolling purposes, to help avoid the problem of people reverting to the wrong revision when trying to deal with vandalism.

This is easy to do with prop=revisions. At the moment, though, it can only be done one page at a time unless I already know the revision IDs, which I don't. A separate page history request on top of the diff request for every change being reviewed is a little excessive, so I don't do it. (API diffs wouldn't help here, as the diff query would work in such a way that combining it with a page history query would be impossible.)
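As a rough illustration of the one-page-at-a-time workaround, the sketch below builds one prop=revisions request per page. The endpoint URL, the chosen rvprop fields, and the helper names (history_request, history_requests) are assumptions for the example; it only constructs URLs and does not contact any server.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; any MediaWiki api.php would look the same.
API_URL = "https://en.wikipedia.org/w/api.php"


def history_request(title: str, limit: int = 10) -> str:
    """Build one prop=revisions request for the last `limit` revisions of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": limit,
        "rvprop": "ids|timestamp|user|comment",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def history_requests(titles: list[str], limit: int = 10) -> list[str]:
    """rvlimit only works with a single title, so each page needs its own request."""
    return [history_request(t, limit) for t in titles]


if __name__ == "__main__":
    for url in history_requests(["Foo", "Bar", "Baz"]):
        print(url)
```

Three pages mean three round trips, on top of whatever diff requests are already being made.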

What I would like to be able to do is use something like the following query string:

action=query&titles=Foo|Bar|Baz&prop=revisions&rvlimit=10

and have it return the last 10 revisions for each of the pages Foo, Bar and Baz, i.e. 30 revisions in total. At the moment, of course, this doesn't work because rvlimit can only be used with a single page. Obviously the limit-checking code would have to be rewritten to check (rvlimit * title count) <= 500 rather than rvlimit <= 500. As far as I can tell, asking for X revisions this way should be no more of a performance hit than asking for X arbitrary revisions, which is already possible. I'd probably give it about five titles per query, so this would cut the number of queries needed to get the data by four-fifths.
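The proposed limit rule is easy to state in code. This is a minimal sketch with hypothetical function names, not MediaWiki's actual parameter validation; the cap of 500 is the figure quoted in the report.

```python
MAX_LIMIT = 500  # per-request cap quoted in the report


def accepted_today(rvlimit: int, titles: list[str]) -> bool:
    """Behaviour described in the report: rvlimit is only valid with a single title."""
    return len(titles) == 1 and rvlimit <= MAX_LIMIT


def accepted_as_proposed(rvlimit: int, titles: list[str]) -> bool:
    """Proposed rule: the total revisions returned (rvlimit * title count) stays within the cap."""
    return rvlimit * len(titles) <= MAX_LIMIT


if __name__ == "__main__":
    titles = ["Foo", "Bar", "Baz"]
    print(accepted_today(10, titles))             # False
    print(accepted_as_proposed(10, titles))       # True (30 <= 500)
    print(accepted_as_proposed(10, ["T"] * 51))   # False (510 > 500)
```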


Version: unspecified
Severity: enhancement

Details

Reference
bz17033

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 21 2014, 10:25 PM
bzimport set Reference to bz17033.

(In reply to comment #0)

> I have a particular use case for the API that could be streamlined a little.

> It would be useful to have basic info for the last few revisions to a page (say, 10 or so), for RC patrolling purposes, to help avoid the problem of people reverting to the wrong revision when trying to deal with vandalism.

> This is easy to do with prop=revisions. At the moment, though, it can only be done one page at a time unless I already know the revision IDs, which I don't. A separate page history request on top of the diff request for every change being reviewed is a little excessive, so I don't do it. (API diffs wouldn't help here, as the diff query would work in such a way that combining it with a page history query would be impossible.)

> What I would like to be able to do is use something like the following query string:

> action=query&titles=Foo|Bar|Baz&prop=revisions&rvlimit=10

> and have it return the last 10 revisions for each of the pages Foo, Bar and Baz, i.e. 30 revisions in total. At the moment, of course, this doesn't work because rvlimit can only be used with a single page.

There's a good reason for that: there's no efficient way to get the first 10 revisions of Foo and the first 10 revisions of Bar in one database query. The closest approximation would be a query asking for the last 20 revisions of either Foo or Bar, but that could very well give you a 15/5 or 0/20 split, depending on the circumstances. The only way to ensure a 10/10 split is to run a separate query for each title, which is exactly what I don't want to do for performance reasons (as a general rule, database queries in loops are evil, especially if the number of iterations is controlled by the user).
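To see why a single combined query cannot guarantee the split, the sketch below runs both strategies against an in-memory SQLite table. The table layout (rev_id, page, ts) is a simplified, hypothetical stand-in rather than MediaWiki's real schema, and the data is made up so that Foo's recent edits crowd out Bar's older ones.

```python
import sqlite3
from collections import Counter


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical, simplified revision table; not MediaWiki's real schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, page TEXT, ts INTEGER)")
    # Foo: 30 recent edits (ts 101..130). Bar: 15 older edits (ts 1..15).
    rows = [(i, "Foo", 100 + i) for i in range(1, 31)]
    rows += [(30 + i, "Bar", i) for i in range(1, 16)]
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", rows)
    return conn


def combined_latest(conn, pages, n):
    """One query: the n newest revisions across all pages; the per-page split is not controlled."""
    marks = ",".join("?" for _ in pages)
    sql = f"SELECT page, rev_id FROM revision WHERE page IN ({marks}) ORDER BY ts DESC LIMIT ?"
    return conn.execute(sql, (*pages, n)).fetchall()


def per_page_latest(conn, pages, n):
    """Guarantees n per page, but costs one query per page (a query in a loop)."""
    rows = []
    for page in pages:
        rows += conn.execute(
            "SELECT page, rev_id FROM revision WHERE page = ? ORDER BY ts DESC LIMIT ?",
            (page, n),
        ).fetchall()
    return rows


def split(rows) -> Counter:
    """Count how many returned revisions belong to each page."""
    return Counter(page for page, _ in rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(split(combined_latest(conn, ["Foo", "Bar"], 20)))  # Counter({'Foo': 20})
    print(split(per_page_latest(conn, ["Foo", "Bar"], 10)))  # Counter({'Foo': 10, 'Bar': 10})
```

With this data the combined query returns a 20/0 split; only the looped version yields 10/10, and the number of queries it issues grows with the number of titles the user asks for.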

> Obviously the limit-checking code would have to be rewritten to check (rvlimit * title count) <= 500 rather than rvlimit <= 500. As far as I can tell, asking for X revisions this way should be no more of a performance hit than asking for X arbitrary revisions, which is already possible.

Actually, it is, because a set of arbitrary revisions can be fetched by revid in one query, whereas per-title limits need a separate query for each title.
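For contrast, here is a sketch of fetching arbitrary revisions by ID, using the same kind of simplified, hypothetical table as a stand-in for the real schema. Because the list of IDs itself bounds the result, no per-page limit is involved and one IN (...) query covers revisions from any number of pages; on the API side this roughly corresponds to passing several IDs to the revids parameter in a single request.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical, simplified revision table; not MediaWiki's real schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, page TEXT, ts INTEGER)")
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [(1, "Foo", 10), (2, "Bar", 11), (3, "Foo", 12), (4, "Baz", 13), (5, "Bar", 14)],
    )
    return conn


def revisions_by_id(conn, rev_ids):
    """Any mix of revisions from any pages, fetched with a single IN (...) query."""
    if not rev_ids:
        return []
    marks = ",".join("?" for _ in rev_ids)
    sql = f"SELECT rev_id, page FROM revision WHERE rev_id IN ({marks}) ORDER BY rev_id"
    return conn.execute(sql, list(rev_ids)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(revisions_by_id(conn, [5, 1, 4]))  # [(1, 'Foo'), (4, 'Baz'), (5, 'Bar')]
```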

> I'd probably give it about five titles per query, so this would cut the number of queries needed to get the data by four-fifths.

Like I said above, it wouldn't: the server would still have to run one database query per title.
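A back-of-the-envelope count shows where the two sides differ. Batching five titles per API request does cut the client's requests by four-fifths, but under the per-title approach described in the reply the server still runs one database query per title. The figure of 100 titles and the function names are illustrative assumptions.

```python
from fractions import Fraction


def api_requests(n_titles: int, titles_per_request: int) -> int:
    """HTTP requests the client sends when batching titles (integer ceiling division)."""
    return (n_titles + titles_per_request - 1) // titles_per_request


def db_queries_per_title_limit(n_titles: int) -> int:
    """Server-side cost of per-title limits: one database query per title, however titles are batched."""
    return n_titles


if __name__ == "__main__":
    n = 100
    before, after = api_requests(n, 1), api_requests(n, 5)
    print(before, after)                      # 100 20
    print(Fraction(before - after, before))   # 4/5
    print(db_queries_per_title_limit(n))      # 100, batched or not
```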

matthew.britton wrote:

(In reply to comment #1)

> There's a good reason for that: there's no efficient way to get the first 10 revisions of Foo and the first 10 revisions of Bar in one database query.

Yeah, I see what you're getting at; never mind. (Perhaps I should stop trying to rewrite MediaWiki here... :/)