Page MenuHomePhabricator

Feature request: Extend access to the API oldreviewedpages interface
Closed, ResolvedPublic

Description

Author: guandalug

Description:
I'd like to have access to the "flagged revisions" data via API, not only via Special: - pages. This is especially interesting for Special:OldReviewedPages and the review - status / changes of the pages therein (e.g. to gather those files only 'unreviewed' due to a template change).


Version: unspecified
Severity: enhancement

Details

Reference
bz14345

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:11 PM
bzimport set Reference to bz14345.

Pages that are unreviewed due to template changes are not listed there, though that may be desirable.

guandalug wrote:

I had pages like those, where the only 'change' was a modified template. Of course, now that I SEARCH for such a page, there's none....

Changing component to FlaggedRevs. Since it's an extension, it's up to its authors to write an API module for it.

I'll wait till there is some sort of specific use for this.

Patch to add three basic API modules for FlaggedRevs

I believe this would be a big help to the bot owners and developers at those projects, that use the FlaggedRevs-extension. So I started and wrote three basic modules for querying the most important information about flagging status of pages:

  1. ApiQueryOldreviewedpages / 'list=oldreviewedpages' : Similar to [[Special:OldReviewedPages]], list out the outdated pages, sorted by timestamp
  1. ApiQueryFlagged / 'prop=flagged' : Get information about the flagging status of a whole page, such as latest stable revid, if there's a quality revision etc.
  1. ApiQueryRevisionsFR / 'prop=revisions&rvprop=flagged' : extends the 'prop=revisions' module to show the flagging information about specific revisions.

I have tested this patch on my local install and had no problems with it. So I would be glad if someone would be willing to review/comment on it.

A simple testcase, that uses all 3 modules is the query:

api.php?action=query&generator=oldreviewedpages&gorlimit=1&prop=revisions|info|flagged&rvprop=ids|user|timestamp|flagged&rvlimit=50

attachment flaggedrevsapi.patch ignored as obsolete

correct version of the patch

attachment flaggedrevsapi.patch ignored as obsolete

Bryan.TongMinh wrote:

FlaggedRevs has some sane backends I believe which should be used rather than querying the database directly.

(In reply to comment #7)

FlaggedRevs has some sane backends I believe which should be used rather than
querying the database directly.

I'm not sure, but I guess that would result in one database query per page/revision. Wouldn't that be slower compared to querying all pages/revisions in one batch?

Overriding prop=revisions seems to duplicate a lot of code and threaten core changes there.

(In reply to comment #9)

Overriding prop=revisions seems to duplicate a lot of code and threaten core
changes there.

Well, ApiQueryRevisionsFR uses ApiQueryRevisions as base class, so it does not duplicate the code and should be rather safe with respect to core changes. But you're probably right that it's not a perfect solution. Would it be an option to introduce some hooks into ApiQueryRevisions?

(In reply to comment #10)

(In reply to comment #9)

Overriding prop=revisions seems to duplicate a lot of code and threaten core
changes there.

Well, ApiQueryRevisionsFR uses ApiQueryRevisions as base class, so it does not
duplicate the code and should be rather safe with respect to core changes. But
you're probably right that it's not a perfect solution. Would it be an option
to introduce some hooks into ApiQueryRevisions?

There is a hook in MW 1.14 which allows for adding parameters cleanly. The patch doesn't duplicate code, because it overrides execute() and calls parent::execute() first. Figuring out what the queried titles were from the ApiResult is ugly, there's ApiPageSet for that (which you did use in ApiQueryFlagged).

Come to think of it, we should probably have hacks in each execute() function instead of letting extensions override core modules just to add some functionality: the way things currently work makes it impossible for two extensions to extend the same core module.

As to the rest of the patch:

  • Adding fp_pending_since IS NOT NULL is kind of redundant if one of $params['start'] and $params['end'] is set, because in that case addWhereRange() will already have added fp_pending_since < 'foo' (or >) to the WHERE clause
  • Why on earth does the flaggedpages have (fp_page_id, fp_rev_id) as a primary key while fp_rev_id is already unique?
  • You shouldn't hardcode AND and OR the way you did, Database::makeList() can build all this for you
  • The default value for ordir should be older, not newer, for consistency with other modules (and because it's the most common use case)
  • ApiQueryOldreviewpages::getAllowedParams() correctly accounts for !$wgFlaggedRevsNamespaces when setting the default value, but not when setting the array of allowed types (if !$wgFlaggedRevsNamespaces, that array should be array(0)). Also, strval() shouldn't be used; namespaces in API parameters should always be numeric
  • Instead of using just 0, 1 and 2 for flagged_level and explaining them in the help text, add a flagged_level_text field
  • In ApiQueryFlagged, don't call addValue() three times; instead, throw your stuff in an array and addValue() it in the end, like you did in ApiQueryOldreviewedpages
  • If you've got an array containing pageIDs, don't call it $titles , that's kind of confusing

I'll improve the patch when I have time.

I don't like how ApiQueryRevisionsFR does it's iteration to find what rev IDs to fetch and then does a second query on that. I'd prefer the queries be combined and there be only one such iteration.

A note about 'flaggedpages' is that it's PRIMARY (fp_page_id, fp_rev_id) serves the purpose of both fetching single revision data, helping selection data for pages, and making page lists, all without any secondary look-ups. Having just a rev_id INDEX would still require a (page,rev) index for some queries, which would take up more space, overhead to maintain, and have secondary look-ups.

'flaggedrevs' I mean, not 'flaggedpages'

(In reply to comment #11)
Thanks for your comments, I try to make an new patch with the issues fixed, soon.

A few remarks:

Figuring out what the queried titles were from the
ApiResult is ugly, there's ApiPageSet for that (which you did use in
ApiQueryFlagged).

ApiPageSet won't work here, b/c it doesn't contain all the revids of the result, if a range of revisions was queried (e.g. with startid/endid).

  • The default value for ordir should be older, not newer, for consistency with

other modules (and because it's the most common use case)

This module is some kind of "the other way round", as usually users want to look at the oldest unreviewed changes. That's why I changed the default value here.

Also, strval() shouldn't be used; namespaces in API parameters
should always be numeric

I have a problem here, cause when I don't use strval, it doesn't recognize the namespace 0. (Probably since 0==null). Is there a way to avoid this?

(In reply to comment #14)

(In reply to comment #11)
Thanks for your comments, I try to make an new patch with the issues fixed,
soon.

A few remarks:

Figuring out what the queried titles were from the
ApiResult is ugly, there's ApiPageSet for that (which you did use in
ApiQueryFlagged).

ApiPageSet won't work here, b/c it doesn't contain all the revids of the
result, if a range of revisions was queried (e.g. with startid/endid).

Right, forgot about that.

  • The default value for ordir should be older, not newer, for consistency with

other modules (and because it's the most common use case)

This module is some kind of "the other way round", as usually users want to
look at the oldest unreviewed changes. That's why I changed the default value
here.

Well if it makes sense in the context of FlaggedRevs, and Aaron agrees with it, so be it.

Also, strval() shouldn't be used; namespaces in API parameters
should always be numeric

I have a problem here, cause when I don't use strval, it doesn't recognize the
namespace 0. (Probably since 0==null). Is there a way to avoid this?

You could avoid this by setting PARAM_TYPE => 'namespace'. That would also avoid a potential bug that would hide stuff when namespaces are removed from $wgFlaggedRevsNamespaces.

(In reply to comment #12)

I don't like how ApiQueryRevisionsFR does it's iteration to find what rev IDs
to fetch and then does a second query on that. I'd prefer the queries be
combined and there be only one such iteration.

I don't see two queries. He's iterating over the result of prop=revisions first, then running one query. Combining the queries would be slightly more ideal, but would result in hooks all over the place and very ugly code.

A note about 'flaggedpages' is that it's PRIMARY (fp_page_id, fp_rev_id) serves
the purpose of both fetching single revision data, helping selection data for
pages, and making page lists, all without any secondary look-ups. Having just a
rev_id INDEX would still require a (page,rev) index for some queries, which
would take up more space, overhead to maintain, and have secondary look-ups.

Is there an index on just (fp_rev_id) as well then? That would be useful in this case.

(In reply to comment #11)

Come to think of it, we should probably have hacks

I meant hooks, of course.

in each execute() function
instead of letting extensions override core modules just to add some
functionality: the way things currently work makes it impossible for two
extensions to extend the same core module.

Added them in r40965.

New patch now uses api hooks

I tried to fix most of the issues you mentioned above. Instead of the overriding revision module, now the new hooks are used to add the revisions' flagging data.

It seems to work this way on my install, though there's perhaps one thing that could be fixed:
As the DB functions in ApiBase/ApiQueryBase are protected, I can't use them in the hook function. So I had to make another call to wfGetDB(). Don't know, if that makes any difference?

attachment flaggedrevsapi.patch ignored as obsolete

wfGetDB() calls in these cases tend to be very minor of an issue

(In reply to comment #17)

Created an attachment (id=5343) [details]
New patch now uses api hooks

I tried to fix most of the issues you mentioned above. Instead of the
overriding revision module, now the new hooks are used to add the revisions'
flagging data.

It seems to work this way on my install, though there's perhaps one thing that
could be fixed:
As the DB functions in ApiBase/ApiQueryBase are protected, I can't use them in
the hook function. So I had to make another call to wfGetDB(). Don't know, if
that makes any difference?

Like Aaron said, wfGetDB() isn't the end of the world. You can get around this and other stuff by putting your hook function in a class that extends ApiQueryBase. If you do that, you can also use addTables() and friends.

  • Isn't if ($pageid !== "_element" && isset($page['revisions'])) kind of redundant? _element elements are never arrays, so isset($page['revisions']) should be enough. Same for isset($rev['revid'])
  • You don't need $singlePageid, array_values($revids) works just as well as array_keys() (which you do use)
  • In fact, the special case of having one revid shouldn't be a special case at all: IIRC, makeList() knows how to handle one-element arrays correctly
  • You probably want to improve readability a little bit by moving up the fr_user=user_id to right after the $fields assignment (although it's not that big a deal) and by commenting on how the $revids array works (it's array(revid => array(pageid, index)), and it doesn't become clear what the index is needed for until about 50 lines later)

Patch with hook functions in own class / code tweaks

Like Aaron said, wfGetDB() isn't the end of the world. You can get around this
and other stuff by putting your hook function in a class that extends
ApiQueryBase. If you do that, you can also use addTables() and friends.

Done. Thanks for the hint :)

  • Isn't if ($pageid !== "_element" && isset($page['revisions'])) kind of

redundant? _element elements are never arrays, so isset($page['revisions'])
should be enough. Same for isset($rev['revid'])

Er, if $page is a string, PHP converts $page['revisions'] to $page[0], so isset(...) is always true. I tried to change the line to ´´if (array_key_exists('revisions',(array)$page))'', which seems to work, now.

  • You don't need $singlePageid, array_values($revids) works just as well as

array_keys() (which you do use)

  • In fact, the special case of having one revid shouldn't be a special case at

all: IIRC, makeList() knows how to handle one-element arrays correctly

  • You probably want to improve readability a little bit by moving up the

fr_user=user_id to right after the $fields assignment (although it's not that
big a deal) and by commenting on how the $revids array works (it's array(revid

> array(pageid, index)), and it doesn't become clear what the index is needed

for until about 50 lines later)

Right, I simplified the loop thing a bit, so it should be easier to read, now.

attachment flaggedrevsapi.patch ignored as obsolete

(In reply to comment #20)

Created an attachment (id=5345) [details]
Patch with hook functions in own class / code tweaks

Like Aaron said, wfGetDB() isn't the end of the world. You can get around this
and other stuff by putting your hook function in a class that extends
ApiQueryBase. If you do that, you can also use addTables() and friends.

Done. Thanks for the hint :)

Oh wait, actually subclassing ApiQueryBase doesn't help at all here. I don't know what I was smoking when I wrote that. addTables() and friends seem to be public.

The patch as it is looks good, aside from two minor things:

  • Making FlaggedRevsApiHooks extend ApiQueryBase turns out not to do any good after all (my mistake)
  • Using $wgFlaggedRevsNamespaces for ornamespace's PARAM_TYPE doesn't seem like a good idea to me; I think you'd be better off using PARAM_TYPE => 'namespace', because that'll let users query namespaces that used to have FlaggedRevs enabled but don't anymore

If these two issues are fixed, the patch is good enough IMO. Of course Aaron has to greenlight it as well.

added flagged_level_text output field, changed namespace param

  • Making FlaggedRevsApiHooks extend ApiQueryBase turns out not to do any good

after all (my mistake)

I think your first thought here was the right one: as it seems I can only access ApiQueryBase' protected functions, if I call them from within a subclass of it.

  • Using $wgFlaggedRevsNamespaces for ornamespace's PARAM_TYPE doesn't seem like

a good idea to me; I think you'd be better off using PARAM_TYPE => 'namespace',
because that'll let users query namespaces that used to have FlaggedRevs
enabled but don't anymore

Changed it to parameter type 'namespace' and also made it a multivalue parameter, perhaps someone might want to query more than one ns at at time..

I also added 'flagged_level_text' fields in the output as you suggested above.

Attached:

(In reply to comment #23)

Created an attachment (id=5398) [details]
added flagged_level_text output field, changed namespace param

  • Making FlaggedRevsApiHooks extend ApiQueryBase turns out not to do any good

after all (my mistake)

I think your first thought here was the right one: as it seems I can only
access ApiQueryBase' protected functions, if I call them from within a subclass
of it.

Really? Does removing 'extends ApiQueryBase' from FlaggedRevsApiHooks break stuff? That surprises me.

I'll take a closer look at this patch in a moment, and I'll probably apply it today.

w00t. Thanks a lot! Hope it won't get reverted too soon ;)

I’d like also to have access to the ‘size’ and ‘watched’ GET parameters of the special:-page in the flagged revisions API when retrieving oldreviewedpages.

I forgot filtering by ‘category’. (By the way: Is it possible to make filtering by category recursive (by providing the max recursing depth of course). Would it consume to much resources?)