Page MenuHomePhabricator

API: Implement rctitles parameter in query=recentchanges
Closed, ResolvedPublic

Description

History:

See also:

This would enable:

  • Getting rcid of an unpatrolled page creation from the API.
  • Use recentchanges with a generator module.
  • Use recentchanges with watch list (maybe?).
  • Allow tools to mass-patrol all unpatrolled changes to a single page.
In T14394#167673, Krinkle wrote:
In T14394#167660, Catrope wrote:
In T14394#167639, Krinkle wrote:

However, I see in the current database structure there's a seperate column for rc_title.
I'm not sure since when this exists, and/or if it was originally utilized, but when using that in the query (AND WHERE rc_title='Foobar') it'd be like any other condition currently in the recentchanges API right (same thing for rc_user, with $this->addWhereFld(); )

The original implementation did do a WHERE on rc_namespace and rc_title, yes, but that's not the same as doing a WHERE on rc_user because the latter is indexed. Implementing this feature would require adding an index for it (kinda leery of that) and even then it'd have to sort by namespace, then title, then timestamp in order to work efficiently.

There is a rc_namespace_title index though. But unlike the one for rc_user, it doesn't have rc_timestamp.

mediawiki-core@master:/maintenance/tables.sql:
 INDEX rc_timestamp ON recentchanges (rc_timestamp);
 INDEX rc_namespace_title ON recentchanges (rc_namespace, rc_title);
 INDEX rc_cur_id ON recentchanges (rc_cur_id);
 INDEX new_name_timestamp ON recentchange (rc_new,rc_namespace,rc_timestamp);
 INDEX rc_ip ON recentchanges (rc_ip);
 INDEX rc_ns_usertext ON recentchanges (rc_namespace, rc_user_text);
 INDEX rc_user_text ON recentchanges (rc_user_text, rc_timestamp);

Would it make sense for rc_namespace_title to have it? I wonder what it is used for and if those uses would have a problem with the extra rc_timestamp sort. The default sort for rc_namespace_title is presumably rc_id which should have be very close to the sort order of rc_timestamp.

The main reason it needs rc_timestamp is not for the sort order, but to be able to do rcstart and rcend.

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 2:38 AM
bzimport set Reference to bz55377.
bzimport added a subscriber: Unknown Object (MLST).

@Roan: Are you really sure we need more indices for this? What about watchlist.

Watchlist uses the same recentchanges table with a major INNER JOIN on watchlist (as opposed to just a few "WHERE rc_title IN ( ..)").

Watchlist also does the rc_timestamp sort and range.

Of course, it is possible that this just proves the opposite (watchlist is also slow, but we can't remove it, and we can remove rctitles from the API to avoid it from getting worse).

Is that the case?

(In reply to comment #0)

  • Use recentchanges with a generator module.

To use it with a generator you'd need to use "titles" rather than "rctitles", and you'd probably also want to move the module to "prop=recentchanges" instead of "list=recentchanges" (breaking BC). At that point, adding a "prop=recentchanges" (which might need a different name) might make more sense.

(In reply to comment #1)

Of course, it is possible that this just proves the opposite (watchlist is
also slow, but we can't remove it, and we can remove rctitles from the API
to avoid it from getting worse).

I think that's a good possibility. The query used for the watchlist filesorts, which is something we try to avoid in the API for performance reasons.

Is this why we have a special database query group just for the watchlist?

Also, in addition to Watchlist, there is also RecentchangesLinked which also essentially does this already.

(In reply to comment #2)

(In reply to comment #0)

  • Use recentchanges with a generator module.

To use it with a generator you'd need to use "titles" rather than "rctitles",
and you'd probably also want to move the module to "prop=recentchanges"
instead of "list=recentchanges"..

Hm.. right, our generator model requires a separate module if the titles filter is optional (e.g. prop=categories requires titles=, so you can use a generator module as input, but prop=categories itself can also be a generator to another module).

Recentchanges is already a generator module (it can be a provider of titles to another query module), but can't yet take titles= or a generator as input. And since we must keep titles= as an optional param, the filtered version of it has to be a (separate) prop module.

Change 92654 had a related patch set uploaded by Krinkle:
[WIP] recentchanges: Implement support for pagesets

https://gerrit.wikimedia.org/r/92654

@Brad: Using a prop module might make sense, but the problem there is usability.

From the application's perspective, title is an arbitrary filter, just like 'user', 'namespace' or 'tag'. It would be a pain if one would have to radically change the query just to use the title filter.

However if the indexes support it, we might as well do both. I don't see why we wouldn't implement a 'rctitles'. I'm rephrasing the bug back to how it was. In this case the main purpose (and why I filed the bug) is to be able to filter by title on the recentchanges query.

Another problem is that (afaik) prop wouldn't work on pages that have been renamed or wear a different title now. e.g. it wouldn't be querying rc_title/rc_namespace but page_namespace/page_title/page_id, and rc_namespace/rc_title, which is not going to work as it will miss edits made before and between renames of a page.

We'd have to map it to rc_cur_id inside the prop query.

Anyway, builing a prop module for getting recent changes made to one or more pages is nice, but not what this bug is asking for (plus, it'd need an extra index).

Change 92654 abandoned by Krinkle:
[WIP] recentchanges: Implement support for pagesets

Reason:
Meh, no longer cool. Going to use rcstream instead with a live page filter on the client-side.. Can't wait for the new API to happen with the app I'm working on.

https://gerrit.wikimedia.org/r/92654

Tgr set Security to None.

Change 419798 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/core@master] Add ability to filter based on rc_title in API

https://gerrit.wikimedia.org/r/419798

Krinkle removed a subscriber: wikibugs-l-list.
Krinkle unsubscribed.

(If we're going to reformat the quotes, we may as well go all the way to make it more readable)

@Ladsgroup How does this task (adding rctitle filtering feature to recentchanges API), relate to T184485 (remove of autopatrol logging, make rc_patrolled 3-state), and in what way is it required for completion of that task?

@Ladsgroup How does this task (adding rctitle filtering feature to recentchanges API), relate to T184485 (remove of autopatrol logging, make rc_patrolled 3-state), and in what way is it required for completion of that task?

That is a very valid question. The reason is that if we stop logging autopatrol actions and we don't provide a way for users to query patrol status of edits for given pages, thus we break gadgets without any way to fix it. This has been brought up after announcement of the change.

Change 419798 merged by jenkins-bot:
[mediawiki/core@master] Add ability to filter based on rc_title in API

https://gerrit.wikimedia.org/r/419798