rcid cannot be easily retrieved - implementation seems patchy
Open, Public

Description

Author: martinp23

Description:
If I query the recentchanges list with rctype=new, the rcid is given in the results. This seems to be the only way to retrieve the rcid for a new page creation using the API, while common sense might suggest that a query for the first revision of a page (e.g. http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Judith%20Wood&rvdir=newer&rvlimit=1&rvprop=timestamp|ids ) would show it.

The rcid is needed to construct a URL in order to mark a page patrolled (on en.wikipedia).


Version: 1.12.x
Severity: enhancement
URL: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Judith%20Wood&rvdir=newer&rvlimit=1&rvprop=timestamp|ids

bzimport added a project: MediaWiki-API. Via Conduit, Nov 21 2014, 10:00 PM
bzimport set Reference to bz12394.
bzimport created this task. Via Legacy, Dec 23 2007, 3:21 PM
Catrope added a comment. Via Conduit, Dec 26 2007, 5:36 PM

rcid is only listed in list=recentchanges 'cause it's stored in the recent changes table, and not in the revision table. For regular edits (i.e. edits other than page creations), rcid is also only available through Special:Recentchanges. The [Mark this page patrolled] link is an exception, but as you point out you can get it through list=recentchanges&rctype=new as well. It could be a good idea to add filtering by page title to list=recentchanges though, I'll add that.
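
For illustration, here is a minimal Python sketch of the list=recentchanges&rctype=new lookup described above. It only builds the request URL and reads the rcid out of a response body; it does not contact the wiki. The helper names (new_pages_query, rcid_for_title) and the sample response are made up for this example, and the rcprop and rclimit values are assumptions rather than anything stated in this report.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the URL in the report.
API = "http://en.wikipedia.org/w/api.php"


def new_pages_query(limit=10):
    """Build a list=recentchanges URL restricted to page creations."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new",
        # Assumption: ask for ids so that each entry carries its rcid.
        "rcprop": "ids|title|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def rcid_for_title(response_text, title):
    """Return the rcid of the first listed change for `title`, or None."""
    data = json.loads(response_text)
    for change in data.get("query", {}).get("recentchanges", []):
        if change.get("title") == title:
            return change.get("rcid")
    return None


# Hand-written sample response; every value in it is invented.
SAMPLE = json.dumps({
    "query": {
        "recentchanges": [
            {"type": "new", "ns": 0, "title": "Judith Wood",
             "rcid": 12345, "pageid": 678, "revid": 910, "old_revid": 0},
        ]
    }
})

print(new_pages_query())
print(rcid_for_title(SAMPLE, "Judith Wood"))
```

Note that this only finds a page while its creation is still within the returned window of recent changes, which is the limitation this report is about.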

Catrope added a comment. Via Conduit, Feb 27 2008, 10:29 AM

Added rctitles in r31348. Now you can get the rcid of a page through list=recentchanges&rctitles=Foo

bzimport added a comment. Via Conduit, Nov 8 2008, 2:47 AM

matthew.britton wrote:

Using 'rctitles' parameter on en.wikipedia seems to cause timeouts and teal screens of death.

Catrope added a comment. Via Conduit, Feb 4 2009, 10:53 PM

(In reply to comment #3)

> Using 'rctitles' parameter on en.wikipedia seems to cause timeouts and teal
> screens of death.

Removed rctitles parameter because of performance concerns in r46823. Patrolling stuff isn't very easy right now, I agree, but fixing bug 17237 should improve that.

Krinkle added a comment. Via Conduit, Aug 29 2010, 1:04 PM

This one was removed in 1.15 (documentation wasn't updated yet...)

Through the commits related to this one I read that when this was first implemented it would scan the entire recent changes table, hence the slowness if the desired result is far in the back (or perhaps not in it at all).

However, I see in the current database structure there's a separate column for rc_title.
I'm not sure since when this exists, and/or if it was originally utilized, but when using that in the query (WHERE ... AND rc_title='Foobar') it'd be like any other condition currently in the recentchanges API, right? (Same thing as for rc_user, with $this->addWhereFld().)

Sorry if this was indeed the way it was already done or if it's not a good way at all, just hoping to get this one fixed :-)

Catrope added a comment. Via Conduit, Sep 12 2010, 3:01 PM

(In reply to comment #5)

> However, I see in the current database structure there's a separate column for
> rc_title.
> I'm not sure since when this exists, and/or if it was originally utilized, but
> when using that in the query (AND WHERE rc_title='Foobar') it'd be like any
> other condition currently in the recentchanges API right (same thing for
> rc_user, with $this->addWhereFld(); )

The original implementation did do a WHERE on rc_namespace and rc_title, yes, but that's not the same as doing a WHERE on rc_user because the latter is indexed. Implementing this feature would require adding an index for it (kinda leery of that) and even then it'd have to sort by namespace, then title, then timestamp in order to work efficiently.

Krinkle added a comment. Via Conduit, Oct 6 2013, 6:49 PM

(In reply to comment #6)

> (In reply to comment #5)
> > However, I see in the current database structure there's a separate column for
> > rc_title.
> > I'm not sure since when this exists, and/or if it was originally utilized, but
> > when using that in the query (AND WHERE rc_title='Foobar') it'd be like any
> > other condition currently in the recentchanges API right (same thing for
> > rc_user, with $this->addWhereFld(); )
> >
> The original implementation did do a WHERE on rc_namespace and rc_title, yes,
> but that's not the same as doing a WHERE on rc_user because the latter is
> indexed. Implementing this feature would require adding an index for it (kinda
> leery of that) and even then it'd have to sort by namespace, then title, then
> timestamp in order to work efficiently.

There is a rc_namespace_title index though. But unlike the one for rc_user, it doesn't have rc_timestamp.

mediawiki-core@master:/maintenance/tables.sql:
INDEX rc_timestamp ON recentchanges (rc_timestamp);
INDEX rc_namespace_title ON recentchanges (rc_namespace, rc_title);
INDEX rc_cur_id ON recentchanges (rc_cur_id);
INDEX new_name_timestamp ON recentchanges (rc_new,rc_namespace,rc_timestamp);
INDEX rc_ip ON recentchanges (rc_ip);
INDEX rc_ns_usertext ON recentchanges (rc_namespace, rc_user_text);
INDEX rc_user_text ON recentchanges (rc_user_text, rc_timestamp);

Would it make sense for rc_namespace_title to have it? I wonder what it is used for and whether those uses would have a problem with the extra rc_timestamp sort. The default sort for rc_namespace_title is presumably rc_id, which should be very close to the sort order of rc_timestamp.

The main reason it needs rc_timestamp is not for the sort order, but to be able to do rcstart and rcend.
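
To make the index argument concrete, here is a small Python sketch that uses SQLite as a stand-in (production runs on a different database engine with a different query planner, so this only illustrates the general idea). It compares the query plan for a title lookup bounded by a timestamp range under the existing rc_namespace_title column list and under a hypothetical three-column index. The index name rc_ns_title_timestamp, the reduced table definition, and the timestamp values are all assumptions made for this example.

```python
import sqlite3

# A title lookup limited to a timestamp window, similar in spirit to
# combining a title filter with rcstart and rcend.
QUERY = """
    SELECT rc_id, rc_timestamp FROM recentchanges
    WHERE rc_namespace = 0 AND rc_title = 'Foobar'
      AND rc_timestamp BETWEEN '20130101000000' AND '20131231235959'
    ORDER BY rc_timestamp
"""


def query_plan(index_name, index_columns):
    """Create a throwaway table with one index and return the plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recentchanges ("
        " rc_id INTEGER PRIMARY KEY,"
        " rc_namespace INTEGER, rc_title TEXT, rc_timestamp TEXT)"
    )
    conn.execute(
        f"CREATE INDEX {index_name} ON recentchanges ({index_columns})"
    )
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


# Column list as in the rc_namespace_title index quoted above.
existing = query_plan("rc_namespace_title", "rc_namespace, rc_title")
# Hypothetical index that also carries rc_timestamp.
proposed = query_plan(
    "rc_ns_title_timestamp", "rc_namespace, rc_title, rc_timestamp"
)

print("existing:", existing)
print("proposed:", proposed)
```

With the two-column index, the timestamp condition cannot be applied through the index and the ORDER BY needs a separate sort step. With rc_timestamp appended, both the range condition and the ordering can be served by the index directly.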

Krinkle added a comment. Via Conduit, Oct 6 2013, 6:58 PM

Filed bug 55377 for adding support for rctitles to list=recentchanges.

Qgil added a subscriber: Qgil. Via Web, Jan 12 2015, 12:04 PM

@Catrope, this is one of the oldest tasks assigned to someone. Are you planning to work on it, and is this Normal priority correct?

Qgil placed this task up for grabs. Via Web, Sat, Feb 14, 3:37 PM
Qgil set Security to None.
Anomie moved this task to Needs Code on the MediaWiki-API workboard. Via Web, Fri, Feb 20, 8:19 PM
