prop=revisions sorts by rev_id, not by rev_timestamp
Closed, Resolved · Public

Description

Okay, there is something weird going on. I'm not sure if this is the API or an inconsistency in the database. I want to get the first revision (and optionally n revisions after that). This works for the article The Big Bang Theory: titles=The Big Bang Theory&rvendid=145491640. The rvendid is not the ID of the first revision but of a revision I chose at random. The query does return the first revision, as its parentid shows.

Now this is a different story on the Main Page: titles=Main Page&rvendid=139871. It now directly returns that revision, which might be because its parentid is higher than its own id. Even http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids%7Ctimestamp&titles=Main%20Page&rvstartid=139890&rvdir=older&rvlimit=50 returns just two revisions.

Event Timeline

XZise created this task. Mar 7 2015, 2:10 PM
XZise updated the task description.
XZise raised the priority of this task from to Needs Triage.
XZise added a project: MediaWiki-API.
XZise added a subscriber: XZise.
Restricted Application added a subscriber: Aklapper. Mar 7 2015, 2:10 PM
Anomie renamed this task from Getting last revision with rvendid set doesn't work always to prop=revisions sorts by rev_id, not by rev_timestamp. Mar 7 2015, 2:51 PM
Anomie set Security to None.

"rvendid" has nothing to do with getting the first revision, that just tells the API which revision to stop at when listing. rvdir=newer is what's telling it to start at the first revision rather than the latest.
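
To illustrate the point about rvdir=newer, here is a minimal Python sketch that builds a query URL for the oldest revisions of a page. The helper name first_revisions_url and the format=json parameter are assumptions for illustration; the other parameters are the ones already used in this task. It only builds the URL and does not contact the API.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs quoted in this task (https is assumed here).
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def first_revisions_url(title, limit=1):
    """Build a prop=revisions URL that lists the oldest revisions first.

    Hypothetical helper, not part of any library.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp",
        "rvdir": "newer",   # enumerate from the oldest revision forward
        "rvlimit": limit,   # how many revisions to list
        "format": "json",   # assumption: JSON output is wanted
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = first_revisions_url("Main Page", limit=3)
```

Note that no rvendid is needed: the direction and the limit alone decide where the listing starts and how far it goes.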

The problem you're seeing is that prop=revisions sorts revisions by id rather than by timestamp. Normally a revision with a higher timestamp will also have a higher id, but for really old pages that were imported from earlier versions of Wikipedia this doesn't necessarily hold true. If Special:Import preserves timestamps (I forget offhand if it does), that would be another source for a mismatch between id-order and timestamp-order.
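
A small sketch of the mismatch, using made-up revision data (the ids and dates below are hypothetical, not taken from any real page). It only shows that sorting by id and sorting by timestamp can disagree about which revision is "first" once an older revision has been given a higher id.

```python
from datetime import datetime

# Hypothetical (rev_id, rev_timestamp) pairs. Revision 250 stands in for an
# old revision that was imported later and so received a higher id.
revisions = [
    (101, datetime(2002, 1, 5)),
    (102, datetime(2002, 1, 9)),
    (250, datetime(2001, 12, 1)),
]

# Order by id only, which is what this task describes as the current behavior.
by_id = sorted(revisions, key=lambda rev: rev[0])

# Order by timestamp, using the id only to break ties.
by_timestamp = sorted(revisions, key=lambda rev: (rev[1], rev[0]))

first_by_id = by_id[0][0]
first_by_timestamp = by_timestamp[0][0]
```

With this data the two orderings pick different "first" revisions, which is the kind of disagreement described above.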

Gerrit change 188843 should fix this, if @Springle says the new queries are good.

Change 188843 had a related patch set uploaded (by Anomie):
API: Improve queries for prop=revisions in enum mode

https://gerrit.wikimedia.org/r/188843

Anomie moved this task from Unsorted to Needs Review on the MediaWiki-API board. Mar 7 2015, 2:59 PM
XZise added a comment. Mar 7 2015, 3:49 PM

Oh, that would make sense. I already had problems with that page a few days ago, when I got a weird query result where the revisions weren't in order (i.e. parentid != next revision's id, with default ordering): action=query&prop=revisions&rvprop=ids|timestamp&titles=Main Page&rvstartid=140204&rvlimit=500
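
The "parentid != next revision's id" check can be sketched as follows. The function name find_chain_breaks and the sample data are hypothetical; the revid and parentid keys are the fields returned by rvprop=ids. The sketch assumes the list is in the API's default order (newest first) and that the history is linear.

```python
def find_chain_breaks(revisions):
    """Return (revid, next_revid) pairs where the parent chain is broken.

    In a listing ordered newest first, each revision's parentid is expected
    to equal the revid of the entry that follows it.
    """
    breaks = []
    for current, following in zip(revisions, revisions[1:]):
        if current["parentid"] != following["revid"]:
            breaks.append((current["revid"], following["revid"]))
    return breaks


# Hypothetical listing whose id order matches its chronological order.
consistent = [
    {"revid": 30, "parentid": 20},
    {"revid": 20, "parentid": 10},
    {"revid": 10, "parentid": 0},
]

# Hypothetical listing sorted by id where the true history is 10 -> 30 -> 20.
mismatched = [
    {"revid": 30, "parentid": 10},
    {"revid": 20, "parentid": 30},
    {"revid": 10, "parentid": 0},
]

consistent_breaks = find_chain_breaks(consistent)
mismatched_breaks = find_chain_breaks(mismatched)
```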

XZise added a comment. Mar 8 2015, 1:01 AM

How does 188843 actually fix this bug? Only by switching the where clause additions in lines 243-246 (of PS2)?

Issues of rev_id vs. timestamp also come up when undeleting material that was deleted before rev_id was included in the archive table.

> How does 188843 actually fix this bug? Only by switching the where clause additions in lines 243-246 (of PS2)?

Yeah. addWhereRange() and addTimestampWhereRange() have a side effect of appending the field to the ORDER BY clause in the query.
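
Why the order of those calls matters can be shown with a simplified Python model. This is not MediaWiki's PHP code: the QuerySketch class and its add_where_range method are hypothetical stand-ins that only imitate the side effect described here, namely that each range call also appends its field to ORDER BY.

```python
class QuerySketch:
    """Toy query builder whose range helper also appends to ORDER BY."""

    def __init__(self):
        self.conditions = []
        self.order_by = []

    def add_where_range(self, field, direction, start=None, end=None):
        newer = direction == "newer"
        if start is not None:
            self.conditions.append(f"{field} {'>=' if newer else '<='} {start}")
        if end is not None:
            self.conditions.append(f"{field} {'<=' if newer else '>='} {end}")
        # Side effect: the field becomes the next ORDER BY key.
        self.order_by.append(field if newer else f"{field} DESC")

    def order_clause(self):
        return "ORDER BY " + ", ".join(self.order_by)


def build_order(fields, direction="older"):
    """Call add_where_range once per field, in the given order."""
    query = QuerySketch()
    for field in fields:
        query.add_where_range(field, direction)
    return query.order_clause()


id_first = build_order(["rev_id", "rev_timestamp"])
timestamp_first = build_order(["rev_timestamp", "rev_id"])
```

In this model, whichever field is added first becomes the primary sort key, so swapping the two calls is enough to change the sort from id-first to timestamp-first.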

Aklapper triaged this task as Normal priority. Mar 9 2015, 12:10 PM
He7d3r updated the task description. Mar 17 2015, 5:14 PM

Change 188843 merged by jenkins-bot:
API: Improve queries for prop=revisions in enum mode

https://gerrit.wikimedia.org/r/188843

Anomie closed this task as Resolved. Apr 25 2015, 12:19 PM
Anomie claimed this task.

This should be deployed to WMF wikis with 1.26wmf4, see https://www.mediawiki.org/wiki/MediaWiki_1.26/Roadmap for the schedule.