
cmstart as continue param when multiple categories have same cl_timestamp
Closed, DeclinedPublic

Description

Author: cannon.danielc

Description:
There are some problems with using cmstart as the continue parameter for the categorymembers query when multiple category links share the same cl_timestamp, as illustrated by the URL below. For instance, suppose three pages, Foo, Bar, and Baz, are all added to the same category, Gah, at the same time. Querying the members of that category with a limit of 2, sorted by timestamp, returns "Foo" and "Bar" with a query-continue parameter cmstart equal to the timestamp of Baz, which is also the timestamp of Foo and Bar. Querying again with that cmstart returns "Foo" and "Bar" again, with the same cmstart as the query-continue, so Baz is never reached.
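
The loop described above can be reproduced with a small in-memory model. This is only a sketch: the `ROWS` list and the `query_by_start` helper are hypothetical stand-ins for the categorylinks table and the API, and the page IDs are made up. It assumes the continue value is the timestamp of the (limit + 1)-th row and that the follow-up query selects rows with a timestamp greater than or equal to it.

```python
# Hypothetical stand-in for categorylinks rows: (cl_timestamp, cl_from, title).
# All three pages share one timestamp, as in the report.
ROWS = [
    ("20060624022423", 1, "Foo"),
    ("20060624022423", 2, "Bar"),
    ("20060624022423", 3, "Baz"),
]


def query_by_start(rows, limit, cmstart=None):
    """Return (titles, next_cmstart), continuing from a timestamp only."""
    ordered = sorted(rows)
    # Assumed semantics: cmstart selects rows with timestamp >= cmstart.
    candidates = [r for r in ordered if cmstart is None or r[0] >= cmstart]
    page = candidates[:limit]
    # Continue value: timestamp of the (limit + 1)-th row, if there is one.
    next_start = candidates[limit][0] if len(candidates) > limit else None
    return [r[2] for r in page], next_start


first = query_by_start(ROWS, 2)
second = query_by_start(ROWS, 2, cmstart=first[1])
print(first)
print(second)
```

Both calls return the same page and the same continue value, so a client following query-continue never advances to "Baz".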

A possible fix would be to use cmcontinue as the continue parameter, carrying both a timestamp and a page ID: the cl_timestamp and cl_from of the last (limit + 1) row of the previous query.
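
A minimal sketch of that idea, again against a hypothetical in-memory table rather than the real API. The `query_by_continue` helper and the `timestamp|pageid` encoding of the continue value are assumptions made for illustration, not the format MediaWiki actually uses.

```python
# Hypothetical stand-in for categorylinks rows: (cl_timestamp, cl_from, title).
ROWS = [
    ("20060624022423", 1, "Foo"),
    ("20060624022423", 2, "Bar"),
    ("20060624022423", 3, "Baz"),
]


def query_by_continue(rows, limit, cmcontinue=None):
    """Return (titles, next_cmcontinue), continuing from (timestamp, page id)."""
    # Order by the composite key so that ties on timestamp are broken by page id.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    if cmcontinue is not None:
        # Assumed encoding of the continue value: "timestamp|pageid".
        ts, page_id = cmcontinue.split("|")
        key = (ts, int(page_id))
        ordered = [r for r in ordered if (r[0], r[1]) >= key]
    page = ordered[:limit]
    # Continue value: composite key of the (limit + 1)-th row, if there is one.
    if len(ordered) > limit:
        nxt = f"{ordered[limit][0]}|{ordered[limit][1]}"
    else:
        nxt = None
    return [r[2] for r in page], nxt


titles1, cont1 = query_by_continue(ROWS, 2)
titles2, cont2 = query_by_continue(ROWS, 2, cmcontinue=cont1)
print(titles1, cont1)
print(titles2, cont2)
```

Because the page ID is unique per category link, the composite key always identifies exactly one row to resume from, even when every timestamp is identical.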


Version: 1.13.x
Severity: minor
URL: http://de.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Wikipedia:Quellen%20fehlen&cmsort=timestamp&cmdir=asc&cmlimit=4&cmprop=title|timestamp&cmstart=20060624022423

Details

Reference
bz13871

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:13 PM
bzimport set Reference to bz13871.

In practice, you wouldn't be using a limit as ridiculously low as 2, but one in the hundreds range (or thousands, if you're a bot). The odds of hundreds of pages having the exact same category timestamp are close to zero. We page a lot of stuff by timestamp (not just the API, the UI as well), and getting more than two things (revisions, log events, whatever) with the same timestamp hardly ever happens. WONTFIXing this, please REOPEN if you have a valid argument (such as a reason why you absolutely need to use cmlimit=2)

Actually, the only usage I know about is a function of mine which navigates from one page in a category to the next one. I exclusively use cmlimit=1 for this purpose. Take a look at function nextItem (category, namespace) in http://de.wikipedia.org/wiki/Benutzer:Codeispoetry/helperFunctions.js.

Then you may want to sort by sortkey, not timestamp.