
Categorymembers counting behaviour when searching in a namespace
Closed, Resolved (Public)

Description

When looking for subcategories of a known category, it is tempting to use the API with &cmnamespace=14 (14 being the namespace ID for Category:). For categories that have both subcategories and articles as members, this triggers odd behaviour:

http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics

This lists the first 10 (the default limit) categorymembers of [[Category:Physics]] on enwp, sorted by sortkey.

Among those are two categories (subcategories of Category:Physics): [[Category:Fundamental physics concepts]] (sortkey: "*") and [[Category:Physicists]] (sortkey: "*Physicists"; possibly a mistake in the sortkey on the user side, by the way).

Now let's check the same list when adding &cmnamespace=14.

http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics&cmnamespace=14

The expected output would be the first 10 (default) members of the category that are in NS 14. However, the output is slightly different: it actually lists the pages that are in NS 14 *and* among the first 10 categorymembers shown above.
We therefore get only 2 subcategories instead of the expected 10. A query-continue is returned, and its value is the sortkey of a page outside NS 14.

Similar output is shown for &cmnamespace=45 (which is the number of subcats according to the GUI): only 10 subcats (a coincidence) are actually output, albeit with a query-continue, instead of the 45 expected. http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics&cmnamespace=14&cmlimit=45

Similar output is also shown for &cmnamespace=0, or any other namespace: &cmnamespace=0&cmlimit=10 displays 7 results. http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics&cmnamespace=0&cmlimit=10

So &cmlimit behaves oddly here: instead of counting results, it counts "potential" results (in all namespaces) and only then outputs the elements in the requested namespace(s).
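To make the difference concrete, here is a minimal Python sketch of the two semantics. It is only an illustration of the behaviour described above, not the actual MediaWiki implementation; the function names and the toy member list are made up for the example.

```python
from typing import List, Tuple

Member = Tuple[int, str]  # (namespace id, title), listed in sortkey order


def limit_then_filter(members: List[Member], namespace: int, limit: int) -> List[Member]:
    """Observed behaviour: take the first `limit` members, then keep the namespace."""
    return [m for m in members[:limit] if m[0] == namespace]


def filter_then_limit(members: List[Member], namespace: int, limit: int) -> List[Member]:
    """Expected behaviour: keep the namespace, then take the first `limit` results."""
    return [m for m in members if m[0] == namespace][:limit]


# Toy category (hypothetical data): subcategories (NS 14) mixed with articles (NS 0).
MEMBERS: List[Member] = [
    (14, "Category:A"), (0, "Article 1"), (0, "Article 2"), (14, "Category:B"),
    (0, "Article 3"), (0, "Article 4"), (14, "Category:C"), (14, "Category:D"),
]

observed = limit_then_filter(MEMBERS, namespace=14, limit=4)
expected = filter_then_limit(MEMBERS, namespace=14, limit=4)
print(len(observed), len(expected))
```

With a limit of 4, the first function only sees Category:A and Category:B among the first four members, while the second returns all four subcategories.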

Although this is merely annoying for small categories such as [[Category:Physics]] (283 articles and 45 subcats according to the GUI), it can become a major problem when looking for subcategories of a far bigger category. It could also put some strain on the servers, should anything try to list the subcategories of a category with many pages and only a few subcats: many requests would be needed for a rather limited result.
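As a rough back-of-the-envelope estimate of that cost, the sketch below compares the number of requests under both behaviours. It assumes the category holds exactly the 283 articles and 45 subcats reported by the GUI, that the default limit of 10 is used, and that a client has to walk the whole category to be sure it has seen every subcat; the helper name is made up.

```python
import math


def requests_needed(items_scanned: int, limit: int) -> int:
    """Number of API requests needed to scan `items_scanned` items, `limit` per request."""
    return math.ceil(items_scanned / limit)


ARTICLES = 283  # from the GUI, per the report above
SUBCATS = 45    # from the GUI, per the report above
LIMIT = 10      # default cmlimit

# Observed behaviour: the limit counts members of every namespace.
observed_requests = requests_needed(ARTICLES + SUBCATS, LIMIT)

# Expected behaviour: the limit counts only members of the requested namespace.
expected_requests = requests_needed(SUBCATS, LIMIT)

print(observed_requests, expected_requests)
```

Under these assumptions, listing every subcat takes 33 requests instead of 5, and the gap grows with the ratio of articles to subcats.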

One should be able to list :

  • the {first X {categorymembers in a given namespace}} of a given category
  • rather than the {categorymembers in a given NS} among the {first X categorymembers of a category}

I don't know if I'm very clear here, my own head is starting to ache :D


Version: unspecified
Severity: normal

Details

Reference
bz24354

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:06 PM
bzimport set Reference to bz24354.

"Similar output is shown for &cmnamespace=45 (which is the number of subcats
according to the GUI)" : lovely mistake. I obviously meant "for &cmlimit=45". Sorry about that ;)

From http://en.wikipedia.org/w/api.php * list=categorymembers (cm) *

NOTE: Due to $wgMiserMode, using this may result in fewer than "limit" results returned before continuing; in extreme cases, zero results may be returned.
*** This bug has been marked as a duplicate of bug 19640 ***
*** Bug 27317 has been marked as a duplicate of this bug. ***
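Given the $wgMiserMode note quoted above, a client that needs a fixed number of results has to keep following the continuation until it has collected enough. Here is a minimal Python sketch of such a loop. The `fetch_page` callable is hypothetical: it stands for whatever code issues the list=categorymembers request with the given continuation value and returns the titles plus the next continuation value (or None when there is none). No real HTTP call is made here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: continuation token in (None for the first request),
# (titles, next continuation token or None) out.
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def collect_members(fetch_page: FetchPage, wanted: int, max_requests: int = 100) -> List[str]:
    """Follow continuations until `wanted` titles are collected or the list ends."""
    results: List[str] = []
    token: Optional[str] = None
    for _ in range(max_requests):  # safety cap on the number of requests
        titles, token = fetch_page(token)
        results.extend(titles)  # a batch may hold fewer than the limit, even zero
        if len(results) >= wanted or token is None:
            break
    return results[:wanted]


def fake_fetch(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
    """Stand-in for the API: three batches, the second one empty."""
    pages = {
        None: (["Category:A"], "k1"),
        "k1": ([], "k2"),
        "k2": (["Category:B", "Category:C"], None),
    }
    return pages[token]


subcats = collect_members(fake_fetch, wanted=3)
print(subcats)
```

The cap on the number of requests is a design choice for the sketch: it keeps a client from looping forever on a very large category that contains few pages in the requested namespace.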