
Category.subcategories() or Category.members(namespaces=14) missing certain subcategories
Closed, Resolved, Public

Description

Category:Female_Wikipedians has a subcategory, Category:Lesbian_Wikipedians. Category.members() lists it, but neither Category.members(namespaces=14) nor Category.subcategories() does.

$ python pwb.py shell
>>> import pywikibot
>>> site = pywikibot.Site('en', 'wikipedia')
>>> cat = pywikibot.Category(site, 'Category:Female_Wikipedians')
>>> cat.categoryinfo
{u'files': 0, u'subcats': 1, u'pages': 2595, u'size': 2596}
>>> list(cat.subcategories())
[]
>>> list(cat.members(namespaces=14))
[]

jayvcb does not see Category:Lesbian_Wikipedians go missing; for him, however, Category:Gay_Wikipedians is missing from Category:Male_Wikipedians.

I've been unable to find other pages that demonstrate this bug.

Event Timeline

Unicodesnowman renamed this task from Category.subcategories() or Category.members(namespaces=14) failing for certain articles to Category.subcategories() or Category.members(namespaces=14) missing certain subcategories.
Unicodesnowman raised the priority of this task from to Needs Triage.
Unicodesnowman updated the task description.
Unicodesnowman added a project: Pywikibot.
Unicodesnowman changed Security from none to None.
Unicodesnowman subscribed.

Okay, I'm not sure whether this is specific to categorymembers, but when you request only those members of a category that are themselves categories, the API still seems to walk through the non-category pages, so the category is only returned in the last batch:

https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Female%20Wikipedians&cmnamespace=14&continue=-||&cmcontinue=page|5a414e45574f4c462f55534552424f58|14056212&cmlimit=500

Maybe our QueryGenerator stops requesting pages because it got an empty result set:

https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Female%20Wikipedians&cmnamespace=14&continue=&cmlimit=500

Okay, I think I'm closer to it now. The 'query' entry is missing because no pages are returned. On the first request the userinfo was returned, which generated a 'query' entry. For some reason it is not returned on the second request, so the entry is missing there and iteration stops:

  • {'batchcomplete': '', 'continue': {'gcmcontinue': 'page|434c45414e555042414245|25600657', 'continue': 'gcmcontinue||userinfo'}, 'query': {'userinfo': {'id': 7818389, 'name': 'XZise'}}}
  • {'batchcomplete': '', 'continue': {'gcmcontinue': 'page|494d4d4143554c4154454845415254|5038287', 'continue': 'gcmcontinue||userinfo'}}

So the problem is that QueryGenerator must continue even if 'query' is missing. Alternatively, MediaWiki itself has a bug and should return a 'query' entry even if it's empty.
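To illustrate the first option, here is a minimal sketch of a continuation loop that treats a missing 'query' entry as an empty batch and only stops once 'continue' is gone. This is not the actual QueryGenerator code: iter_query_items, fake_fetch and the simulated responses are hypothetical names and data made up for this example, and fetch stands in for whatever performs the API request.

```python
def iter_query_items(fetch, params, list_name):
    """Yield list items from every batch until no 'continue' entry is left.

    `fetch` is a hypothetical callable (an assumption for this sketch) that
    takes a dict of request parameters and returns the decoded response.
    """
    request = dict(params)
    while True:
        data = fetch(request)
        # A batch without matching pages may lack 'query' entirely.
        # Treat that as an empty batch, not as the end of the results.
        for item in data.get('query', {}).get(list_name, []):
            yield item
        if 'continue' not in data:
            break
        # Merge the continuation values into the next request.
        request = dict(params)
        request.update(data['continue'])


# Simulated responses (made-up data): the first batch has only userinfo,
# the second has no 'query' at all, the third finally has the subcategory.
responses = [
    {'continue': {'cmcontinue': 'a', 'continue': '-||'},
     'query': {'userinfo': {'name': 'Example'}}},
    {'continue': {'cmcontinue': 'b', 'continue': '-||'}},
    {'query': {'categorymembers': [
        {'ns': 14, 'title': 'Category:Lesbian Wikipedians'}]}},
]
batches = iter(responses)


def fake_fetch(request):
    """Return the next simulated batch, ignoring the request parameters."""
    return next(batches)


params = {'action': 'query', 'list': 'categorymembers', 'cmnamespace': 14}
titles = [m['title']
          for m in iter_query_items(fake_fetch, params, 'categorymembers')]
print(titles)
```

A loop that stopped at the first response without 'query' would end after the second simulated batch and never reach the subcategory, which matches the behaviour described above.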

Okay I tested it on a 1.21 wiki and got results immediately:

I queried only one page, and on the English Wikipedia it would be either a template or a category, so I'd get a result in only one of the two requests, not in both. Both use the simplified continuation, so that is not the cause of the difference.

Change 180779 had a related patch set uploaded (by XZise):
[FIX] QueryGenerator: Allow missing 'query' entry

https://gerrit.wikimedia.org/r/180779

Patch-For-Review

Change 180779 merged by jenkins-bot:
[FIX] QueryGenerator: Allow missing 'query' entry

https://gerrit.wikimedia.org/r/180779

XZise claimed this task.
XZise removed a project: Patch-For-Review.