
Persistent error 500 getting category members
Open, Medium, Public

Description

Calls made today using pywikibot's categorymembers() (which uses the API's categorymembers generator) or a category page's articles() method get stuck in what appear to be indefinite error 500 retry loops.

Is this a known transient error, or will this remain unusable?

Example
site = pywikibot.getSite('commons', 'commons')
cat = pywikibot.Category(site, "Category: Documents from Library of Congress Packard Campus")
site.categorymembers(cat, namespaces=6, member_type=['page', 'file'])

This example category only has 208 members at the time of writing.

This is part of an effort to clean up after the WMF server outages this past week, during which uploaded files were posted with no ImagePage text content (i.e. pageid == 0).

WARNING: Http response status 500
WARNING: Non-JSON response received from server commons:commons; the server may be down.
Set gcmlimit = ['250']
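
For reference, a minimal sketch (not part of the failing script itself, and assuming the current pywikibot config options max_retries and retry_wait) of how to make the call fail fast with an exception instead of looping on the 500s:

import pywikibot
from pywikibot import config

# Cap the retry behaviour so a persistent 500 / non-JSON reply raises an
# error after a few attempts instead of being retried for a long time.
config.max_retries = 3
config.retry_wait = 5

site = pywikibot.Site('commons', 'commons')
cat = pywikibot.Category(site, "Category: Documents from Library of Congress Packard Campus")
try:
    members = list(site.categorymembers(cat, namespaces=6,
                                        member_type=['page', 'file']))
except pywikibot.exceptions.Error as error:  # e.g. a server error once retries run out
    print('Giving up on the API request:', error)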

Event Timeline

Xqt subscribed.

The server is currently under maintenance. The URL of your call above is:
https://commons.wikimedia.org/w/api.php?gcmtitle=Category:Documents+from+Library+of+Congress+Packard+Campus&gcmprop=ids|title|sortkey&gcmtype=page|file&prop=info|imageinfo|categoryinfo&inprop=protection&iiprop=timestamp|user|comment|url|size|sha1|metadata&iilimit=max&generator=categorymembers&action=query&indexpageids=&continue=&gcmnamespace=6&gcmlimit=125&meta=userinfo&uiprop=blockinfo|hasmsg&maxlag=5&format=jsonfm
(I just changed the format from json to jsonfm.)
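
The upstream failure can also be checked without pywikibot at all. A rough sketch using the requests library with the same parameters as the URL above (minus the pywikibot-specific meta bits, and with format=json):

import requests

params = {
    'action': 'query',
    'format': 'json',
    'generator': 'categorymembers',
    'gcmtitle': 'Category:Documents from Library of Congress Packard Campus',
    'gcmtype': 'page|file',
    'gcmnamespace': 6,
    'gcmlimit': 125,
    'prop': 'info|imageinfo',
    'iiprop': 'timestamp|user|comment|url|size|sha1|metadata',
    'iilimit': 'max',
}
r = requests.get('https://commons.wikimedia.org/w/api.php', params=params)
print(r.status_code)                  # 500 while the backend is failing
print(r.headers.get('content-type'))  # HTML error page rather than application/json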

Therefore I think this bug is not pywikibot-related but upstream.

The URL can be shown with these statements:

members = site.categorymembers(cat, namespaces=6, member_type=['page', 'file']) 
str(members.request)

It throws:

PHP fatal error:
Allowed memory size of 698351616 bytes exhausted (tried to allocate 6128792 bytes)

Decreasing the step parameter may help a bit:

>>> import pwb, pywikibot
>>> site = pywikibot.Site('commons', 'commons')
>>> cat = pywikibot.Category(site, "Category: Documents from Library of Congress Packard Campus")
>>> pywikibot.config.step = 1
>>> members = site.categorymembers(cat, namespaces=6, member_type=['page', 'file'])
>>> for m in members: print(m)

[[commons:File:1001 Films-a Reference Book for Non-Theatrical Film Users (1920) (IA 1001filmsarefere00unse).pdf]]
[[commons:File:American cinematographer. (Vol. 31, 1950) (IA americancinemato31unse).pdf]]
[[commons:File:Cine-Mundial (Jan-Dec 1916) (IA cinemundial01unse).pdf]]
[[commons:File:Cine-Mundial (Jan-Dec 1917) (IA cinemundial02unse).pdf]]
[[commons:File:Cine-Mundial (Jan-Dec 1918) (IA cinemundial03unse).pdf]]
[[commons:File:Cine-Mundial (Jan-Dec 1921) (IA cinemundial06unse).pdf]]
[[commons:File:Cine-Mundial (Jan-Dec 1922) (IA cinemundial07unse).pdf]]
[[commons:File:Cine-Mundial (Jan-Dec 1923) (IA cinemundial08unse).pdf]]
[[commons:File:Color Photography (IA colorphotography00newy).pdf]]
[[commons:File:Educational Film Magazine (Jan-Jun 1919) (IA educationalfilmm01city).pdf]]
[[commons:File:Exhibitor's Trade Review (Nov 1924-Feb 1925) (IA exh00newy).pdf]]
[[commons:File:Exhibitor's Trade Review (Dec 1922-Feb 1923) (IA exhibitorst00newy).pdf]]
...

but I am surprised that content=False apparently does not help.
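
For what it's worth, content=False is already the default for categorymembers(); as far as I can tell it only suppresses fetching the wikitext of each member, while the generated request still asks for full imageinfo (including metadata). A sketch:

import pywikibot

site = pywikibot.Site('commons', 'commons')
cat = pywikibot.Category(site, "Category: Documents from Library of Congress Packard Campus")

# content=False (the default) skips the revision text only; the request
# printed below still contains iiprop=...|metadata.
members = site.categorymembers(cat, namespaces=6,
                               member_type=['page', 'file'],
                               content=False)
print(members.request)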

When looking into T258036, I could reproduce this. This is the request sent to the server:

"action": "query",
"format": "json",
"prop": "info|imageinfo|categoryinfo",
"meta": "userinfo",
"indexpageids": 1,
"generator": "categorymembers",
"inprop": "protection",
"iiprop": "timestamp|user|archivename|comment|url|size|sha1|metadata",
"iilimit": "max",
"uiprop": "blockinfo|hasmsg",
"gcmtitle": "Category:License review needed",
"gcmprop": "ids|title|sortkey",
"gcmtype": "page|file",
"gcmlimit": "500"

I believe "iiprop": "metadata" is the cause. It can be large for some files, see e.g. this API call.

CDanis subscribed.

This issue has been ongoing for a while and likely merits some CPT attention.

From #wikimedia-operations:

09:35:02	<xover>	I'm seeing multiple onwiki reports of problems related to Commons, that may be unrelated but kinda smell of a common infrastructure problem somewhere.
09:36:10	<xover>	Fæ reported trouble getting members of a category from the API: https://commons.wikimedia.org/wiki/Commons:Village_pump/Technical#Please_help_prioritize_the_Commons_API_"error_500"_bug_on_searches_and_category_queries
09:36:38	<xover>	Multiple reports that FileImporter fails: https://commons.wikimedia.org/wiki/Commons:Village_pump/Technical#File_importer_is_broken
09:37:26	<xover>	(imports tried from enWS, jaWP, and a third by a Chinese user)
09:37:57	<xover>	And then there was a report of the ia-upload tool failing an upload.
09:38:56	<xover>	(ia-upload, for those not aware, is a toolforge tool that grabs book scans from the Internet Archive and uploads them to Commons)
09:40:12	<xover>	I believe a manual/UploadWizard upload of the same ~300MB PDF (i.e. chunked upload) file also failed, but haven't tested that myself.
09:41:51	<xover>	Fæ's API problem is apparently a couple of weeks old and ongoing, while the two upload/import problems look like they may have started yesterday.

For the FileImporter issue mentioned on IRC, see https://www.mediawiki.org/wiki/Topic:Vqmtib4ho61z4bji for two files that fail and the error message generated. Both that problem and the possibly transient, possibly entirely unrelated ia-upload problem were reported in the last 24 hours, while the original issue in this task has apparently been happening for a month or thereabouts.

Absent an underlying infrastructure-ish problem, I would tend to guess that they are separate issues, and that the FileImporter (and possibly ia-upload) problem is related to yesterday's deploy of MW 1.36/wmf.1.

The FileImporter issue is apparently T258666 and not obviously related.