
This result was truncated because it would otherwise be larger than the limit of 12,582,912 bytes
Closed, Duplicate · Public

Description

$ python pwb.py scripts/replace.py -family:commons -cat:"Scans by the Internet Archive selected by BEIC" etc. etc.
WARNING: Http response status 500
WARNING: Non-JSON response received from server commons:commons; the server may be down.
Set gcmlimit = ['250']
WARNING: Waiting 5 seconds before retrying.
WARNING: Http response status 500
WARNING: Non-JSON response received from server commons:commons; the server may be down.
Set gcmlimit = ['125']
WARNING: Waiting 10 seconds before retrying.
WARNING: API warning (result): This result was truncated because it would otherwise  be larger than the limit of 12,582,912 bytes
Retrieving 50 pages from commons:commons.
No changes were necessary in [[File:Alberti - De re aedificatoria, 1541.djvu]]

This is probably due to the img_metadata field being huge for DjVu files; see also https://commons.wikimedia.org/w/index.php?title=Help_talk:VisualFileChange.js&diff=prev&oldid=162565292 for a similar problem in a JavaScript request for prop=imageinfo.

I don't think that an HTTP 500 error is the expected result.

Event Timeline

Nemo_bis raised the priority of this task to Needs Triage.
Nemo_bis updated the task description.
Nemo_bis subscribed.

Here is a relatively simple request: http://commons.wikimedia.org/w/api.php?action=query&titles=File:Alberti%20-%20De%20re%20aedificatoria,%201541.djvu&prop=imageinfo&iiprop=metadata
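For what it's worth, here is a minimal sketch, using the `requests` library directly rather than pywikibot, that issues the same single-file query and reports how large the serialized metadata is. The API URL and parameters mirror the request above; printing the byte count is just a quick way to confirm how big the img_metadata blob is for this DjVu file.

```
# Sketch only: fetch imageinfo metadata for one file and measure its size.
import json
import requests

API = "https://commons.wikimedia.org/w/api.php"
params = {
    "action": "query",
    "format": "json",
    "titles": "File:Alberti - De re aedificatoria, 1541.djvu",
    "prop": "imageinfo",
    "iiprop": "metadata",
}

resp = requests.get(API, params=params, timeout=60)
resp.raise_for_status()
data = resp.json()

for page in data["query"]["pages"].values():
    metadata = page.get("imageinfo", [{}])[0].get("metadata") or []
    size = len(json.dumps(metadata))
    print(f"{page['title']}: metadata serializes to {size:,} bytes")
```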

But I think the 500 errors happen because it tries to fetch as many pages as possible at once (by default), so it starts with a limit of 500 and halves it after each error (as you can see from the gcmlimit values above). To be honest, I'm not sure what could be improved here: maybe the metadata is unreasonably large, or maybe the API does not factor in the size of the metadata and use a lower limit itself. (A side note: I actually don't know whether the API can decide to return fewer pages than requested via the limit and split the result into multiple parts.)
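To illustrate the batched request I mean, a hedged reproduction sketch could look like the following: it sends the same kind of query that PageGenerator issues (generator=categorymembers plus prop=imageinfo with iiprop=metadata, starting at gcmlimit=500, using the category from the command line in the description) and prints the HTTP status and any truncation warnings. This is plain `requests`, not pywikibot's own request machinery.

```
# Sketch only: reproduce the batched category + imageinfo request.
import requests

API = "https://commons.wikimedia.org/w/api.php"
params = {
    "action": "query",
    "format": "json",
    "generator": "categorymembers",
    "gcmtitle": "Category:Scans by the Internet Archive selected by BEIC",
    "gcmlimit": 500,
    "prop": "imageinfo",
    "iiprop": "metadata",
}

resp = requests.get(API, params=params, timeout=120)
print("HTTP status:", resp.status_code)
if resp.status_code == 200:
    data = resp.json()
    # The API reports result truncation under the top-level "warnings" key.
    print("warnings:", data.get("warnings"))
    print("pages returned:", len(data.get("query", {}).get("pages", {})))
```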

I also don't think this issue is specific to Pywikibot's replace script; it applies to pywikibot.data.api.PageGenerator in general, as that adds many iiprops (including metadata) to the request. pywikibot-core has no support for incomplete image info data, so it cannot simply skip the metadata and re-request it later when something actually needs it (although the question is whether we should load the imageinfo by default at all).
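Purely as an illustration of that last point (not something pywikibot does today), a two-pass approach could fetch everything except the metadata in the bulk request and only re-request the metadata per title when something needs it. The iiprop selection in the bulk pass is an assumption about which properties are actually wanted:

```
# Sketch only: "skip metadata in bulk, re-request it per page on demand".
import requests

API = "https://commons.wikimedia.org/w/api.php"

def bulk_imageinfo(category, limit=500):
    """Bulk pass: imageinfo without the (potentially huge) metadata."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmlimit": limit,
        "prop": "imageinfo",
        "iiprop": "timestamp|user|size|url|mime|sha1",
    }
    return requests.get(API, params=params, timeout=120).json()

def metadata_for(title):
    """Second pass, only when something actually needs the metadata."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "metadata",
    }
    return requests.get(API, params=params, timeout=60).json()
```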

Large metadata in API responses can be troublesome (T86611), but a 500 is definitely not expected. Can you reproduce it via a manual API request? Also, can you get the response text (body) for the HTTP 500? (Pywikibot should probably log that anyway.)
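As a sketch of the kind of logging that would help, something along these lines could dump the status, content type and the first part of the body whenever the response is a non-200 or fails to parse as JSON. This is not pywikibot's actual error handling, only an illustration:

```
# Sketch only: log the body of a 500 or non-JSON response before giving up.
import requests

def fetch_and_log(url, params):
    resp = requests.get(url, params=params, timeout=120)
    if resp.status_code != 200:
        print("HTTP", resp.status_code, resp.reason)
        print("Content-Type:", resp.headers.get("content-type"))
        # MediaWiki 5xx bodies are usually HTML; keep only a slice.
        print(resp.text[:2000])
        return None
    try:
        return resp.json()
    except ValueError:
        print("Non-JSON 200 response, first 2000 bytes:")
        print(resp.text[:2000])
        return None
```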


I suspect it'll be whatever "took too long" page you get thanks to Gerrit change 206440 and/or Gerrit change 206626.