
This result was truncated because it would otherwise be larger than the limit of 12,582,912 bytes
Closed, Duplicate · Public

Description

$ python pwb.py scripts/replace.py -family:commons -cat:"Scans by the Internet Archive selected by BEIC" etc. etc.
WARNING: Http response status 500
WARNING: Non-JSON response received from server commons:commons; the server may be down.
Set gcmlimit = ['250']
WARNING: Waiting 5 seconds before retrying.
WARNING: Http response status 500
WARNING: Non-JSON response received from server commons:commons; the server may be down.
Set gcmlimit = ['125']
WARNING: Waiting 10 seconds before retrying.
WARNING: API warning (result): This result was truncated because it would otherwise  be larger than the limit of 12,582,912 bytes
Retrieving 50 pages from commons:commons.
No changes were necessary in [[File:Alberti - De re aedificatoria, 1541.djvu]]

This is probably due to the img_metadata field being huge for DjVu files; see also https://commons.wikimedia.org/w/index.php?title=Help_talk:VisualFileChange.js&diff=prev&oldid=162565292 for a similar problem in a JavaScript request for prop=imageinfo.

I don't think that an HTTP 500 error is the expected result.

Event Timeline

Nemo_bis raised the priority of this task to Needs Triage.
Nemo_bis updated the task description.
Nemo_bis subscribed.

Here is a relatively simple request: http://commons.wikimedia.org/w/api.php?action=query&titles=File:Alberti%20-%20De%20re%20aedificatoria,%201541.djvu&prop=imageinfo&iiprop=metadata
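For what it's worth, here is a minimal sketch, using the `requests` library directly rather than pywikibot, that issues the same single-file query and reports how large the serialized metadata is. The API URL and parameters mirror the request above; printing the byte count is just a quick way to confirm how big the img_metadata blob is for this DjVu file.

```
# Sketch only: fetch imageinfo metadata for one file and measure its size.
import json
import requests

API = "https://commons.wikimedia.org/w/api.php"
params = {
    "action": "query",
    "format": "json",
    "titles": "File:Alberti - De re aedificatoria, 1541.djvu",
    "prop": "imageinfo",
    "iiprop": "metadata",
}

resp = requests.get(API, params=params, timeout=60)
resp.raise_for_status()
data = resp.json()

for page in data["query"]["pages"].values():
    metadata = page.get("imageinfo", [{}])[0].get("metadata") or []
    size = len(json.dumps(metadata))
    print(f"{page['title']}: metadata serializes to {size:,} bytes")
```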

But I think the 500 errors happen because it tries to fetch as many pages as possible at once (by default), so it starts with a limit of 500 and halves it after each error (as you can see from the gcmlimit values above). To be honest, I'm not sure what could be improved here: maybe the metadata is unreasonably large, or maybe the API does not factor in the size of the metadata and use a lower limit itself. (A side note: I actually don't know whether the API can decide to return fewer pages than requested via the limit and split the result into multiple parts.)
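To illustrate the batched request I mean, a hedged reproduction sketch could look like the following: it sends the same kind of query that PageGenerator issues (generator=categorymembers plus prop=imageinfo with iiprop=metadata, starting at gcmlimit=500, using the category from the command line in the description) and prints the HTTP status and any truncation warnings. This is plain `requests`, not pywikibot's own request machinery.

```
# Sketch only: reproduce the batched category + imageinfo request.
import requests

API = "https://commons.wikimedia.org/w/api.php"
params = {
    "action": "query",
    "format": "json",
    "generator": "categorymembers",
    "gcmtitle": "Category:Scans by the Internet Archive selected by BEIC",
    "gcmlimit": 500,
    "prop": "imageinfo",
    "iiprop": "metadata",
}

resp = requests.get(API, params=params, timeout=120)
print("HTTP status:", resp.status_code)
if resp.status_code == 200:
    data = resp.json()
    # The API reports result truncation under the top-level "warnings" key.
    print("warnings:", data.get("warnings"))
    print("pages returned:", len(data.get("query", {}).get("pages", {})))
```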

I also don't think this issue is specific to Pywikibot's replace script; it applies to pywikibot.data.api.PageGenerator in general, as that adds many iiprops (including metadata) to the request. pywikibot-core has no support for incomplete image info data, so it cannot simply skip the metadata and re-request it later when something actually needs it (although the question is whether we should load the imageinfo by default at all).
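Purely as an illustration of that last point (not something pywikibot does today), a two-pass approach could fetch everything except the metadata in the bulk request and only re-request the metadata per title when something needs it. The iiprop selection in the bulk pass is an assumption about which properties are actually wanted:

```
# Sketch only: "skip metadata in bulk, re-request it per page on demand".
import requests

API = "https://commons.wikimedia.org/w/api.php"

def bulk_imageinfo(category, limit=500):
    """Bulk pass: imageinfo without the (potentially huge) metadata."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmlimit": limit,
        "prop": "imageinfo",
        "iiprop": "timestamp|user|size|url|mime|sha1",
    }
    return requests.get(API, params=params, timeout=120).json()

def metadata_for(title):
    """Second pass, only when something actually needs the metadata."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "metadata",
    }
    return requests.get(API, params=params, timeout=60).json()
```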

Large metadata in API responses can be troublesome (T86611), but a 500 is definitely not expected. Can you reproduce it via a manual API request? Also, can you get the response text (body) for the HTTP 500? (Pywikibot should probably log that anyway.)
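As a sketch of the kind of logging that would help, something along these lines could dump the status, content type and the first part of the body whenever the response is a non-200 or fails to parse as JSON. This is not pywikibot's actual error handling, only an illustration:

```
# Sketch only: log the body of a 500 or non-JSON response before giving up.
import requests

def fetch_and_log(url, params):
    resp = requests.get(url, params=params, timeout=120)
    if resp.status_code != 200:
        print("HTTP", resp.status_code, resp.reason)
        print("Content-Type:", resp.headers.get("content-type"))
        # MediaWiki 5xx bodies are usually HTML; keep only a slice.
        print(resp.text[:2000])
        return None
    try:
        return resp.json()
    except ValueError:
        print("Non-JSON 200 response, first 2000 bytes:")
        print(resp.text[:2000])
        return None
```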


I suspect it'll be whatever "took too long" page you get thanks to Gerrit change 206440 and/or Gerrit change 206626.