
Bad metadata for a single file errors out the complete imageinfo prop request
Open, Needs Triage, Public

Description

Given the file: File:Михаил Качковский и современная галицко-русская литература Часть 1 1876.djvu

Passing it to the API's prop=imageinfo module produces the following error:

{
    "error": {
        "code": "urlparamnormal",
        "info": "Could not normalize image parameters for Михаил_Качковский_и_современная_галицко-русская_литература_Часть_1_1876.djvu.",
        "docref": "See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."
    },
    "servedby": "mw1284"
}

I poked around in Logstash but don't see anything related in the logs. A quick look over the related code in ApiQueryImageInfo suggests this may be caused by individual files whose handlers extracted bad metadata. If that is true, then I think the main problem is that a single file with bad metadata can make the whole request fail, rather than producing an error for just that page. Additionally, the error is not structured in a way that would let a program easily adjust its request and re-send it.
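Because the error names the offending file only inside the human-readable `info` string, a client that wants to identify that file currently has to scrape the message text. A minimal Python sketch, assuming the message wording stays exactly as quoted above (the API does not promise this); `failing_file_from_error` is a hypothetical helper name:

```python
import json
import re

# Assumption: wording copied from the error quoted above. This is human-readable
# text, not a stable interface, so the pattern may break if the message changes.
_URLPARAMNORMAL_RE = re.compile(r"Could not normalize image parameters for (.+)\.$")


def failing_file_from_error(response_text):
    """Return the file name mentioned in a urlparamnormal error, or None."""
    payload = json.loads(response_text)
    error = payload.get("error")
    if not error or error.get("code") != "urlparamnormal":
        return None
    # Greedy (.+) keeps the file extension; only the sentence-final period is dropped.
    match = _URLPARAMNORMAL_RE.match(error.get("info", ""))
    return match.group(1) if match else None
```

The need for a regex here is exactly the problem: nothing in the structured part of the error (`code`, `docref`) says which page caused it.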

Event Timeline

When the iiurlwidth parameter is removed, the error goes away. It seems a width is invalid for DjVu files.

The file does not have bad metadata, but it cannot be displayed and therefore cannot handle the requested width.

Is the argument, then, that if a client sends 40 titles with unknown content asking for image info (for example, while reading the recent changes feed and fetching info about new uploads) and specifies a thumbnail width, the correct behavior is for the API to exit and return only an error for all 40 titles queried?
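With the current behavior, a client querying explicit titles can still salvage the good results by splitting a failed batch in half and retrying each half until the bad titles are isolated. A minimal Python sketch; `fetch` is a hypothetical callable that performs one prop=imageinfo request and returns the decoded JSON, and the default JSON format (where `query.pages` is an object keyed by page ID) is assumed:

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical: fetch(titles) sends one prop=imageinfo request (with iiurlwidth)
# for the given titles and returns the decoded JSON response as a dict.
Fetch = Callable[[Sequence[str]], dict]


def fetch_isolating_failures(titles: Sequence[str],
                             fetch: Fetch) -> Tuple[Dict[str, dict], List[str]]:
    """Query titles as one batch; if the whole batch errors, bisect it so that a
    single bad file only loses its own result."""
    good: Dict[str, dict] = {}
    bad: List[str] = []

    def run(batch: Sequence[str]) -> None:
        if not batch:
            return
        response = fetch(batch)
        if "error" not in response:
            # Assumes formatversion=1, where query.pages maps page ID -> page object.
            for page in response.get("query", {}).get("pages", {}).values():
                good[page["title"]] = page
            return
        if len(batch) == 1:
            bad.append(batch[0])  # This title alone triggers the error.
            return
        mid = len(batch) // 2
        run(batch[:mid])
        run(batch[mid:])

    run(list(titles))
    return good, bad
```

Each bad file costs on the order of 2·log2(n) extra requests, which is the overhead that per-page errors would avoid. This trick does not help with generator=allimages, where the failing batch is fixed by the generator and no continuation is returned.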

Regardless, here is the same API request against a different DjVu file, and it does not fail. I am fairly convinced that the problem is not that specifying a width parameter breaks all requests for DjVu files, but that something about specific files causes the failure.

Your second example is also displayed on Commons, while your first example has no thumbnail there. So there is something about that file that MediaWiki cannot handle, which may be a separate bug.

That prop=imageinfo returns an error for the whole request when querying multiple images is not good and should be changed. However, Anomie has already linked another task noting that the module needs rewriting in MediaWiki.
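To make the requested behavior concrete, here is a small Python model of per-page error reporting: each title gets its own entry, and a failure is recorded on that entry instead of aborting the response. This is a hypothetical sketch, not MediaWiki code; `thumb_url` stands in for whatever the module does to scale one image, and the per-page `error` object is an assumed shape, not an existing API format:

```python
from typing import Callable, Dict, List


def imageinfo_per_page(titles: List[str], width: int,
                       thumb_url: Callable[[str, int], str]) -> Dict[str, dict]:
    """Build one result per title; a scaling failure is attached to that title
    rather than failing the whole request."""
    pages: Dict[str, dict] = {}
    for title in titles:
        try:
            # Hypothetical: raises ValueError when image parameters cannot be normalized.
            pages[title] = {"thumburl": thumb_url(title, width)}
        except ValueError as exc:
            # The error stays with the page that caused it, so a client keeps every
            # other result and can decide how to handle this one.
            pages[title] = {"error": {"code": "urlparamnormal", "info": str(exc)}}
    return pages
```

A client would then check each page for an `error` key instead of treating the whole response as failed.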

To determine the extent of the problem, I ran a test in which I used a generator to fetch image info for all images uploaded to commons.wikimedia.org on each day of November 2019. Unfortunately, more than half of the days failed irrecoverably due to this bug; that is, a request of the following form fails for a little over half of the November dates tested:

https://commons.wikimedia.org/w/api.php?action=query&generator=allimages&prop=imageinfo&gailimit=40&gaisort=timestamp&gaistart=2019-11-02T00:00:00Z&gaiend=2019-11-03T00:00:00Z&iiprop=url|user|dimensions|extmetadata&iiurlwidth=300&format=json&gaicontinue=20191102113020|Eduard_Karel_(1861-1950).jpg
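The request above can be rebuilt for any date in the test with a small helper. A Python sketch; `allimages_day_url` is a hypothetical name, the parameters are copied from the URL above, and handling of the full `continue` object from earlier responses is left out:

```python
from datetime import date, timedelta
from typing import Optional
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def allimages_day_url(day: date, cont: Optional[str] = None) -> str:
    """Build the generator=allimages + prop=imageinfo request for one UTC day."""
    params = {
        "action": "query",
        "generator": "allimages",
        "prop": "imageinfo",
        "gailimit": 40,
        "gaisort": "timestamp",
        "gaistart": f"{day.isoformat()}T00:00:00Z",
        "gaiend": f"{(day + timedelta(days=1)).isoformat()}T00:00:00Z",
        "iiprop": "url|user|dimensions|extmetadata",
        "iiurlwidth": 300,  # Removing this parameter makes the request succeed.
        "format": "json",
    }
    if cont is not None:
        # gaicontinue value as returned by the previous batch.
        params["gaicontinue"] = cont
    return API + "?" + urlencode(params)
```

For example, `allimages_day_url(date(2019, 11, 2), "20191102113020|Eduard_Karel_(1861-1950).jpg")` is equivalent to the request quoted above, up to percent-encoding of characters such as `|`.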

This example is for the date 2019-11-02. I've attached a file with a list of dates for which the generator functionality is broken by this bug in November 2019, along with the request that fails irrecoverably for each date (by irrecoverably, I mean that the error does not contain any information that allows a program to skip over the broken file and continue).

The issue appears to be with images of size 0px by 0px, or at least that's what the dimensions property shows for each image (I can retrieve imageinfo for each after removing the iiurlwidth param).
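Based on this observation, one possible client-side workaround is a two-pass fetch: first request imageinfo without iiurlwidth (which succeeds), then request thumbnail URLs only for files whose reported dimensions are non-zero. A Python sketch, assuming the default JSON format where `query.pages` is an object keyed by page ID; the helper name is hypothetical, and it only filters out the 0×0 case seen here, so any other cause of the error would still fail the second pass:

```python
from typing import Dict, List


def titles_safe_to_scale(pages: Dict[str, dict]) -> List[str]:
    """Given query.pages from a request made WITHOUT iiurlwidth, return titles whose
    reported width and height are both non-zero (candidates for a second pass
    that does pass iiurlwidth)."""
    safe: List[str] = []
    for page in pages.values():
        # Missing pages or pages without imageinfo yield an empty dict here.
        info = (page.get("imageinfo") or [{}])[0]
        if info.get("width", 0) > 0 and info.get("height", 0) > 0:
            safe.append(page["title"])
    return safe
```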