Page MenuHomePhabricator

Image size is not determined for new PNG files with (partially) corrupt metadata
Closed, ResolvedPublic

Description

When uploading a new version of an existing file here, the dimensions are shown as "0 × 0 (261 KB)". This is the correct file size in KB, but the dimensions should be 1771x685 px. While the image is displayed correctly here, it does not display correctly on the Wikipedia pages that use it, and the thumbnail at the top of the File page is also broken (just a blank rectangle).

bug.png (495×756 px, 27 KB)

I tried to correct the problem by saving the file in a different program and uploading again, but the error persisted. I have also tried to undo my edit, but it wouldn't let me do, saying my edit had already been undone (this does not seem to be the case). I then asked for help at the Help desk, and a different user (Jeff G.) has since tried to undo it. Their attempt to get the file back to the old version were unsuccessful, the dimensions are still broken, "0 × 0 (108 KB)". 108 KB is the correct size for the previous file version, but the dimensions remain at 0x0.

My browser: Firefox 89.0.2 (64 bit), on OS Windows 10. File created with MSPaint (1st attempt), then with GIMP2 (2nd attempt), same result with both. User Jeff G.'s failed attempt to undo the edit suggests they were able to reproduce the problem. I have not attempted to reproduce the problem with another file (upload a new version for another image), to avoid disruption.

Event Timeline

Aklapper renamed this task from Image thumbnails not working, dimensions shown as 0x0 pixels to Image thumbnails for specific PNG file not working, dimensions shown as 0x0 pixels.Jul 7 2021, 1:58 PM
Aklapper edited projects, added Thumbor; removed Commons.

FYI: According to a Comment from @TilmannR on their talkpage it is possible to search for such 0x0-PNGs using advantage search.

Currently 27 PNG-files seems to be affected.

@JoKalliauer There are 31 now.
I am unfamiliar with this report process. What is supposed to happen next?
I looked at the list of other open tasks, and some say "bug report". Shouldn't this one, too?

tl;dr the image metadata is partially corrupt, which means MediaWiki couldn't read the image size after a recent code change. Current version of this file fixed, likely other affected files.


Not a Thumbor problem: https://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/Revierderby.png/220px-Revierderby.png

@JoKalliauer There are 31 now.
I am unfamiliar with this report process. What is supposed to happen next?
I looked at the list of other open tasks, and some say "bug report". Shouldn't this one, too?

It only says "bug report" if you use the specific "bug report" form to file the task, and that marker doesn't actually do anything. That form has some additional structure to make sure that enough information is included, but you don't need to use it.

$ identify Revierderby.png                                                                            
Revierderby.png PNG 1708x685 1708x685+0+0 8-bit sRGB 256c 110278B 0.000u 0:00.003
identify: tEXt: CRC error `Revierderby.png' @ warning/png.c/MagickPNGWarningHandler/1748.

The image metadata appears to be corrupted, which causes an error in rMW includes/media/PNGMetadataExtractor.php. MediaWiki is unable to read the entire metadata and determine the correct image size so the image size is set to 0x0. Thumbor is not affected as it uses exiftool to determine the original image size, which just ignores the bad tEXt block.

The CRC value is used to detect errors caused by corruption. Sometimes CRC errors occur because the original values weren't calculated correctly, while in other cases errors are caused by actual corruption. I used libpng's pngfix tool to recalculate the CRC values.

$ pngfix -e -w Revierderby.png  
Revierderby.png: tEXt: bad CRC
IDAT OK  maximum 15 15 109388 1170665 Revierderby.png
$ pngfix --out=Revierderby-fix.png Revierderby.png
IDAT OK  maximum 15 15 109388 1170665 Revierderby.png
$ pngfix -e -w Revierderby-fix.png
IDAT OK  maximum 15 15 109388 1170665 Revierderby-fix.png

This appeared to fix the error, as identify Revierderby-fix.png and pngfix were able to read the file without warnings or errors. However, when I uploaded the new version to Commons, the file was still broken. This means the tEXt block itself was corrupted at some point.

I then used pngfix again to strip out the corrupted tEXt block.

$ pngfix --strip=crc --out=Revierderby-strip.png Revierderby.png  
IDAT OK  maximum 15 15 109388 1170665 Revierderby.png

This version worked as expected when uploaded.

Reverting to a previous version did not work as the previous versions all have the same tEXt CRC error. The previous versions already had the image size extracted and stored in the database, which is why they aren't 0x0. Most image editing programs just copy metadata from previous versions, which is why corrupt tEXt block has propagated.

The only recent change to image size/metadata handling was rMWb4849e03b700: Use the unserialized form of image metadata internally. Before that change, the image size was usually handled separately from the metadata, which meant that the size could be obtained from files with corrupt metadata. @tstarling may have ideas about how to improve reliability here.

AntiCompositeNumber renamed this task from Image thumbnails for specific PNG file not working, dimensions shown as 0x0 pixels to Image size is not determined for new PNG files with (partially) corrupt metadata.Jul 11 2021, 3:25 AM

Thank you, @AntiCompositeNumber ! Would the error happen again if I tried to upload the new version of File:Revierderby.png again (you basically reverted it to the old version)?

Yes, I only fixed that version, but have now uploaded a fixed version of your version. None of the versions before my two most recent versions will work, but new versions derived from either of those versions should.

Your image editor ended up doing the same thing I tried first, which is blindly recalculating the CRC values. That means the corrupted tEXt block is still there, preventing MediaWiki from being able to read the metadata. However, because the CRC value is now correct for the corrupt data, I can't easily fix the file by removing all blocks with invalid CRC values. Instead, I had to remove all unused blocks, which also worked.

$ pngfix --strip=unused --out=20210706112305!Revierderby-strip.png 20210706112305!Revierderby.png     
IDAT OK  stdfast 15 15 266925 4853225 20210706112305!Revierderby.png

MW doesn't actually check the CRC values. Its problem is that each tEXt chunk is supposed to contain a keyword and value, separated by a null byte, but the chunk in question only contains 25 null bytes. The solution is to be less picky about the validity of tEXt chunks. I am testing a patch which does this.

Given the new use of PNGMetadataExtractor as a source of image sizes, there are really very few good reasons for throwing an exception (which is later caught and discarded). It needs to just keep reading chunks no matter what.

Change 704197 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Ignore invalid chunks in PNG files, instead of aborting metadata extraction

https://gerrit.wikimedia.org/r/704197

Change 704197 merged by jenkins-bot:

[mediawiki/core@master] Ignore invalid chunks in PNG files, instead of aborting metadata extraction

https://gerrit.wikimedia.org/r/704197

There are now 85 files affected by this bug, according to this search. What is going to happen to those?

These all look like broken files. I leave it to the discretion of commons admins but I think they fall under CSD#F7

No they're not broken files. I'm fixing them. Updates in an hour or so.

Mentioned in SAL (#wikimedia-operations) [2021-07-28T02:43:55Z] <TimStarling> on mwmaint2002 fixing T286273 broken files using eval.php

No they're not broken files. I'm fixing them. Updates in an hour or so.

Yes. I was about to comment and say "I take that back".

I did the following in eval.php:

$dbr = wfGetDB( DB_REPLICA);
$res = $dbr->query( 'SELECT img_name FROM image WHERE img_width=0 AND img_height=0 AND img_major_mime=\'image\' AND img_minor_mime=\'png\'', 'eval.php' );
$repo = MediaWiki\MediaWikiServices::getInstance()->getRepoGroup()->getLocalRepo();
foreach ( $res as $row ) { $file = $repo->findFile($row->img_name); if ( $file ) $file->upgradeRow(); }

That should have fixed 37 of the 83 files returned by the query.

I captured the exception messages from PNGMetadataExtractor. 44 failed with "Chunk size of <number> too big". There will be a patch for that.

One (Teganexd.png) failed with "unexpected end of file". It's probably not fixable.

The one other odd case is Smoll.png which returns 0 for the width without failing.

Change 708393 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] PNGMetadataExtractor: skip oversize chunks instead of aborting

https://gerrit.wikimedia.org/r/708393

The search index still shows the fixed files (with correct thumbnails). In case it doesn't fix itself, the list of affected files is at P16919.

Change 708393 merged by jenkins-bot:

[mediawiki/core@master] PNGMetadataExtractor: skip oversize chunks instead of aborting

https://gerrit.wikimedia.org/r/708393

So the latest revision of https://commons.wikimedia.org/wiki/File:In-fa_2021_track.png is also affected due to surpassing 3,145,728 bytes.

Change 708850 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@wmf/1.37.0-wmf.16] PNGMetadataExtractor: skip oversize chunks instead of aborting

https://gerrit.wikimedia.org/r/708850

Change 708850 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.16] PNGMetadataExtractor: skip oversize chunks instead of aborting

https://gerrit.wikimedia.org/r/708850

Mentioned in SAL (#wikimedia-operations) [2021-07-30T04:56:05Z] <tstarling@deploy1002> Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails T286273 (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2021-07-30T05:25:19Z] <tstarling@deploy1002> Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails T286273 (duration: 00m 57s)

I ran the eval.php snippet again. All current versions of files on commons appear to be fixed except Smoll.png and Teganexd.png.

I nominated Smoll.png and Teganexd.png for speedy deletion, under F7. These two files, both by the same user, look genuinely corrupt (other than the rest of the images that were affected).

I nominated Smoll.png and Teganexd.png for speedy deletion, under F7. These two files, both by the same user, look genuinely corrupt (other than the rest of the images that were affected).

20 bytes and 8 bytes respectively, so they have already been broken images.

I also wonder if the issue is resolved, or simply the affected images are resolved.

I also wonder if the issue is resolved, or simply the affected images are resolved.

New uploads with the same issue should also be fixed, so affected images in new uploads should be very rare. The other thing we could consider is rejecting 0x0 images on upload. But assuming 0x0 images are rare to nonexistent, I'm leaning towards just calling the task resolved.

The other thing we could consider is rejecting 0x0 images on upload. But assuming 0x0 images are rare to nonexistent, I'm leaning towards just calling the task resolved.

I've had a few, usually TIFF iirc. Tracking 0x0 images would probably be a good metric if any further changes to metadata parsers are planned.

I'd like it because it would stop people from reporting metadata parser issues as thumbnailer issues :)

It's resolved as far as I am concerned. Thanks!

tstarling claimed this task.

OK, I'll resolve this, and if any other features to do with broken images are desired, such as tracking or rejecting, they can be filed as separate tasks.