Large TIFF files do not pass file verification (related to version of image magick installed)
Closed, Resolved (Public)

Description

I have several large tiff files that generate "This file did not pass file verification" errors when I try to upload them to Commons, both in the Upload Wizard and via the API (pattypan).

Example (320 MB)

@Bawolff figured out that ImageMagick is running out of memory when reading the metadata. I have successfully uploaded a bunch of smaller files from the same batch of scans (same metadata structure); the file verification error consistently appears for files larger than ca. 200 MB.

Event Timeline

Specifically,

identify-im6.q16: cache resources exhausted `Anna Norrie, rollporträtt - SMV - NN054.tif' @ error/cache.c/OpenPixelCache/4083.

It only seems to happen when requesting the depth, alpha and alpha2 fields. As far as I can tell, these fields are only needed to decide whether to use PNG for the alpha channel. I think it might make sense, if the first attempt fails, to just assume the TIFF has no alpha channel, as alpha channels in TIFF files are pretty rare.

As a small experiment I used ImageMagick to remove the alpha channel

convert norrie.tiff -alpha off output.tiff

on the example file. Uploading using the wizard gave

Unknown error: "$1".

So I'm not sure that preprocessing the batch with something like ImageMagick is an easy fix.

I don't think the issue was that the file had an alpha channel; it's an issue with the code that detects the alpha channel (which runs regardless of whether an alpha channel is present). That said, I don't see anything particularly special about this image that would cause it to fail where other large images succeed, so I don't know.

btw, for reference, at least locally, the command that is timing out is:

identify -format '[BEGIN]page=%p\nalpha=%A\nalpha2=%r\nheight=%h\nwidth=%w\ndepth=%z[END]' 'Anna Norrie, rollporträtt - SMV - NN054.tif' 2>&1

I only tested locally, so I can't guarantee that's what the problem is on the server side, but it seems likely that if it doesn't work locally, it won't work on the server either. However, if I remove %A, %r and %z, the command executes fine.
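For illustration, the fallback suggested earlier in the thread (if the full query fails, retry without the alpha and depth fields and treat the file as having no known alpha channel) could look roughly like the sketch below. This is not the code from the Gerrit change: `run_identify`, `identify_with_fallback` and the reduced format string are hypothetical names made up for this example, and it assumes `identify` exits non-zero when it hits the cache error.

```python
import subprocess
from typing import Callable, Optional, Tuple

# Format string from the command quoted above.
FULL_FORMAT = r"[BEGIN]page=%p\nalpha=%A\nalpha2=%r\nheight=%h\nwidth=%w\ndepth=%z[END]"
# Hypothetical reduced variant without %A, %r and %z.
REDUCED_FORMAT = r"[BEGIN]page=%p\nheight=%h\nwidth=%w[END]"


def run_identify(fmt: str, path: str) -> Optional[str]:
    """Run `identify -format fmt path`; return stdout, or None on failure."""
    try:
        proc = subprocess.run(
            ["identify", "-format", fmt, path],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.stdout if proc.returncode == 0 else None


def identify_with_fallback(
    path: str,
    runner: Callable[[str, str], Optional[str]] = run_identify,
) -> Tuple[str, bool]:
    """Return (identify output, alpha_known).

    alpha_known is False when only the reduced query succeeded, in which
    case the caller would assume the TIFF has no alpha channel.
    """
    out = runner(FULL_FORMAT, path)
    if out is not None:
        return out, True
    out = runner(REDUCED_FORMAT, path)
    if out is None:
        raise RuntimeError(f"identify failed for {path!r} even without alpha fields")
    return out, False
```

The `runner` parameter only exists so the logic can be exercised without ImageMagick installed.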

Change 557089 had a related patch set uploaded (by Brian Wolff; owner: Brian Wolff):
[mediawiki/extensions/PagedTiffHandler@master] Allow tiff files to be uploaded that identify can't find alpha

https://gerrit.wikimedia.org/r/557089

It works on my mac, btw:

Version: ImageMagick 7.0.8-68 Q16 x86_64 2019-10-07 https://imagemagick.org

identify: Incompatible type for "RichTIFFIPTC"; tag ignored. `TIFFFetchNormalTag' @ warning/tiff.c/TIFFWarnings/1017.
[BEGIN]page=0
alpha=Undefined
alpha2=DirectClass Gray 
height=14567
width=11742
depth=16[END]
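As a rough sketch of how output like the above can be turned into fields: the `[BEGIN]`/`[END]` markers make it possible to skip warning lines such as the RichTIFFIPTC one. The function name `parse_identify_output` is invented for this example and is not taken from PagedTiffHandler.

```python
import re
from typing import Dict, List

# Each page is wrapped in [BEGIN]...[END] by the -format string used above.
BLOCK_RE = re.compile(r"\[BEGIN\](.*?)\[END\]", re.DOTALL)


def parse_identify_output(text: str) -> List[Dict[str, str]]:
    """Return one dict of key=value fields per [BEGIN]...[END] block."""
    pages = []
    for block in BLOCK_RE.finditer(text):
        fields = {}
        for line in block.group(1).splitlines():
            key, sep, value = line.partition("=")
            if sep:  # ignore lines without "="
                fields[key.strip()] = value.strip()
        pages.append(fields)
    return pages
```

Anything outside the markers, including warnings that arrive on the same stream because of `2>&1`, is ignored.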

Hmm, I'm using the version from Debian Buster

Version: ImageMagick 6.9.10-23 Q16 x86_64 20190101 https://imagemagick.org
Copyright: © 1999-2019 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP 
Delegates (built-in): bzlib djvu fftw fontconfig freetype heic jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png tiff webp wmf x xml zlib

I can also confirm that if I compile from source (Version: ImageMagick 7.0.9-8 Q16 x86_64 2019-12-16), identify works properly on this TIFF file.

So I guess we either go with the MW workaround or get operations to use a newer version of ImageMagick.

Bawolff renamed this task from Large TIFF files do not pass file verification to Large TIFF files do not pass file verification (related to version of image magick installed). Dec 16 2019, 4:25 PM

CC @MoritzMuehlenhoff (re: getting operations to use a newer version of ImageMagick), as in the past he helped sort out package versions on the MW servers and could at least give an informed opinion here. Please feel free to decline if busy, and I will look for someone else to help.

Providing ImageMagick 7 is non-trivial given that Debian hasn't migrated to 7.x yet. Bullseye will most certainly have it, but for now my suggestion is a workaround in MediaWiki.

@Bawolff Is the answer clarifying enough? Aiming for Bullseye (a few years in the future), patch on mw for now.

So what bawolff quotes:

identify-im6.q16: cache resources exhausted

Maybe it's just that Debian has adjusted the memory limit policies between versions?
We should compare the memory and disk limits in /etc/ImageMagick-6/policy.xml
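To make that comparison concrete, here is a minimal sketch that pulls the resource limits out of a policy.xml and compares them with a back-of-the-envelope pixel cache size for the example file. Several things here are assumptions: the 8 bytes per pixel figure (four 16-bit channels in a Q16 ImageMagick 6 build), the helper names, and the idea that the limits are the cause at all. It only shows how one could check.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Matches byte sizes such as "256MiB" or "1GB"; pixel counts like "16KP" do not match.
_SIZE_RE = re.compile(r"^(\d+(?:\.\d+)?)([KMGTP])(i?)B$")


def to_bytes(value: str) -> Optional[int]:
    """Convert a size like '256MiB' or '1GB' to bytes; None if not a byte size."""
    m = _SIZE_RE.match(value.strip())
    if not m:
        return None
    number, prefix, binary = m.groups()
    base = 1024 if binary else 1000
    exponent = "KMGTP".index(prefix) + 1
    return int(float(number) * base ** exponent)


def resource_limits(policy_xml: str) -> Dict[str, str]:
    """Return {name: value} for every <policy domain="resource" .../> entry."""
    root = ET.fromstring(policy_xml)
    return {
        p.get("name"): p.get("value")
        for p in root.iter("policy")
        if p.get("domain") == "resource" and p.get("name")
    }


def estimated_cache_bytes(width: int, height: int, bytes_per_pixel: int = 8) -> int:
    """Rough uncompressed pixel cache size (bytes_per_pixel is an assumption)."""
    return width * height * bytes_per_pixel
```

With the dimensions reported for the example file (11742 x 14567), `estimated_cache_bytes` can be compared against `to_bytes(limits["memory"])` and `to_bytes(limits["disk"])` for each ImageMagick version's policy file.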

identify likely uses the 'local' policies of the clusters, whereas thumbnailing uses Thumbor, which is adapted to handle very large images. It might be that this throws some things off.
identify -list resource should be able to show us the limits that are in place.

Honestly, identify is a terrible way to read this metadata, but apparently it was the only reliable method back when this was implemented.

Interestingly enough, tiffinfo can be used in place of identify. I wonder if we used to use the tiffinfo command on older servers and accidentally switched to ImageMagick...

[edit] Seems tiffinfo was a launch criterion for PagedTiffHandler way back: T25258: Enable PagedTiffHandler on all wikis, to allow display of TIFF files
I think that's what is going on. We switched from tiffinfo to ImageMagick by accident rather than intentionally. In fixing this, we might have to take into account T172584: Securing external binaries run by MediaWiki.

It was a change by @Reedy, "Remove $wgTiffUseTiffinfo because it doesn't exist", but I think it does exist.

Change 560521 had a related patch set uploaded (by Reedy; owner: Reedy):
[operations/mediawiki-config@master] Revert "Remove $wgTiffUseTiffinfo because it doesn't exist"

https://gerrit.wikimedia.org/r/560521

Change 560521 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Remove $wgTiffUseTiffinfo because it doesn't exist"

https://gerrit.wikimedia.org/r/560521

Mentioned in SAL (#wikimedia-operations) [2019-12-24T11:43:45Z] <reedy@deploy1001> Synchronized wmf-config/CommonSettings.php: use TiffInfo again T240455 (duration: 01m 07s)

Can someone try the large tiff(s) again? :)

Maybe it's just that Debian has adjusted the memory limit policies between versions?
We should compare the memory and disk limits in /etc/ImageMagick-6/policy.xml

Btw, I tried adjusting the memory limits locally when testing, and it didn't seem to change anything, but maybe I did it wrong.

Probably not, I guess. I disabled TiffInfo (accidentally) like 2 years later... But maybe OS and package upgrades in the meantime have helped.

Fixed!

It works indeed, I'm uploading my horrible files right now :) Thanks to everyone who helped out!

Change 557089 abandoned by Brian Wolff:
Allow tiff files to be uploaded that identify can't find alpha

Reason:
meh, image magick is likely going to fix itself, and WMF is fine with tiffinfo

https://gerrit.wikimedia.org/r/557089