Vips uses too much memory on non-first page of large tiff files
Open, Needs Triage, Public

Description

We use vips shrink to scale the first page of a TIFF file. This works sequentially through the file and is very efficient.

However, it doesn't allow us to specify any page other than the first, so we use im_shrink for other pages. im_shrink supports appending ":<page number>" to the file name to select a page. im_shrink is not as "sequential" as plain shrink, though, so we run out of memory on very large files (the limit is 500 MB).

It's possible there is some way to do this efficiently with the vips command line that I'm just not aware of.

Example file: https://commons.wikimedia.org/wiki/File:UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_2.tif?page=3

(Note that the file has to be pretty huge before this starts to be an issue. The example file is 33,511 × 7,889 pixels and a whole 2.98 GB.)
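
As a rough sanity check on why a page of this size can blow past the 500 MB limit when it is not processed sequentially, here is a back-of-the-envelope sketch. The 3 bytes per pixel (8-bit RGB) and the idea that a whole page gets decoded into memory at once are assumptions for illustration, not something measured on this file.

```python
# Dimensions of the example file, taken from the task description.
WIDTH_PX = 33_511
HEIGHT_PX = 7_889

# Assumption: 8-bit RGB, i.e. 3 bytes per pixel. The real file may differ.
BYTES_PER_PIXEL = 3

# Memory limit mentioned in the description (500 MB).
LIMIT_BYTES = 500 * 10**6


def decoded_page_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one fully decoded page held in memory at once."""
    return width * height * bytes_per_pixel


page_bytes = decoded_page_bytes(WIDTH_PX, HEIGHT_PX, BYTES_PER_PIXEL)
print(f"decoded page: {page_bytes / 10**6:.0f} MB, limit: {LIMIT_BYTES / 10**6:.0f} MB")
print("exceeds limit:", page_bytes > LIMIT_BYTES)
```

Under these assumptions a single decoded page is already larger than the limit, which would be consistent with the sequential path succeeding where the non-sequential one fails.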


Upstream feature request: https://github.com/jcupitt/libvips/issues/345

Event Timeline

Bawolff raised the priority of this task from to Needs Triage.
Bawolff updated the task description.
Bawolff subscribed.
Bawolff added a project: Upstream.
Bawolff set Security to None.

Just discovered that vips shrink supports the foo.tiff[page=X] format.
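
For illustration, a minimal sketch of how a caller might build such a command line. The helper names are hypothetical, and this is not the code from the patch below. It assumes the positional form `vips shrink <in> <out> <hshrink> <vshrink>` and that libvips counts pages from zero, so a 1-based wiki page number would need converting first.

```python
from typing import List


def vips_page_source(path: str, page: int) -> str:
    """Append the [page=N] load option to a file name (hypothetical helper)."""
    if page < 0:
        raise ValueError("page must be non-negative")
    return f"{path}[page={page}]"


def build_shrink_command(src: str, dst: str, page: int,
                         hshrink: float, vshrink: float) -> List[str]:
    """Build an argument list for `vips shrink` on one page of a multi-page file.

    Returning a list (rather than a single string) means the square brackets
    are never interpreted by a shell when passed to subprocess.run().
    """
    return [
        "vips", "shrink",
        vips_page_source(src, page),
        dst,
        str(hshrink), str(vshrink),
    ]


# Example: shrink the third page (index 2, assuming zero-based pages) by 4x.
cmd = build_shrink_command("foo.tiff", "out.png", page=2, hshrink=4, vshrink=4)
print(cmd)
```

The list could then be handed to `subprocess.run(cmd, check=True)` on a machine where the vips binary is installed.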

Change 250308 had a related patch set uploaded (by Brian Wolff):
Use vips shrink for multi-page files

https://gerrit.wikimedia.org/r/250308

Change 250308 merged by jenkins-bot:
Use vips shrink for multi-page files

https://gerrit.wikimedia.org/r/250308

OK, then this is actually the wrong task. The actual problem is still the missing thumbnails, which is what the Basel University Library is waiting for.

Hmm, the above patch seems not to be enough to fix https://upload.wikimedia.org/wikipedia/commons/thumb/a/a0/UBBasel_Map_1569_Kartenslg_AA_3-5.tif/lossless-page2-2057px-UBBasel_Map_1569_Kartenslg_AA_3-5.tif.png :(

The error message is usually a Varnish 503, so it looks like it's hitting a timeout rather than a memory limit.

The issue still seems to persist:
https://commons.wikimedia.org/wiki/File:UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_4.tif

Is anybody willing to take another try at solving the issue? Or is the only workaround to split the multi-page files into several single-page files? If the latter is the case, should these restrictions be added to the documentation so that other people don't hit this bug inadvertently? Where exactly should they be added? And what exactly is the size limit?

Thumbnailing of large multi-page TIFF files works fine, but there is an issue with files where each page has a different size. See example. The first page is 8,973 × 6,724 px, the second 160 × 120 px. The thumbnailer seems to use the first page's size for the second page, too. I don't think this is very important, but there may be files where it is disruptive.

Hi @Buergerentscheid, please file issues about page size in a separate ticket by following https://www.mediawiki.org/wiki/How_to_report_a_bug - thanks!