
Optimize thumbs right after creation
Open, Low, Public

Description

It's no secret that ImageMagick creates thumbs fast, but they are often unnecessarily large.

I just optimized the thumb directory, 265K files (12 GB), with jpegoptim. It took only 10 minutes (8 cores x 3 GHz) and I saved over 1 GB :) #perfmatters
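For reference, a parallel batch run over such a directory could look roughly like this (the path and the parallelism/batch numbers are placeholders; the jpegoptim flags are the ones recorded later in this task):

# run 8 jpegoptim processes in parallel, 50 files per invocation
find /path/to/images/thumb -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -print0 \
  | xargs -0 -P 8 -n 50 jpegoptim -o --strip-all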

Now wouldn't it be great to optimize the thumbs right after creation? The thumb is immediately available from ImageMagick; then the optimizer runs over it and replaces it if it can produce a smaller file.

Related: https://phabricator.wikimedia.org/T101015 (Use optimised version instead of original when original size is used as thumbnail)

Edit: pngcrush saved me ~8% in file size (190 MB out of 2.2 GB; these 2.2 GB of PNG thumbs were included in the calculations above).

Event Timeline

Subfader raised the priority of this task from to Needs Triage.
Subfader updated the task description.
Subfader subscribed.
Restricted Application added subscribers: Steinsplitter, Aklapper.
Subfader set Security to None.

Most of the gains from running optimizers with default settings on our thumbnails come from stripping EXIF metadata that the community feels should be kept in our thumbnails. If you can still get gains with non-destructive operations that keep the same amount of EXIF, that would be interesting.
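For example, a purely lossless pass that keeps all metadata could look something like this (file names are placeholders): jpegtran's -copy all keeps every marker, including EXIF, while -optimize only rebuilds the Huffman tables.

jpegtran -copy all -optimize -outfile thumb-optimized.jpg thumb.jpg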

Jdforrester-WMF moved this task from Untriaged to Backlog on the Multimedia board.

pngcrush saved me ~8% in file size (190 MB out of 2.2 GB)

I'm assigning this to myself to investigate where the gains come from, i.e. to verify that they are actual image data optimization and not just the stripping of metadata we want to keep.

ImageOptim, which tries several of the best image optimization tools and picks the best-performing one for each image, conveniently has an option to leave metadata untouched. I tried it on a much smaller sample (20 images of each type), because it had to be done manually, but here are the results:

JPGs ended up 2.7% smaller on average. Something was odd about the tool options, though: savings only appeared when all three tools (JPEGOptim, jpegtran and jpegrescan) were selected together. More investigation is required.
PNGs ended up 17.2% smaller on average with PNGCrush alone (ImageOptim can try several tools in parallel, but that makes processing very slow and only gains an extra 2% on this sample).

This confirms your findings, and speed-wise it seemed reasonable enough that we can do it on the fly when we generate the thumbnails.

When I try to use pngcrush directly, I can't reproduce these gains anymore; it's always below 1%. I presume that ImageOptim isn't using the default pngcrush options. @Subfader, which options did you use for pngcrush?

After further examination, though, I found something suspicious: despite what its preferences claim, ImageOptim does strip metadata... which means the gains I reported above could be due to that.

The best PNG compression that I know of is

optipng -q -o7 $file && advpng -z -4 $file && advdef -z -4 $file
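If anyone wants to try that chain on an existing thumb directory, a sketch like the following should work (the directory path is a placeholder):

# apply the chain in place to every PNG under the thumb directory
find /path/to/images/thumb -type f -iname '*.png' -print0 \
  | while IFS= read -r -d '' f; do
      optipng -q -o7 "$f" && advpng -z -4 "$f" && advdef -z -4 "$f"
    done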

My current strategy:

JPEG: On the fly
I force -strip in ImageMagick's -thumbnail call for all file types. That makes running jpegoptim -o --strip-all just to strip metadata unnecessary.
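Illustratively (not MediaWiki's actual invocation; file name, geometry and quality are placeholders), the forced stripping amounts to something like:

convert source.jpg -thumbnail '800x800>' -strip -quality 80 thumb.jpg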

PNG: Via script (Bitmap.php writes the thumb paths to a file and a bash script picks them up; see the sketch below)

  1. pngquant --quality=65-80 --speed 1 --force >> Lossy compression down to ~30% of the original size. I don't mind possible quality loss, but I haven't come across any visible artifacts yet either.
  2. pngcrush -brute -rem allb -reduce >> Lossless reduction, still a few % on top of pngquant :) Can take up to 2 minutes, depending on file size.
  3. optipng -nc -nb -o7 >> Only a few bytes of reduction, but at no cost.

Disadvantage: In Bitmap.php I failed to pass the path to my bash script right after thumb creation.
At the moment the "large" PNG thumbs exist for ~5-10 minutes; the bottlenecks are the cron frequencies of runJobs.php ($wgUploadThumbnailRenderMap) and of my bash script.

Gilles lowered the priority of this task from Medium to Low. Sep 20 2015, 7:21 AM

Thanks for posting your commands! Metadata stripping and lossy compression: no wonder you saw significant size reductions :) Feel free to implement those things as options in MediaWiki, but don't expect the WMF to pick up the task of implementing them, as they can't be used on Wikimedia sites for the reasons described earlier. The backlog of thumbnail improvements in the Wikimedia context is too large for me to get distracted with implementing MediaWiki features that aren't applicable to Wikimedia.

I'll still research this further when I have time, to look for ways to do better in the Wikimedia context. Recompression that takes minutes would only be worth it if the gains are large, because the load impact on the image scaling servers could be significant.

I also need to check whether we're keeping more metadata than just the essentials the community wants. That's probably where the biggest difference could be made (fine-grained metadata filtering, if we're currently keeping all of it).

For the record: the previous result data was based only on jpegoptim -o --strip-all and pngcrush -brute -rem allb -reduce.

People at Commons say that zopfli gets better results than pngcrush (but it's slow enough that you might not want to do it during thumbnail generation, rather in some async fashion).

In T111633#1698482, @Bawolff wrote:

People at Commons say that zopfli gets better results than pngcrush (but it's slow enough that you might not want to do it during thumbnail generation, rather in some async fashion).

How so? Random PNG file attached (206,866 bytes):

2014-10-17_-_DGTL_pres._Kompakt,_ADE.png (315×851 px, 202 KB)

206,866 bytes  original PNG
202,377 bytes  after zopfli --i200
154,669 bytes  after pngcrush -brute -rem allb -reduce
153,810 bytes  after pngcrush, then zopfli
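If anyone wants to reproduce such a comparison with the PNG-specific zopfli front end, something along these lines should work (zopflipng and the file names here are my assumption, since the exact zopfli invocation used at Commons wasn't specified):

# compare output sizes of the two lossless recompressors
pngcrush -brute -rem allb -reduce input.png out-pngcrush.png
zopflipng input.png out-zopflipng.png
wc -c input.png out-pngcrush.png out-zopflipng.png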

For the record: the biggest performance boost is to additionally create WebP thumbs :)

@Subfader: There is no need to advertise WebP support across image-related tasks such as T111633#2262716 or T134102#2254842. Thanks for keeping tasks on-topic.