
Reduce amount of metadata embedded in thumbnails across site
Closed, ResolvedPublic

Description

Image metadata can optionally be stripped on transformations, see e.g. http://www.imagemagick.org/script/command-line-options.php#strip (other utilities such as jpegoptim are more fine-grained).

Lossless JPEG optimizers show that we can save at least 17 KB of image weight on an average Wikipedia page.

This comes at the cost of various pieces of metadata.

I propose that we compress image thumbnails and keep a single metadata field pointing to the canonical URL of the image, where all information about the image can be found.

This will yield huge performance benefits for all our users viewing Wikipedia articles.
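To make the proposal concrete, here is a minimal Python sketch of the idea, not the thumbnailer's actual code. It walks the JPEG header segments, drops comment and application (APPn) segments, and writes one comment segment holding the canonical URL. The function name `strip_jpeg_metadata` and the choice to keep the JFIF header, ICC profile, and Adobe APP14 segments are assumptions made for illustration; the image data itself is copied untouched, so the operation is lossless.

```python
import struct


def strip_jpeg_metadata(data: bytes, canonical_url: str) -> bytes:
    """Drop APPn/COM segments and add one COM segment with canonical_url.

    Hypothetical helper for illustration. JFIF (APP0), ICC profile (APP2)
    and Adobe (APP14) segments are kept because decoders may need them.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    comment = canonical_url.encode("utf-8")
    if len(comment) > 65533:
        raise ValueError("URL too long for a single COM segment")

    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0xDA:  # start of scan: write the URL, copy the rest as-is
            out += b"\xff\xfe" + struct.pack(">H", len(comment) + 2) + comment
            out += data[pos:]
            return bytes(out)
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos:pos + 2 + length]
        is_app = 0xE0 <= marker <= 0xEF
        keep_app = (
            (marker == 0xE0 and segment[4:9] == b"JFIF\x00")
            or (marker == 0xE2 and segment[4:16] == b"ICC_PROFILE\x00")
            or marker == 0xEE
        )
        is_comment = marker == 0xFE
        if not is_comment and (not is_app or keep_app):
            out += segment
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

Usage would look like `strip_jpeg_metadata(thumb_bytes, "https://commons.wikimedia.org/wiki/File:Example.jpg")`, where the file name is a placeholder.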

Event Timeline

Jdlrobson raised the priority of this task from to Needs Triage.
Jdlrobson updated the task description. (Show Details)
Jdlrobson added subscribers: Volker_E, Glaisher, Krenair and 4 others.

I think this proposal should be posted to the Commons village pump, because that's where the most resistance against this change is likely to come from.

Gilles renamed this task from Compress images across site to Reduce amount of metadata embedded in thumbnails across site. May 27 2015, 7:39 PM
Gilles added a project: Commons.

An alternative would be to extract the latest author/license information with CommonsMetadata when generating thumbnails and replace the original EXIF with a minimal EXIF block based on that information. This would keep the attribution in the file itself while keeping the EXIF as small as possible. That's if linking to the Commons page is insufficient.

I think this proposal should be posted to the Commons village pump, because that's where the most resistance against this change is likely to come from.

Yes, see T20871: Include at least some EXIF metadata in resized pictures and its discussion.

We already remove most metadata from resized images (the notable exception being colour profiles, which are usually needed but could in theory be minimized significantly; cf. what Facebook does). Can you give a breakdown of what further metadata you wish to remove? Or are you referring to non-shrunk images?

Interesting, I had forgotten that Facebook had created a light version of the sRGB profile. I've extracted it from one of their images, here it is:

However, glancing at the thumbnails on the enwiki front page at the moment, none of them include a color profile. You're saying that it can happen, right?

Wandering through more thumbnails, it does indeed seem pretty common for them to embed the standard sRGB ICC profile, which weighs 3 KB. Beyond that, it does seem like we're already keeping only the essential metadata. Swapping sRGB for tinyRGB would save 2.5 KB on those images. I'll make a subtask for that.
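The survey described above can be reproduced with a short script. The following is a rough Python sketch, assuming the profile is stored the usual way in APP2 segments tagged `ICC_PROFILE`; the function names are made up for illustration, and the sizes in the usage note are the approximate figures quoted in this thread, not measured values.

```python
import struct

ICC_TAG = b"ICC_PROFILE\x00"


def icc_profile_size(data: bytes) -> int:
    """Return the number of ICC profile bytes embedded in a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    total = 0
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # image data or end of image: stop
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE2 and payload.startswith(ICC_TAG):
            # After the tag come two bytes: chunk number and chunk count.
            total += len(payload) - len(ICC_TAG) - 2
        pos += 2 + length
    return total


def estimated_saving(current_bytes: int, replacement_bytes: int) -> int:
    """Bytes saved by swapping the embedded profile for a smaller one."""
    return max(0, current_bytes - replacement_bytes)
```

With the rough numbers from this thread, `estimated_saving(3000, 500)` gives 2500 bytes per thumbnail, which matches the 2.5 KB estimate.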

Related comment on Twitter: https://twitter.com/ericlaw/status/609361607152398336 ("Wikipedia could improve load time on the HTTPS site by sending less data. Remove useless image metadata"; I believe this mainly refers to PNGs and the tool described here).

Related comment on Twitter: https://twitter.com/ericlaw/status/609361607152398336 ("Wikipedia could improve load time on the HTTPS site by sending less data. Remove useless image metadata"; I believe this mainly refers to PNGs and the tool described here).

Hardly specific to HTTPS...

The tl;dr of that article is that there is a deflate implementation named zopfli, optimized for smallest file size at the expense of CPU time (https://github.com/google/zopfli). By recompressing PNGs with it (especially static assets, where the CPU time spent compressing is irrelevant) and dropping unneeded metadata chunks, you get roughly a 4% reduction in file size.
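The two steps described above (drop metadata chunks, recompress the image data) can be sketched in Python as follows. This is an illustration under stated assumptions: it uses the standard library's `zlib` at level 9 as a stand-in for zopfli, so it will not reproduce the 4% figure, and the set of chunk types treated as droppable (`tEXt`, `zTXt`, `iTXt`, `tIME`) is a choice made for this example. The function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Assumption for this sketch: textual and timestamp chunks are not needed.
DROPPABLE = {b"tEXt", b"zTXt", b"iTXt", b"tIME"}


def read_chunks(data: bytes):
    """Yield (type, body) for every chunk in a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG (bad signature)")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + body + CRC


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Serialize one chunk, computing its CRC over type and body."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def shrink_png(data: bytes) -> bytes:
    """Drop metadata chunks and recompress the IDAT stream."""
    out = bytearray(PNG_SIGNATURE)
    idat = bytearray()
    idat_written = False
    for ctype, body in read_chunks(data):
        if ctype in DROPPABLE:
            continue
        if ctype == b"IDAT":  # IDAT chunks form one continuous zlib stream
            idat += body
            continue
        if idat and not idat_written:
            raw = zlib.decompress(bytes(idat))
            # zlib level 9 here; zopfli would produce a smaller stream.
            out += make_chunk(b"IDAT", zlib.compress(raw, 9))
            idat_written = True
        out += make_chunk(ctype, body)
    return bytes(out)
```

Because only the compressed stream is rewritten and the decompressed pixel data is unchanged, the result is lossless; the cost is CPU time, which is why the approach suits static assets best.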

This is something we should certainly do for static assets (and I thought someone had already gone through most of those with PNGCrush, which is a similar program; perhaps this program is much better). It's unclear whether this is applicable to uploaded files, particularly given the way our thumbnailing infrastructure works now. I'd be a lot happier about using it on thumbnails of uploaded files if we could recompress the most popular files in the background when the servers are not busy.

This is something we should certainly do for static assets (And I thought someone has already gone through most of those with PNGCrush

All assets already *must* be compressed with pngcrush/optipng. https://www.mediawiki.org/wiki/Manual:Assets

Stripping metadata from thumbs should be optional. #permatters

Krinkle assigned this task to Gilles.
Krinkle edited projects, added Performance-Team; removed Multimedia, Performance Issue.