
Include at least some EXIF metadata in resized pictures
Closed, ResolvedPublic

Description

Author: folengo

Description:
A discussion took place on Wikimedia Commons' Village Pump on 19 May 2009 on how to best respond to the pressure from photographers wanting their names to be credited on article pages.

It was suggested that instead of (or in addition to) crediting photographers on article pages, we should make our best efforts to keep the copyright EXIF metadata when available.

It was understood that the underlying reason for excluding EXIF data from resized pictures and thumbnails until now, was the concern that sometimes cameras add an overwhelmingly heavy amount of EXIF metadata.

Our conclusion is that some sort of compromise has to be reached between these two concerns, by including at least some EXIF metadata, if not all of it.

A) Perhaps really small thumbnails, like those used in categories (120px or smaller), might be allowed to remain void of metadata, while larger thumbnails or resized pictures would be required to include at least the most useful metadata.

B) The most useful metadata, which should be included in most resized pictures and thumbnails, are:
*ImageDescription,
*Copyright,
*DateTimeOriginal,
*DateTime,
*GPSLatitudeRef,
*GPSLatitude,
*GPSLongitudeRef,
*GPSLongitude,
*GPSAltitudeRef,
*GPSAltitude,
*IPTC:Credit,
*IPTC:CopyrightNotice, and
*IPTC:Byline.

C) The message at the bottom of description pages (metadata-help) should be reworded to be less ambiguous. When a user reads "This file contains additional information (...) ", they cannot tell whether that means the original file, the resized 800px preview present on the description page, or both.
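
For illustration only, points A and B could be sketched as a simple whitelist filter. This assumes the metadata has already been read into a plain dict keyed by the field names listed above; the function name `filter_metadata` and the constant names are hypothetical, not taken from MediaWiki or any other existing code.

```python
# Fields proposed in point B (names exactly as written in the task description).
KEEP_FIELDS = frozenset({
    "ImageDescription", "Copyright", "DateTimeOriginal", "DateTime",
    "GPSLatitudeRef", "GPSLatitude", "GPSLongitudeRef", "GPSLongitude",
    "GPSAltitudeRef", "GPSAltitude",
    "IPTC:Credit", "IPTC:CopyrightNotice", "IPTC:Byline",
})

# Point A: thumbnails of this width or smaller may stay free of metadata.
BARE_THUMB_MAX_WIDTH = 120


def filter_metadata(tags: dict, thumb_width: int) -> dict:
    """Return the subset of `tags` that a thumbnail of `thumb_width` px keeps."""
    if thumb_width <= BARE_THUMB_MAX_WIDTH:
        return {}
    return {name: value for name, value in tags.items() if name in KEEP_FIELDS}
```

With this sketch, an 800px preview would keep `Copyright` but drop a camera field such as `Make`, and a 120px category thumbnail would carry nothing.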

See also:

Revisions and Commits

rTHMBREXT Thumbor Plugins
Restricted Differential Revision

Related Objects

Event Timeline


PHP's exif support doesn't appear to include writing metadata, though, so we might need to add new code to copy the info or find a way to do it via ImageMagick.

Just yesterday I used ImageMagick to resize a bunch of files. Having read this thread before, I was quite surprised that all the metadata was still present. I don't really understand what the problem is here. FTR, I used something like

$ convert file.jpg -resize 800x600 thumb.jpg

We pass ImageMagick an option that strips all metadata (except colour profiles). In some images the metadata can be larger than the thumbnail itself.

Addendum:

The specific issue is that ImageMagick doesn't offer a lot of control over which fields are kept, although perhaps it offers enough to fix the meat of this bug. (The true issue, as with most dev things, is that somebody just has to take the time, be bold, and fix the bug.)


Don't remove all metadata. Removing copyright-related information is a criminal act in some countries.

The removal of copyright and licensing information when resizing pictures must surely violate:

  1. The CC-BY-SA licenses, which are recommended for use by Wikipedia
  2. The U.S. Digital Millennium Copyright Act, discussed somewhat at commons.wikimedia.org

I beg to raise the priority to 'Unbreak now' to avoid a legal lynching!

Argument by assertion isn't much of an argument. However, if you feel that this bug is causing us to actively violate copyright law or a CC license, I suggest you discuss the situation with the WMF legal department. If and when legal says that this is a violation of the licenses, it will be treated as unbreak now. Until then, it's considered a really nice feature to have but ultimately low priority.

I put the issue of copyright and licensing information removal to wmf legal as suggested by Bawolf (thanks for the suggestion). The response was:

We don't believe that the tool [ie the resizing function] as it exists creates liability for WMF. I can't really expound on the issue because that would constitute legal advice on how to comply with the DMCA, but I can say that WMF's opinion, similar to what is outlined in the watermarks wikilegal posting is that the tool is not removing CMI in violation of the DMCA.

Reasoning would have been nice, but in its absence I suppose that 'argument by assertion' is trumped by 'argument by statement of opinion' when it comes from the legal department. FWIW, they agreed that 'it probably would be nice if the resizing tool kept metadata'.

Folengo mentioned the discussion at Flickr - last year they changed their resizing so as to keep copyright notices, as announced here.

Perhaps legal's 'would be nice' opinion might be worth a bump up to 'Normal' priority?

According to this thread at the Commons Village Pump, the community urges that this issue be corrected as soon as possible. T111722 is another recent issue that requests preserving at least the Copyright information in Exif metadata.

Various common licences state that you must include any copyright notices when using a work.

GFDL 1.3 states:

You may copy and distribute the Document [...] provided that [...] the copyright notices, [...] are reproduced in all copies, [...].

CC-BY-SA 3.0 states:

If You Distribute, [...] the Work [...], You must, [...], keep intact all copyright notices for the Work [...].

If MediaWiki removes EXIF, MediaWiki complicates things for users who wish to use verbatim copies of files on the Internet, as shown in the thread at COM:VPC. It is therefore a very good idea to provide EXIF in all copies distributed through the website.


[IANAL] Regardless of Exif, people who distribute images verbatim without any external notice are probably violating the license.

As someone who is commonly on a low-bandwidth connection, I would prefer that our thumbnails include as little EXIF data as possible. The copyright fields are fine, but I see no compelling reason for including things like ImageDescription, GPS fields, etc. (as suggested in the task description).


Is that really a factor? My intuition would be that the size of a few lines of EXIF data is insignificant compared to the size of an image, even a thumbnail. We should probably test a representative sample of images and see if the increase in size due to EXIF metadata is noticeable.
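
As a rough way to run that kind of test, here is a minimal sketch that counts how many bytes of a JPEG sit in its APPn and COM segments, which is where EXIF and XMP (APP1), ICC profiles (APP2) and IPTC (APP13) usually live. It uses only the Python standard library, assumes a well-formed file, and stops at the start-of-scan marker; the function name `jpeg_metadata_bytes` is made up for this example and is not part of any tool mentioned in this task.

```python
import struct


def jpeg_metadata_bytes(data: bytes) -> dict:
    """Return the bytes used by each APPn/COM segment before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    sizes = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte in front of a marker: skip it and look again.
            pos += 1
            continue
        if marker == 0xDA:
            # Start of scan: compressed pixel data follows, stop here.
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if 0xE0 <= marker <= 0xEF or marker == 0xFE:
            name = "COM" if marker == 0xFE else f"APP{marker - 0xE0}"
            sizes[name] = sizes.get(name, 0) + 2 + length
        pos += 2 + length
    return sizes
```

Comparing `sum(jpeg_metadata_bytes(data).values())` with `len(data)` for an original and for its thumbnail gives the share of each file taken up by metadata. Note that APP0 (the JFIF header) is counted too, so a small non-zero total is expected even for a stripped file.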

The image description especially, but even the author field, is known to include large blobs of text or even HTML at times. I believe that was why we started removing it in the first place.

Some other reasons were probably: unfilled fields that had been added by the camera, colour profiles that were already applied to the resized image (and thus no longer correct), and large amounts of padding (so you can just edit the EXIF without moving the picture bytes).

Hmmm, so I just discovered the related T111722; from there, d0295e039d88 seems to have been related. I was initially confused by b768122dd7de (was this a revert? Seems not), but checking a thumbnail of one of my own uploads in GIMP, I can see 9 EXIF fields, including Artist and Copyright.

So, is this essentially fixed? :D

All filetypes?

@Gilles can you capture the current status and, if appropriate, close this ticket?

I don't believe that GIF is capable of supporting that kind of metadata. The EXIF filtering capabilities only apply to JPG at the moment. Behavior described here https://wikitech.wikimedia.org/wiki/Thumbor/JPEG

Production settings set here: https://github.com/wikimedia/puppet/blob/05547c4d7fddc3d67c9bc437f7a64b9a82ab3957/modules/thumbor/templates/server.conf.erb#L95

Test coverage: https://github.com/wikimedia/operations-debs-python-thumbor-wikimedia/blob/47fea15502b5248a76201adebe067cfbaf16f35e/tests/integration/test_exif.py#L92-L103

In addition, the ICC profile is preserved (or converted to tinyRGB in the case of sRGB).

That leaves PNG, which in my experience is more hit-and-miss when it comes to processing metadata, because metadata can be stored in a PNG in many ways and it's not very standardized.

Thanks @Gilles for the update. It sounds to me that this ticket is essentially resolved: *some* EXIF metadata is included (and the mechanism to add more is clear) ; and I would think that most people wanted this primarily for photographs.

From here, if additional fields need to be whitelisted (I think the copyright-related IPTC fields would be welcome additions), then we can file a dedicated ticket for them.

I would also like to thank you all. I'm now looking forward to adding more photographs to Wikimedia.

I can confirm that Exif.Image.Artist and Exif.Image.Copyright are preserved in resized jpgs. Excellent, thanks!

Ticket d0295e039d88 suggests that Exif fields Description and icc_profile should be preserved too: I don't see Exif.Image.ImageDescription being preserved. icc_profile however I don't think I can currently check for (at least by using exiv2).

I see that Description isn't covered by the tests. It's possible that it's using the wrong key name and should be ImageDescription instead; I'll look into that.

@Batternut can you point me to a file on Commons that has Exif.Image.ImageDescription defined? There isn't any in the current test files in our Thumbor plugins.

That image actually has no EXIF as far as I can tell. I'm not looking for one with any EXIF data, but specifically with the ImageDescription EXIF field set.

@Gilles An example image below. Yes, it seems that the Xmp description is being preserved (ie "Xmp.dc.description"). Perhaps Xmp is favoured over Exif now...

https://commons.wikimedia.org/wiki/File:%27The_Poppet%27_-_1950s_baby_bath.jpg

Ah, indeed, but that's probably just an accident that an XMP field has that name. The original task description mentions ImageDescription. And looking at which fields MediaWiki stores for display on the file page, only ImageDescription is there, not Description. This suggests that ImageDescription is the more common one.

Thanks for the reference file, I'll create a test case and switch the filtering from XMP Description to EXIF ImageDescription.

Gilles added a revision: Restricted Differential Revision. (Aug 31 2018, 8:41 AM)

Change 456575 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Preserve EXIF ImageDescription instead of XMP Description

https://gerrit.wikimedia.org/r/456575

Change 456577 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/vagrant@master] Preserve EXIF ImageDescription instead of XMP Description

https://gerrit.wikimedia.org/r/456577

Change 456577 merged by jenkins-bot:
[mediawiki/vagrant@master] Preserve EXIF ImageDescription instead of XMP Description

https://gerrit.wikimedia.org/r/456577

Does 'merged' mean the change is now live, or is there a further release process?

Change 456575 merged by Filippo Giunchedi:
[operations/puppet@production] Preserve EXIF ImageDescription instead of XMP Description

https://gerrit.wikimedia.org/r/456575

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2018-09-03T07:55:46Z] <godog> roll restart thumbor to apply latest config changes - T203135 T20871

@Batternut It's merged now, newly generated thumbnails should retain ImageDescription instead of Description. You can purge existing files to trigger re-generation of thumbnails (make sure to clear your browser cache to get the new thumbnails after the purge).

Yep. Having checked the 320px thumbnail of the baby bath image previously mentioned, I can confirm that Exif.Image.ImageDescription is now being preserved, and Exif.Image.Copyright and Exif.Image.Artist are still being preserved, ie no regression.
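
For anyone who wants to repeat this kind of check without exiv2, here is a minimal sketch that reads the three ASCII fields discussed in this task from the first IFD of an EXIF block. It is an illustration under assumptions, not the method used above: the caller must already have extracted the TIFF data (the bytes after the `Exif\0\0` prefix of the JPEG's APP1 segment), and the function name `read_ifd0_ascii` is hypothetical. The tag numbers are the standard EXIF/TIFF ones.

```python
import struct

# Standard TIFF/EXIF tag numbers for the fields discussed in this task.
TAG_NAMES = {0x010E: "ImageDescription", 0x013B: "Artist", 0x8298: "Copyright"}
ASCII_TYPE = 2


def read_ifd0_ascii(tiff: bytes) -> dict:
    """Return the ASCII values of the known tags found in IFD0."""
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        raise ValueError("unknown TIFF byte order")
    magic, ifd_offset = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, typ, n = struct.unpack(order + "HHI", tiff[start:start + 8])
        if typ != ASCII_TYPE or tag not in TAG_NAMES:
            continue
        if n <= 4:
            # Short values are stored inline in the entry's last 4 bytes.
            raw = tiff[start + 8:start + 8 + n]
        else:
            # Longer values live at an offset from the start of the TIFF data.
            (offset,) = struct.unpack(order + "I", tiff[start + 8:start + 12])
            raw = tiff[offset:offset + n]
        found[TAG_NAMES[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return found
```

Running it on the EXIF block of a freshly purged thumbnail should list whichever of ImageDescription, Artist and Copyright survived the resize; a missing key means the field was not preserved or was never set in the original.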

👍 for me! Thanks.

Change 461793 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/debs/python-thumbor-wikimedia@master] Upgrade to 2.2

https://gerrit.wikimedia.org/r/461793

Change 461793 merged by Filippo Giunchedi:
[operations/debs/python-thumbor-wikimedia@master] Upgrade to 2.2

https://gerrit.wikimedia.org/r/461793

Mentioned in SAL (#wikimedia-operations) [2018-09-24T09:30:06Z] <godog> upgrade / roll restart thumbor in eqiad / codfw - T20871 T198370

JeanFred claimed this task.

I’m boldly closing this as Resolved, per my comment at T20871#4311283 and the additional comments by @Gilles.