Exif values retrieved incorrectly if they appear before IFD
Closed, Resolved, Public

Event Timeline

McZusatz raised the priority of this task to Needs Triage.
McZusatz updated the task description.
McZusatz subscribed.

For https://commons.wikimedia.org/wiki/File:Sarcophagus_of_Louise_of_Great_Brittain,_Roskilde_Cathedral,_Denmark,_2015-03-31-4813.jpg (and presumably the others, although I haven't investigated them), this appears to be a bug in the PHP exif library. But we should probably narrow it down and get a minimal test case before reporting upstream.

I spent some time debugging this.

I believe the issue occurs when the pointers in the Exif IFD point to locations that come earlier in the file than the IFD itself. When that happens, PHP's exif library seeks directly to that pointer instead of seeking relative to the start of the APP1 segment in the JPEG file. Thus everything is off by the difference between the two starting points. (In Sarcophagus_of_Louise_of_Great_Brittain,_Roskilde_Cathedral,_Denmark,_2015-03-31-4813.jpg that's about 12 bytes, so tags end up mismatched with their values.)

See exif_process_IFD_TAG in ext/exif/exif.c (in Zend PHP).
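To make the off-by-a-constant effect concrete, here is a minimal Python sketch. It is not PHP's code; it only mimics the behaviour described above on a synthetic buffer. It assumes the standard Exif layout, in which value offsets are counted from the TIFF header inside the APP1 segment, and the helper names `build_sample` and `read_software` are hypothetical. In this sample the APP1 segment directly follows the JPEG start marker, so the TIFF header begins 12 bytes into the file.

```python
import struct

EXIF_PREFIX = b"Exif\x00\x00"

def build_sample():
    """Build a tiny synthetic JPEG-like buffer whose tag value sits BEFORE the IFD."""
    value = b"Lightroom\x00"                 # ASCII value, stored ahead of the IFD
    value_offset = 8                         # right after the 8-byte TIFF header
    ifd_offset = value_offset + len(value)   # the IFD follows the value
    header = struct.pack("<2sHI", b"II", 42, ifd_offset)  # little-endian TIFF header
    # One IFD entry: tag 0x0131 (Software), type 2 (ASCII), count, value offset.
    entry = struct.pack("<HHII", 0x0131, 2, len(value), value_offset)
    ifd = struct.pack("<H", 1) + entry + struct.pack("<I", 0)
    payload = EXIF_PREFIX + header + value + ifd
    app1 = b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload
    return b"\xff\xd8" + app1 + b"\xff\xd9"

def read_software(data, relative_to_tiff):
    """Read the first IFD entry's value, with or without the TIFF base offset."""
    tiff_start = data.index(EXIF_PREFIX) + len(EXIF_PREFIX)
    (ifd_offset,) = struct.unpack_from("<I", data, tiff_start + 4)
    ifd = tiff_start + ifd_offset
    _tag, _type, count, offset = struct.unpack_from("<HHII", data, ifd + 2)
    # Correct readers add the TIFF base; the behaviour described above
    # treats the offset as an absolute file position instead.
    base = tiff_start if relative_to_tiff else 0
    return data[base + offset : base + offset + count]

data = build_sample()
print(read_software(data, relative_to_tiff=True))   # the stored Software string
print(read_software(data, relative_to_tiff=False))  # bytes taken from 12 bytes too early
```

The second call returns bytes from the "Exif" marker and the TIFF header rather than the stored string, which is the kind of tag/value mismatch seen on the affected file pages.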

Bawolff renamed this task from Wrong metadata is displayed. to Exif values retrieved incorrectly if they appear before IFD. Apr 27 2015, 8:21 AM
Bawolff added a project: Upstream.
Bawolff set Security to None.

And what's the upstream bug report?

I think every "upstream" issue should have an upstream bug report, and mention it here, too.

Perhaps https://bugs.php.net/bug.php?id=50845 opened in 2010?

I think every "upstream" issue should have an upstream bug report, and mention it here, too.

+1

Why did it take 5 years for the bug to reach us? Was our PHP updated to an affected version in the last 30 to 60 days?

@McZusatz, another option is that files with this format have become more common, because some device or application that produces these files has become more popular.

New here, so sorry in advance for not knowing the circuitry :-)

Any idea when this bug will be fixed and deployed on Commons?

Could the priority be escalated?

Currently, as more and more people upload photos generated with the extremely popular Lightroom 6.x, the problem is escalating and new users are being affected.
https://commons.wikimedia.org/wiki/Commons:Featured_picture_candidates/File:141227_Berliner_Dom.jpg
I have tried to summarize some methods I have found for mitigating the issue here:
https://commons.wikimedia.org/wiki/User:Slaunger/Mitigating_Mediawiki_Metadata_Viewer_Bug

A question:

When the bug is fixed, will the metadata on the file pages that currently display corrupted data be corrected automatically?

Aklapper triaged this task as Medium priority. Jul 16 2015, 2:05 PM

Is anybody from Multimedia investigating this (or does this actually need more investigation on the Wikimedia side, like a minimal test case)?
However, if this is really https://bugs.php.net/bug.php?id=50845, then an upstream patch is required and there is not much for Wikimedia itself to do.

I'd like to add my voice to those requesting this fix ASAP. We've had yet another post to the Village Pump from users wondering why their EXIF data is displayed incorrectly. This wastes people's time investigating if there is a problem with their own image software.

I'd also like to know whether, once the problem is fixed, existing images would have their EXIF reported correctly. Or do you have to refresh the data held by MediaWiki? If the latter, then a search for images created by Lightroom 6 (and possibly Photoshop CC 2015) would identify those in need of a refresh.

Well, as this seems to be a problem in PHP, anybody can and should provide a patch to PHP. https://bugs.php.net/bug.php?id=50660 and https://bugs.php.net/bug.php?id=50845 have been mentioned in this task and that's where this problem can and should get solved.

@Smalyshev merged the PHP patch (http://git.php.net/?p=php-src.git;a=commit;h=1ab5a1b432a4b4c62171864bd1b545616e1b07db). This fixes the most common problem mentioned here, bogus values being read when the data is before the IFD structure (https://bugs.php.net/bug.php?id=50845). It will be a part of PHP 5.6.24.

I have no idea what it would take to have it deployed in Wikimedia production today, especially since we actually use HHVM. Porting it should be straightforward; @Smalyshev pointed to https://github.com/facebook/hhvm/blob/ea6ff01f6c31f1615a935ef96622d623a6277d37/hphp/runtime/ext/gd/ext_gd.cpp#L6584.

HHVM patch is also merged (https://reviews.facebook.net/rHHVM255373a80a9b9c8b1b452f902e394cd9773729cd). Now, I wonder what I need to do to get it to our servers.

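This was never a MediaWiki bug, but rather an upstream issue with PHP and HHVM (https://bugs.php.net/bug.php?id=50845). I fixed it with the following patches:

  • PHP: http://git.php.net/?p=php-src.git;a=commit;h=1ab5a1b432a4b4c62171864bd1b545616e1b07db
  • HHVM: https://reviews.facebook.net/rHHVM255373a80a9b9c8b1b452f902e394cd9773729cd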

The patches should be included in the following releases:

  • PHP 5.6.24, 7.0.9 and 7.1.0
  • HHVM 3.15.0

If you experience this problem on a non-Wikimedia wiki, its version of PHP or HHVM must be upgraded to one of the above (or later).

For deployment of these fixes on Wikimedia wikis, let's continue at T140419.

(The patch was also backported to HHVM 3.12.8.)
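For anyone checking a non-Wikimedia wiki, the release list above can be turned into a quick check. This is only a sketch built from the version numbers given in this task: the function names are hypothetical, it assumes plain numeric version strings such as "5.6.24" (no suffixes like "RC1" or "-dev"), and it treats any branch not named above, older than the listed releases, as unfixed.

```python
def parse_version(version):
    """Turn '5.6.24' into (5, 6, 24). Assumes purely numeric dotted parts."""
    return tuple(int(part) for part in version.split("."))

def php_has_fix(version):
    """True if this PHP version is at or past a release listed as fixed."""
    v = parse_version(version)
    if v[:2] == (5, 6):
        return v >= (5, 6, 24)
    if v[:2] == (7, 0):
        return v >= (7, 0, 9)
    return v >= (7, 1, 0)

def hhvm_has_fix(version):
    """True if this HHVM version is at or past a release listed as fixed."""
    v = parse_version(version)
    if v[:2] == (3, 12):
        return v >= (3, 12, 8)   # the backport mentioned above
    return v >= (3, 15, 0)

print(php_has_fix("5.6.23"), php_has_fix("7.0.9"))
print(hhvm_has_fix("3.12.8"), hhvm_has_fix("3.14.0"))
```

On a live server the version string could come from `php -v` or `hhvm --version`; how it is obtained is left out here.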