Page MenuHomePhabricator

Thumbnailing for c:File:Carl_Weigert.jpg fails due to py3exiv2 handling of invalid ICC profiles
Closed, ResolvedPublicBUG REPORT

Description

Cache clearing doesn't work.
Image: https://commons.wikimedia.org/wiki/File:Carl_Weigert.jpg
Article: https://ru.wikipedia.org/wiki/%D0%92%D0%B0%D0%B9%D0%B3%D0%B5%D1%80%D1%82,_%D0%9A%D0%B0%D1%80%D0%BB
Screenshot of infobox:

{1C352DAF-F325-4023-8961-886A0BFB35C5}.png (190×575 px, 15 KB)

Reported on ruwiki forum, frwiki has the same issue, says report.

Event Timeline

Aklapper renamed this task from Image displayed correctly on Commons, but broken thumbnail in article to Numerous [[c:File:Carl_Weigert.jpg]] thumbnails are HTTP 429 Too Many Requests.Dec 5 2024, 5:54 PM

T380257 looks like similar issue.

Unrelated as librsvg is not used to thumbnail jpegs. All thumbnailing issues show up as 429s because of how Thumbor's failure throttling works, the first few requests will be a 500 then the rest will be 429s for the hour.

Request from [***] via cp1103 cp1103, Varnish XID 331549661
Upstream caches: cp1103 int
Error: 500, Internal Server Error at Fri, 06 Dec 2024 12:45:41 GMT

Testing locally:

thumbor-1  | 2024-12-06 16:40:14 thumbor:ERROR ERROR: Traceback (most recent call last):
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/thumbor/handlers/__init__.py", line 212, in get_image
thumbor-1  |     result = await self._fetch(self.context.request.image_url)
thumbor-1  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/thumbor/handlers/__init__.py", line 885, in _fetch
thumbor-1  |     raise fetch_result.exception
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/thumbor/handlers/__init__.py", line 853, in _fetch
thumbor-1  |     self.context.request.engine.load(fetch_result.buffer, extension)
thumbor-1  |   File "/srv/service/wikimedia_thumbor/engine/proxy/proxy.py", line 125, in load
thumbor-1  |     self.lcl[enginename].load(buffer, extension)
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/thumbor/engines/__init__.py", line 199, in load
thumbor-1  |     image_or_frames = self.create_image(buffer)
thumbor-1  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^
thumbor-1  |   File "/srv/service/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 77, in create_image
thumbor-1  |     self.read_exif(temp_file)
thumbor-1  |   File "/srv/service/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 139, in read_exif
thumbor-1  |     metadata.read()
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/pyexiv2/metadata.py", line 117, in read
thumbor-1  |     self.__image._readMetadata()
thumbor-1  | ValueError: Not a valid ICC Profile
thumbor-1  | 
thumbor-1  | 2024-12-06 16:40:14 thumbor:ERROR [BaseHandler] get_image failed for url `https%3A//upload.wikimedia.org/wikipedia/commons/4/46/Carl_Weigert.jpg`. error: `Not a valid ICC Profile`
thumbor-1  | 2024-12-06 16:40:14 tornado.access:ERROR 500 GET /thumbor/unsafe/804x/https://upload.wikimedia.org/wikipedia/commons/4/46/Carl_Weigert.jpg (172.18.0.1) 376.23ms

Thanks @AntiCompositeNumber, I'm seeing the same. I suspect our ICC profile handling has become a bit stricter since we upgraded at various points libraries in T355020 and T336881, as there are older thumbnails of this image sitting in swift that were rendered by earlier versions of thumbor.

We should probably see if we can make pyexiv a bit more tolerant of bad files like this, as convert itself will happily handle images of this type with only minor complaint:

convert-im6.q16: profile 'icc': A357Bh: length does not match profile `Carl_Weigert.png' @ warning/png.c/MagickPNGWarningHandler/1668.

The error handling in rTHMBREXT wikimedia_thumbor/engine/imagemagick/imagemagick.py:126 was mostly written for pyexiv2, so it's not surprising that py3exiv2 has evolved since then, especially as exiv2 has worked on their ICC implementation. exiv2 is generally more picky about things than exiftool or imagemagick, but we only use it to read the Orientation EXIF value which isn't required to thumbnail the image.
We currently ignore a few specific known errors, raising on unknown errors. We could add ValueError to the list of ignored exceptions, or we could decide that all errors from py3exiv2 aren't really that important and except Exception instead. Looking at https://bazaar.launchpad.net/~vincent-vandevyvre/py3exiv2/trunk/view/head:/py3exiv2/src/exiv2wrapper.cpp#L1213 py3exiv2 can raise a few other exceptions we don't currently catch. Some of these exceptions are for things that exiftool and/or imagemagick also care about, but I think it's probably better to ignore them all.

Relatedly, probably should figure out if the logging problems worked around in T178072: Thumbor: Error reading image metadata: Failed to read image data affect py3exiv2. I somewhat expect they don't.

Change #1101154 had a related patch set uploaded (by AntiCompositeNumber; author: AntiCompositeNumber):

[operations/software/thumbor-plugins@master] imagemagick: ignore all py3exiv2 exceptions

https://gerrit.wikimedia.org/r/1101154

AntiCompositeNumber renamed this task from Numerous [[c:File:Carl_Weigert.jpg]] thumbnails are HTTP 429 Too Many Requests to Thumbnailing for c:File:Carl_Weigert.jpg fails due to py3exiv2 handling of invalid ICC profiles.Dec 7 2024, 4:40 PM
AntiCompositeNumber claimed this task.

I can at least demonstrate a problem with the image mentioned in T392435

`$ exiv2 Rail*.jpg
Exiv2 exception in print action for file Rail_Bridge_(Humphery)_Gayndah_(2002).jpg:
Not a valid ICC Profile`

This would allow a clunky workaround of fixing the ICC profile in a particularly desired image to eviv2's satisfaction and updating the image in Commons.

Workaround applied at https://commons.wikimedia.org/wiki/File:Rail_Bridge_(Humphery)_Gayndah_(2002).jpg which now displays thumbnails in en wikipedia and commons happily

Identified profile as IEC61966-2-1, downloaded from color.org and applied using exiftool

exiftool "-ICC_Profile<=sRGB_IEC61966-2-1_no_black_scaling.icc" railfix.jpg

And comparing exiftool output

18c18,53
< Warning                         : Bad length ICC_Profile (length 741679656)
---
> Profile CMM Type                : 
> Profile Version                 : 2.0.0
> Profile Class                   : Display Device Profile
> Color Space Data                : RGB
> Profile Connection Space        : XYZ
> Profile Date Time               : 2009:03:27 21:37:45
> Profile File Signature          : acsp
> Primary Platform                : Unknown ()
> CMM Flags                       : Not Embedded, Independent
> Device Manufacturer             : 
> Device Model                    : 
> Device Attributes               : Reflective, Glossy, Positive, Color
> Rendering Intent                : Media-Relative Colorimetric
> Connection Space Illuminant     : 0.9642 1 0.82491
> Profile Creator                 : 
> Profile ID                      : c95bd637e95d8a3b0df38f99c1320389
> Profile Description             : sRGB IEC61966-2-1 no black scaling
> Blue Matrix Column              : 0.14307 0.06061 0.7141
> Blue Tone Reproduction Curve    : (Binary data 2060 bytes, use -b option to extract)
> Device Model Desc               : IEC 61966-2-1 Default RGB Colour Space - sRGB
> Green Matrix Column             : 0.38515 0.71687 0.09708
> Green Tone Reproduction Curve   : (Binary data 2060 bytes, use -b option to extract)
> Luminance                       : 0 80 0
> Measurement Observer            : CIE 1931
> Measurement Backing             : 0 0 0
> Measurement Geometry            : Unknown
> Measurement Flare               : 0%
> Measurement Illuminant          : D65
> Media Black Point               : 0.01205 0.0125 0.01031
> Red Matrix Column               : 0.43607 0.22249 0.01392
> Red Tone Reproduction Curve     : (Binary data 2060 bytes, use -b option to extract)
> Technology                      : Cathode Ray Tube Display
> Viewing Cond Desc               : Reference Viewing Condition in IEC 61966-2-1
> Media White Point               : 0.9642 1 0.82491
> Profile Copyright               : Copyright International Color Consortium, 2009
> Chromatic Adaptation            : 1.04791 0.02293 -0.0502 0.0296 0.99046 -0.01707 -0.00925 0.01506 0.75179

And exiv2 no longer errors.

I am having an issue with the an infobox image not displaying in a Wikipedia article

https://en.wikipedia.org/wiki/Ravenswood_School_and_Residence

I am told it may be related to this bug. The image was uploaded by me to Commons in 2016. The article was written in 2016 and the image was probably added at that time to the article. The image is here on Commons

https://commons.wikimedia.org/wiki/File:Ravenswood_School_and_Residence_(2002).jpg

and it has migrated to English Wikipedia in the usual way

https://en.wikipedia.org/wiki/File:Ravenswood_School_and_Residence_(2002).jpg

The image looks OK at both of the above URLs, yet it will not render in the infobox of the article as a thumbail.

Thanks

Kerry

The Ravensbrook school has the same issue as the Rail Bridge.

exiv2 complains as "Not a valid ICC Profile" and exiftool reports an impossibly long profile "Warning : Bad length ICC_Profile (length 538976288)"

Applying the same workaround as before

exiftool "-ICC_Profile<=sRGB_IEC61966-2-1_no_black_scaling.icc"

Allows exiv2 and exiftool to function happily and I then uploaded back to commons, but there seems to be some caching issue, causing the error to continue. I then reverted and uploaded again and now the original fix works and the final fix doesn't. I'll wait and see if that clears or is another issue.

Both images in [[:en:Ventnor, Queensland]] are also not displaying.

Change #1101154 merged by jenkins-bot:

[operations/software/thumbor-plugins@master] imagemagick: ignore all py3exiv2 exceptions

https://gerrit.wikimedia.org/r/1101154

Change #1181666 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: new version with permissive pyexiv error handling

https://gerrit.wikimedia.org/r/1181666

Change #1181666 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: new version with permissive pyexiv error handling

https://gerrit.wikimedia.org/r/1181666

Looks like the affected thumbs are working now - a big thanks to @AntiCompositeNumber for the fix.