Page MenuHomePhabricator

Thumbnailing for c:File:Carl_Weigert.jpg fails due to py3exiv2 handling of invalid ICC profiles
Open, Needs TriagePublicBUG REPORT

Description

Cache clearing doesn't work.
Image: https://commons.wikimedia.org/wiki/File:Carl_Weigert.jpg
Article: https://ru.wikipedia.org/wiki/%D0%92%D0%B0%D0%B9%D0%B3%D0%B5%D1%80%D1%82,_%D0%9A%D0%B0%D1%80%D0%BB
Screenshot of infobox:

{1C352DAF-F325-4023-8961-886A0BFB35C5}.png (190×575 px, 15 KB)

Reported on ruwiki forum, frwiki has the same issue, says report.

Event Timeline

Aklapper renamed this task from Image displayed correctly on Commons, but broken thumbnail in article to Numerous [[c:File:Carl_Weigert.jpg]] thumbnails are HTTP 429 Too Many Requests.Thu, Dec 5, 5:54 PM

T380257 looks like similar issue.

Unrelated as librsvg is not used to thumbnail jpegs. All thumbnailing issues show up as 429s because of how Thumbor's failure throttling works, the first few requests will be a 500 then the rest will be 429s for the hour.

Request from [***] via cp1103 cp1103, Varnish XID 331549661
Upstream caches: cp1103 int
Error: 500, Internal Server Error at Fri, 06 Dec 2024 12:45:41 GMT

Testing locally:

thumbor-1  | 2024-12-06 16:40:14 thumbor:ERROR ERROR: Traceback (most recent call last):
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/thumbor/handlers/__init__.py", line 212, in get_image
thumbor-1  |     result = await self._fetch(self.context.request.image_url)
thumbor-1  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/thumbor/handlers/__init__.py", line 885, in _fetch
thumbor-1  |     raise fetch_result.exception
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/thumbor/handlers/__init__.py", line 853, in _fetch
thumbor-1  |     self.context.request.engine.load(fetch_result.buffer, extension)
thumbor-1  |   File "/srv/service/wikimedia_thumbor/engine/proxy/proxy.py", line 125, in load
thumbor-1  |     self.lcl[enginename].load(buffer, extension)
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/thumbor/engines/__init__.py", line 199, in load
thumbor-1  |     image_or_frames = self.create_image(buffer)
thumbor-1  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^
thumbor-1  |   File "/srv/service/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 77, in create_image
thumbor-1  |     self.read_exif(temp_file)
thumbor-1  |   File "/srv/service/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 139, in read_exif
thumbor-1  |     metadata.read()
thumbor-1  |   File "/opt/lib/venv/lib/python3.11/site-packages/pyexiv2/metadata.py", line 117, in read
thumbor-1  |     self.__image._readMetadata()
thumbor-1  | ValueError: Not a valid ICC Profile
thumbor-1  | 
thumbor-1  | 2024-12-06 16:40:14 thumbor:ERROR [BaseHandler] get_image failed for url `https%3A//upload.wikimedia.org/wikipedia/commons/4/46/Carl_Weigert.jpg`. error: `Not a valid ICC Profile`
thumbor-1  | 2024-12-06 16:40:14 tornado.access:ERROR 500 GET /thumbor/unsafe/804x/https://upload.wikimedia.org/wikipedia/commons/4/46/Carl_Weigert.jpg (172.18.0.1) 376.23ms

Thanks @AntiCompositeNumber, I'm seeing the same. I suspect our ICC profile handling has become a bit stricter since we upgraded at various points libraries in T355020 and T336881, as there are older thumbnails of this image sitting in swift that were rendered by earlier versions of thumbor.

We should probably see if we can make pyexiv a bit more tolerant of bad files like this, as convert itself will happily handle images of this type with only minor complaint:

convert-im6.q16: profile 'icc': A357Bh: length does not match profile `Carl_Weigert.png' @ warning/png.c/MagickPNGWarningHandler/1668.

The error handling in rTHMBREXT wikimedia_thumbor/engine/imagemagick/imagemagick.py:126 was mostly written for pyexiv2, so it's not surprising that py3exiv2 has evolved since then, especially as exiv2 has worked on their ICC implementation. exiv2 is generally more picky about things than exiftool or imagemagick, but we only use it to read the Orientation EXIF value which isn't required to thumbnail the image.
We currently ignore a few specific known errors, raising on unknown errors. We could add ValueError to the list of ignored exceptions, or we could decide that all errors from py3exiv2 aren't really that important and except Exception instead. Looking at https://bazaar.launchpad.net/~vincent-vandevyvre/py3exiv2/trunk/view/head:/py3exiv2/src/exiv2wrapper.cpp#L1213 py3exiv2 can raise a few other exceptions we don't currently catch. Some of these exceptions are for things that exiftool and/or imagemagick also care about, but I think it's probably better to ignore them all.

Relatedly, probably should figure out if the logging problems worked around in T178072: Thumbor: Error reading image metadata: Failed to read image data affect py3exiv2. I somewhat expect they don't.

Change #1101154 had a related patch set uploaded (by AntiCompositeNumber; author: AntiCompositeNumber):

[operations/software/thumbor-plugins@master] imagemagick: ignore all py3exiv2 exceptions

https://gerrit.wikimedia.org/r/1101154

AntiCompositeNumber renamed this task from Numerous [[c:File:Carl_Weigert.jpg]] thumbnails are HTTP 429 Too Many Requests to Thumbnailing for c:File:Carl_Weigert.jpg fails due to py3exiv2 handling of invalid ICC profiles.Sat, Dec 7, 4:40 PM
AntiCompositeNumber claimed this task.