When thumbor runs exiftool against some files whose metadata contains non-ASCII characters, we hit this error:
2023-01-16 11:04:38,734 ???? thumbor:ERROR UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 19: invalid start byte
2023-01-16 11:04:38,734 ???? thumbor:ERROR ERROR: Traceback (most recent call last):
  File "/opt/lib/python/site-packages/thumbor/handlers/__init__.py", line 212, in get_image
    result = await self._fetch(self.context.request.image_url)
  File "/opt/lib/python/site-packages/thumbor/handlers/__init__.py", line 876, in _fetch
    raise fetch_result.exception
  File "/opt/lib/python/site-packages/thumbor/handlers/__init__.py", line 844, in _fetch
    self.context.request.engine.load(fetch_result.buffer, extension)
  File "/srv/service/wikimedia_thumbor/engine/proxy/proxy.py", line 125, in load
    self.lcl[enginename].load(buffer, extension)
  File "/opt/lib/python/site-packages/thumbor/engines/__init__.py", line 195, in load
    image_or_frames = self.create_image(buffer)
  File "/srv/service/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 77, in create_image
    self.read_exif(temp_file)
  File "/srv/service/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 164, in read_exif
    values = s.decode('utf-8').split(': ', 1)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 19: invalid start byte
2023-01-16 11:04:38,734 ???? thumbor:ERROR [BaseHandler] get_image failed for url `https%3A//swift.discovery.wmnet/v1/AUTH_mw/wikipedia-commons-local-public.50/5/50/2023-01-15_15-49-13_voeux-maire-Belfort.jpg`. error: `'utf-8' codec can't decode byte 0x9c in position 19: invalid start byte`
2023-01-16 11:04:38,735 ???? tornado.access:ERROR 500 GET /wikipedia/commons/thumb/5/50/2023-01-15_15-49-13_voeux-maire-Belfort.jpg/1169px-2023-01-15_15-49-13_voeux-maire-Belfort.jpg (10.64.48.230) 4516.76ms
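This occurs because exiftool emits the following line, in which "œ" is the single byte 0x9c:
Image Description : Vœux du Maire de Belfort, Damien Meslot, au gymnase Le Phare, Belfort, le 15 janvier 2023.
0x9c is "œ" in Windows-1252 but is not a valid UTF-8 start byte, so the s.decode('utf-8') call in read_exif raises.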
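The failure can be reproduced without exiftool. This is a minimal sketch, not the code in read_exif: it assumes the line was emitted with the short tag name (ImageDescription) and in Windows-1252, neither of which is confirmed by the log. Under those assumptions the 0x9c byte does land at position 19, which would match the error message.

```python
# Hypothetical reconstruction of one line of exiftool output. The short tag
# name and the cp1252 encoding are assumptions, not taken from the logs.
line = "ImageDescription: Vœux du Maire de Belfort".encode("cp1252")

try:
    line.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # Keep a reference so the exception can be inspected after the block.
    error = exc

print(error)

# The same bytes decode cleanly when the assumed source encoding is used.
decoded = line.decode("cp1252")
print(decoded)
```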
According to the author of exiftool, Unicode output isn't guaranteed unless JSON or XML output is used. We shouldn't be manually splitting string values in the first place, and this is another good reason to use JSON output everywhere.
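A rough sketch of what reading tags from JSON output might look like. The function names parse_exiftool_json and read_exif_json are hypothetical and this is not the existing read_exif implementation; it assumes exiftool is on the PATH and is invoked with its -j (JSON) option, which prints an array with one object per file. Decoding with errors="replace" is a defensive choice made here, not something the source prescribes.

```python
import json
import subprocess


def parse_exiftool_json(raw: bytes) -> dict:
    """Parse the bytes printed by `exiftool -j` for one file into a tag dict."""
    # Replace undecodable bytes instead of raising, so one bad tag value
    # cannot abort the whole request.
    records = json.loads(raw.decode("utf-8", errors="replace"))
    return records[0] if records else {}


def read_exif_json(path: str) -> dict:
    """Run exiftool on `path` and return its tags (hypothetical helper)."""
    result = subprocess.run(
        ["exiftool", "-j", path],
        capture_output=True,
        check=True,
    )
    return parse_exiftool_json(result.stdout)
```

Because the values arrive as JSON strings, a description that itself contains ": " no longer needs special handling the way it does with split(': ', 1).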