
Thumbor should support SVG files that start with <svg:svg
Closed, Resolved (Public)

Description

Details of the error are attached to the report, which will be publicly viewable. If you are not comfortable with that, you can edit the report below and remove all the data you don't want to share.

Error details:

error: could not load image from https://upload.wikimedia.org/wikipedia/commons/thumb/0/0e/Westmoreland_Heritage_Trail.svg/600px-Westmoreland_Heritage_Trail.svg.png
URL: https://en.wikipedia.org/wiki/Westmoreland_Heritage_Trail#/media/File:Westmoreland_Heritage_Trail.svg
user agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0
screen size: 2560x1440
canvas size: 1433x1222
image size: 600x241
thumbnail size: CSS: 1433x576, screen width: 1433, real width: 1920

Revisions and Commits

rTHMBREXT Thumbor Plugins
Restricted Differential Revision

Event Timeline

Tgr added subscribers: Gilles, Tgr.

Gives HTTP 500 then 429. There was no body for the 500 (didn't have a chance to inspect headers). I don't see anything in logstash (which is normal IIRC, errors are output but not logged - the HTTP 500 not having a body is unusual though). @Gilles is this Thumbor already, or is that beta only?

Aklapper renamed this task from Westmoreland_Heritage_Trail.svg to Westmoreland_Heritage_Trail.svg on Commons gives HTTP 500 then 429 error. Jul 12 2017, 12:11 AM

See also T170352.

Aklapper renamed this task from Westmoreland_Heritage_Trail.svg on Commons gives HTTP 500 then 429 error to PNG thumbnails for Westmoreland_Heritage_Trail.svg on Commons give HTTP 500 then 429 error. Jul 12 2017, 12:11 AM

@Tgr it's Thumbor already, whose error log isn't in logstash yet (that's the subject of T150734: Make Thumbor logs available in ELK). I'm working on making the user-facing errors more informative: T169683: Thumbor should return informative and nice-looking errors

429 after 500 is expected: we throttle requests for erroring thumbnails, in exactly the same way MediaWiki does. If a file fails to convert, there's very little chance it will succeed on the next attempt, unless something in our software stack has been upgraded.

SVGs are by far the most common type of failing thumbnail, usually because of XML syntax that cannot be handled by the underlying conversion software (currently rsvg-convert).
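To illustrate the 500-then-429 sequence, here is a minimal sketch of a failure throttle. It is not the actual Thumbor or MediaWiki code: the class name, the in-memory storage, and the `max_failures` and `window_seconds` values are all assumptions made for the example.

```python
import time


class FailureThrottle:
    """Toy in-memory throttle for thumbnails that recently failed to render.

    Hypothetical helper: the real throttle, its storage and its limits
    are not described in this task.
    """

    def __init__(self, max_failures=1, window_seconds=3600, clock=time.time):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock
        self._failures = {}  # key -> list of failure timestamps

    def _recent(self, key):
        # Keep only the failures that are still inside the time window.
        cutoff = self.clock() - self.window_seconds
        recent = [t for t in self._failures.get(key, []) if t > cutoff]
        if recent:
            self._failures[key] = recent
        else:
            self._failures.pop(key, None)
        return recent

    def record_failure(self, key):
        self._recent(key)
        self._failures.setdefault(key, []).append(self.clock())

    def is_throttled(self, key):
        return len(self._recent(key)) >= self.max_failures


def handle_request(throttle, key, render):
    """Return an HTTP status: 429 if throttled, 500 on failure, else 200."""
    if throttle.is_throttled(key):
        return 429
    try:
        render(key)
    except Exception:
        throttle.record_failure(key)
        return 500
    return 200
```

With `max_failures=1`, the first request for a broken file returns 500 and later requests return 429 until the window expires, after which a fixed renderer gets another chance.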

This is the error for that particular image from the thumbor error log on thumbor1001:

Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: 2017-07-12 08:35:24,771 8833 thumbor:ERROR [ExiftoolRunner] error: 'Error: Unknown file type - /srv/thumbor/tmp/thumbor@8833/tmpJalTQk\n'
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: 2017-07-12 08:35:24,773 8833 thumbor:ERROR [ThreadPool] 'ImageSize'
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: Traceback (most recent call last):
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/thumbor/context.py", line 268, in _execute_in_foreground
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: returned = operation()
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/thumbor/transformer.py", line 215, in img_operation_worker
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: self.resize()
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/thumbor/transformer.py", line 312, in resize
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: self.engine.resize(self.target_width or 1, self.target_height or 1)  # avoiding 0px images
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/proxy/proxy.py", line 172, in resize
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: return self.__getattr__('resize')(width, height)
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 313, in resize
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: 'jpeg:size=%s' % self.jpeg_size(),
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 77, in jpeg_size
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: exif_image_size = self.exif['ImageSize']
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: KeyError: 'ImageSize'
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: 2017-07-12 08:35:24,795 8833 thumbor:ERROR [ImagesHandler] Exception during _load_results
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: Traceback (most recent call last):
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/handler/images/images.py", line 506, in _load_results
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: results, content_type = BaseHandler._load_results(self, context)
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/thumbor/handlers/__init__.py", line 334, in _load_results
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: results = context.request.engine.read(image_extension, quality)
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/proxy/proxy.py", line 132, in read
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: ret = self.__getattr__('read')(extension, quality)
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 263, in read
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: raise ImageMagickException('Failed to convert image: %s' % stderr)
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: ImageMagickException: Failed to convert image: convert: no decode delegate for this image format `' @ error/constitute.c/ReadImage/501.
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: convert: no images defined `png:-' @ error/convert.c/ConvertImageCommand/3210.
Jul 12 08:35:24 thumbor1001 thumbor@8833[6807]: 2017-07-12 08:35:24,797 8833 tornado.access:ERROR 500 GET /wikipedia/commons/thumb/0/0e/Westmoreland_Heritage_Trail.svg/600px-Westmoreland_Heritage_Trail.svg.png (127.0.0.1) 416.64ms

At a glance this suggests that the file isn't recognized as SVG. This detection is done by content sniffing, looking for a particular substring at the beginning of the file, up to a certain point. The relevant code is here: https://phabricator.wikimedia.org/diffusion/THMBREXT/browse/master/wikimedia_thumbor/engine/svg/svg.py;49b936caa487dcf3a9f1a77273cd26ca2718598c$26-29

The contents of this particular file lack a leading "<?xml" declaration; the first tag is an <svg:svg> one:

<svg:svg xmlns:svg="http://www.w3.org/2000/svg" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" width="600px" height="240.8450675910232px" viewBox="0 0 600 240.8450675910232">

I don't know if that format is widespread, but we can certainly make the is_svg check in Thumbor more flexible to accommodate files like this, provided that rsvg can handle them (which seems to be the case when testing this particular file). Thumbor is very conservative about its SVG detection, which is why I had to expand the check in the first place. I'm not surprised that our variety of SVG files has hit another situation where it needs to be expanded further.
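As a rough sketch of what a more flexible sniff could look like (this is not the patch that was merged, and `looks_like_svg`, `SNIFF_BYTES` and the regular expression are assumptions for illustration), the check can accept an optionally namespace-prefixed `svg` tag near the start of the file, with or without an XML declaration before it:

```python
import re

# Hypothetical limit on how far into the file we look.
SNIFF_BYTES = 1024

# "<svg" or "<prefix:svg", followed by whitespace, ">" or "/".
_SVG_ROOT = re.compile(br'<(?:[A-Za-z_][\w.\-]*:)?svg(?=[\s>/])')


def looks_like_svg(buffer):
    """Cheap content sniff: is there an SVG root tag near the start?"""
    return _SVG_ROOT.search(buffer[:SNIFF_BYTES]) is not None
```

A sniff like this stays cheap but remains a heuristic: it can be fooled by an `<svg` tag inside a comment or by a root element beyond the byte limit.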

Gilles renamed this task from PNG thumbnails for Westmoreland_Heritage_Trail.svg on Commons give HTTP 500 then 429 error to Thumbor should support SVG files that start with <svg:svg. Jul 12 2017, 8:46 AM
Gilles claimed this task.
Gilles triaged this task as High priority.
Gilles edited projects, added Thumbor, Performance-Team; removed SRE-swift-storage.

FYI, this format is rather widespread. We also had to deal with this problem when we added XML validation to the upload process a long time ago.
If I remember correctly, the XML declaration (as this is called) is optional in XML 1.0. If that is correct, then the Thumbor check is making assumptions that are not necessarily valid.

Really, the only way to check an XML file is to parse it. And then throw a fuzzer at it, to make sure you didn't add security problems :)
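For comparison, a parse-based check might look like the sketch below. It is only an illustration using the Python standard library, not what Thumbor or the upload validation does; `root_is_svg` is a hypothetical name. It looks only at the root element and compares its namespace-qualified name, so `<svg ...>` and `<svg:svg ...>` are treated the same as long as they resolve to the SVG namespace.

```python
import io
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def root_is_svg(data):
    """Return True if the root element of the XML in `data` is an SVG <svg>."""
    try:
        # The first "start" event is always the root element.
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
            return elem.tag == "{%s}svg" % SVG_NS
    except ET.ParseError:
        return False
    return False
```

The standard library documentation warns that `xml.etree.ElementTree` is not hardened against maliciously constructed data, so anything handling untrusted uploads would need a hardened parser (for example the `defusedxml` package) and testing along the lines suggested above.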

Gilles added a revision: Restricted Differential Revision. Jul 12 2017, 1:30 PM

Change 364752 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/debs/python-thumbor-wikimedia@master] Upgrade to 1.0

https://gerrit.wikimedia.org/r/364752

Change 364752 merged by Filippo Giunchedi:
[operations/debs/python-thumbor-wikimedia@master] Upgrade to 1.0

https://gerrit.wikimedia.org/r/364752

Works fine on beta: https://commons.wikimedia.beta.wmflabs.org/wiki/File:Westmoreland_Heritage_Trail.svg

Can't verify the fix yet on production, because that particular file is still affected by 429s coming from the failure throttler. I might manually clear it from there if it doesn't resolve itself soon.

All good now. Affected files should just start working once they get out of the 429 throttling.

Thanks to all who have helped! I had no idea it would be such a difficult problem. I'm very happy to see it all working again.