Page MenuHomePhabricator

HTTP 500 error for some larger thumbnails of 14MB file File:Lençóis_Maranhenses_2018.jpg on Commons
Open, LowPublic

Event Timeline

Yann created this task.Jul 25 2018, 9:11 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 25 2018, 9:11 AM
ema added a subscriber: ema.Jul 25 2018, 9:17 AM

Here's the full thumbor response:

$ curl -v http://ms-fe.svc.eqiad.wmnet/wikipedia/commons/thumb/2/2a/Voando_num_mar_de_areia.jpg/3000px-Voando_num_mar_de_areia.jpg
* Hostname was NOT found in DNS cache
*   Trying 10.2.2.27...
* Connected to ms-fe.svc.eqiad.wmnet (10.2.2.27) port 80 (#0)
> GET /wikipedia/commons/thumb/2/2a/Voando_num_mar_de_areia.jpg/3000px-Voando_num_mar_de_areia.jpg HTTP/1.1
> User-Agent: curl/7.38.0
> Host: ms-fe.svc.eqiad.wmnet
> Accept: */*
> 
< HTTP/1.1 429 Error
< Content-Length: 0
< Thumbor-Memcache-Get-Time: 1
< Thumbor-Request-Date: Wed, 25 Jul 2018 09:10:14 GMT
* Server nginx/1.13.6 is not blacklisted
< Server: nginx/1.13.6
< Connection: close
< Cache-Control: no-cache
< Date: Wed, 25 Jul 2018 09:10:14 GMT
< Proxy-Request-Date: None
< Content-Type: text/html; charset=UTF-8
< X-Trans-Id: tx3d6d25c74b7c4313b5e96-005b583e76
< 
* Excess found in a non pipelined read: excess = 17 url = /wikipedia/commons/thumb/2/2a/Voando_num_mar_de_areia.jpg/3000px-Voando_num_mar_de_areia.jpg (zero-length body)
* Closing connection 0

Looks like thumbor/imagemagick are running into resource exhaustion when trying to scale this image (error below) resulting in 500s. Then poolcounter kicks in for this original due to repeated 500s while scaling and 429 are returned instead.

I'm going to retitle the task with the scaling failure itself

Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: 2018-07-25 09:02:11,246 8818 thumbor:ERROR [ImagesHandler] Exception during _load_results
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: Traceback (most recent call last):
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/handler/images/images.py", line 571, in _load_results
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: results, content_type = BaseHandler._load_results(self, context)
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: File "/usr/lib/python2.7/dist-packages/thumbor/handlers/__init__.py", line 334, in _load_results
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: results = context.request.engine.read(image_extension, quality)
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/proxy/proxy.py", line 133, in read
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: ret = self.__getattr__('read')(extension, quality)
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 321, in read
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: raise ImageMagickException('Failed to convert image %s' % stderr)  # pragma: no cover
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: ImageMagickException: Failed to convert image convert: DistributedPixelCache '127.0.0.1' @ error/distribute-ca
che.c/ConnectPixelCacheServer/217.
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: convert: cache resources exhausted `/srv/thumbor/tmp/thumbor@8818/tmpVBjsFF' @ error/cache.c/OpenPixelCache/36
59.
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: convert: no images defined `jpg:-' @ error/convert.c/ConvertImageCommand/3210.
Jul 25 09:02:11 thumbor1003 thumbor@8818[46987]: 2018-07-25 09:02:11,248 8818 tornado.access:ERROR 500 GET /wikipedia/commons/thumb/2/2a/Voando_num_mar_de_areia.jpg/3000px-Voando_num_mar_de_areia.jpg (10.64.32.221) 579.09ms
fgiunchedi renamed this task from Thumbnails not created: Error: 429, Too Many Requests to cache resources exhausted while scaling File:Voando_num_mar_de_areia.jpg.Jul 25 2018, 9:40 AM
ema added a comment.Jul 25 2018, 9:44 AM

Then poolcounter kicks in for this original due to repeated 500s while scaling and 429 are returned instead.

The 429s are returned with empty body, and we do add a generic WMF error page at the caching layer for 4xx errors with Content-Length: 0. It would be very useful if the application layer could provide a specific error message instead.

jcrespo added subscribers: Gilles, jcrespo.

Looks scaling related, not storage, then. CC @Gilles (sorry if I am pinging the wrong person, please comment if that is the case)

This comment was removed by Krinkle.
Krinkle renamed this task from cache resources exhausted while scaling File:Voando_num_mar_de_areia.jpg to Resource exhausted while scaling File:Voando_num_mar_de_areia.jpg.Jul 30 2018, 8:14 PM
Gilles claimed this task.Aug 27 2018, 8:14 PM
Gilles moved this task from Inbox to Backlog: Small & Maintenance on the Performance-Team board.

The issue is reproducible on Vagrant, which has the same memory and disk limits as production for ImageMagick. This image has a ton of EXIF data, a lot of it related to visual properties, probably coming from Lightroom. I'm guessing that some of it is causing ImageMagick to consume a lot of memory to resize the image. Beyond a certain thumbnail size, it just hits the limit.

We have to draw the line somewhere when it comes to individual files consuming a lot of resources to be processed. We can't increase the already generous limits for the sake of this one file (900MB max memory, 900MB max disk usage). As a workaround I would advise stripping most of the EXIF this file contains and seeing it it changes anything.

Gilles removed Gilles as the assignee of this task.Aug 29 2018, 1:49 PM
Aklapper renamed this task from Resource exhausted while scaling File:Voando_num_mar_de_areia.jpg to HTTP 500 error for some larger thumbnails of 14MB file File:Lençóis_Maranhenses_2018.jpg on Commons.Mar 29 2019, 2:03 PM
aaron edited projects, added Performance-Team (Radar); removed Performance-Team.
aaron triaged this task as Low priority.