Page MenuHomePhabricator

Thumbnail generation fails (HTTP 503) for some sizes of File:Montreal_Skyline_from_Mont_Royal_raw.png
Open, Needs TriagePublic

Description

https://commons.wikimedia.org/wiki/File:Montreal_Skyline_from_Mont_Royal_raw.png was not really related to the other files in that task, so I'm splitting it out here.

This is a 1.1GB, 228 MP PNG image. It is over 10 MP, so it is rendered by the Vips engine. Thumbnail generation works at some sizes, but fails at others.

SizeResultNote
320pxWorksCache hit Last modified 15 May 2020
417pxFailed 503
640pxWorksCache hit Last modified 15 May 2020
799pxWorksCache hit Last modified 15 May 2020
1024pxFailed -> WorksNewly generated, cache pass at 03:00 UTC, Last Modified 22 May 2020 02:29 UTC, previously 503'd at 02:27 UTC
1280pxWorksCache hit, last modified 22 May 2020 02:29 UTC
4560pxFailed 503
417px
Request from *** via cp1078 frontend, Varnish XID 829235194
Error: 503, Backend fetch failed at Fri, 22 May 2020 03:15:07 GMT
HTTP/2 503 Service Unavailable

date: Fri, 22 May 2020 03:15:07 GMT
server: Varnish
content-type: text/html; charset=utf-8
age: 0
x-cache: cp1078 int
x-cache-status: int-front
server-timing: cache;desc="int-front"
strict-transport-security: max-age=106384710; includeSubDomains; preload
x-client-ip: ***
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
timing-allow-origin: *
content-length: 1793
X-Firefox-Spdy: h2
1024px 503
Request from *** via cp1076 frontend, Varnish XID 645500130
Error: 503, Backend fetch failed at Fri, 22 May 2020 02:27:32 GMT

HTTP/2 503 Service Unavailable
date: Fri, 22 May 2020 02:27:32 GMT
server: Varnish
content-type: text/html; charset=utf-8
age: 0
x-cache: cp1076 int
x-cache-status: int-front
server-timing: cache;desc="int-front"
strict-transport-security: max-age=106384710; includeSubDomains; preload
x-client-ip: ***
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
timing-allow-origin: *
content-length: 1820
X-Firefox-Spdy: h2
1024px 200
HTTP/2 200 OK
date: Fri, 22 May 2020 03:03:55 GMT
content-type: image/png
content-length: 1449590
accept-ranges: bytes
last-modified: Fri, 22 May 2020 02:29:56 GMT
etag: b5a670127a58f5bdf30f19c9be3060ab
x-timestamp: 1590114595.61997
server: ATS/8.0.7
age: 0
x-cache: cp1078 miss, cp1078 pass
x-cache-status: pass
server-timing: cache;desc="pass"
strict-transport-security: max-age=106384710; includeSubDomains; preload
x-client-ip: ***
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
timing-allow-origin: *
X-Firefox-Spdy: h2
4560px
Request from *** via cp1078 frontend, Varnish XID 828001589
Error: 503, Backend fetch failed at Fri, 22 May 2020 03:24:10 GMT

HTTP/2 503 Service Unavailable
date: Fri, 22 May 2020 03:24:10 GMT
server: Varnish
content-type: text/html; charset=utf-8
age: 0
x-cache: cp1078 int
x-cache-status: int-front
server-timing: cache;desc="int-front"
strict-transport-security: max-age=106384710; includeSubDomains; preload
x-client-ip: ***
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
timing-allow-origin: *
content-length: 1793
X-Firefox-Spdy: h2

The failures don't seem to be related to output size and aren't very consistant over time.
Testing locally, I noticed that the file was affected by T219569 and takes a considerable time to generate a thumbnail.

Local Thumbor logs
2020-05-22 03:34:37 thumbor:DEBUG METRICS: inc: response.count:1
2020-05-22 03:34:37 thumbor:DEBUG Format specified: png
2020-05-22 03:34:37 thumbor:DEBUG METRICS: inc: storage.miss:1
2020-05-22 03:34:37 thumbor:DEBUG [HTTPS] load_sync: https%3A//upload.wikimedia.org/wikipedia/commons/3/37/Montreal_Skyline_from_Mont_Royal_raw.png
2020-05-22 03:34:37 thumbor:DEBUG [HTTPS] Loading normalized URL: https://upload.wikimedia.org/wikipedia/commons/3/37/Montreal_Skyline_from_Mont_Royal_raw.png
2020-05-22 03:36:21 thumbor:DEBUG [HTTPS] return_contents: /tmp/tmpmqtTpp
2020-05-22 03:36:21 thumbor:DEBUG METRICS: inc: original_image.status.200:1
2020-05-22 03:36:21 thumbor:DEBUG METRICS: inc: original_image.response_bytes:4096
2020-05-22 03:36:21 thumbor:DEBUG [Proxy] Looking for a png engine
2020-05-22 03:36:21 thumbor:DEBUG [ExiftoolRunner] command: ['/usr/bin/exiftool', '-ImageSize', '-s', '-s', '-s', '/tmp/tmpmqtTpp']
2020-05-22 03:36:21 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/exiftool', '-ImageSize', '-s', '-s', '-s', '/tmp/tmpmqtTpp']
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Stdout: 29684x7620

2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Stderr: 
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Duration: 3296.35
2020-05-22 03:36:25 thumbor:DEBUG [VIPS] Shrinking with command
2020-05-22 03:36:25 thumbor:DEBUG [BWE] Found source file in context
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/vips', 'shrink', '/tmp/tmpmqtTpp', '/tmp/tmpiNR44f/vips_result.png', '35', '35']
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Stdout: 
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Stderr: vipspng: iCCP: profile 'ProPhoto RGB': 0h: PCS illuminant is not D50
vipspng: iCCP: profile 'ProPhoto RGB': 0h: PCS illuminant is not D50
vipspng: profile 'icc': 0h: PCS illuminant is not D50
vips2png: unable to write "/tmp/tmpiNR44f/vips_result.png"

2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Return code: 1
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Duration: 169.056
2020-05-22 03:36:25 thumbor:DEBUG [VIPS] Forcing TinyRGB profile
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/vips', 'shrink', '/tmp/tmpmqtTpp', '/tmp/tmpZ1AwC1/vips_result.png[profile=/srv/thumbor-plugins/tests/integration/tinyrgb.icc]', '35', '35']
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Stdout: 
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Stderr: 
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Duration: 21336.573
2020-05-22 03:36:46 thumbor:DEBUG [IM] Dumping buffer into temp file
2020-05-22 03:36:46 thumbor:DEBUG [ExiftoolRunner] command: ['/usr/bin/exiftool', '-s', '-s', '-ImageSize', '-ProfileDescription', '-ColorType', '-FileType', '-Transparency', '-Artist', '-Copyright', '-ImageDescription', '/tmp/tmp2lrs62']
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/exiftool', '-s', '-s', '-ImageSize', '-ProfileDescription', '-ColorType', '-FileType', '-Transparency', '-Artist', '-Copyright', '-ImageDescription', '/tmp/tmp2lrs62']
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Stdout: ImageSize: 848x217
ProfileDescription: c2
ColorType: RGB with Alpha
FileType: PNG

2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Stderr: 
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Duration: 237.22899999999998
2020-05-22 03:36:46 thumbor:DEBUG [IM] EXIF: {'ProfileDescription': 'c2', 'FileType': 'PNG', 'ColorType': 'RGB with Alpha', 'ImageSize': '848x217'}
2020-05-22 03:36:46 thumbor:DEBUG [IM] File has non-sRGB profile
2020-05-22 03:36:46 thumbor:DEBUG [ExiftoolRunner] command: ['/usr/bin/exiftool', '-icc_profile', '-b', '-m', '/tmp/tmp2lrs62']
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/exiftool', '-icc_profile', '-b', '-m', '/tmp/tmp2lrs62']
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Stdout: 
                                                        lcmsmntrRGB XYZ �)9acspAPPL���-lcms
desc�^cprt\
           wtpthbkpt|rXYZ�gXYZ�bXYZ�rTRC�@gTRC�@bTRC�@descc2textFBXYZ ���-XYZ 3�XYZ o�8��XYZ b����XYZ $����curv���ck
                    �?Q!�)�2;�FQw]�kpz���|�i�}���0��
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Stderr: 
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Duration: 168.149
2020-05-22 03:36:47 thumbor:DEBUG [IM] reorientate
2020-05-22 03:36:47 thumbor:DEBUG [IM] resize: 417.0 107.0
2020-05-22 03:36:47 thumbor:DEBUG [IM] Queued operators: ['-resize', '417x107^', '-gravity', 'center', '-extent', '417x107', '-background', 'none']
2020-05-22 03:36:47 thumbor:DEBUG Image format specified as .png.
2020-05-22 03:36:47 thumbor:DEBUG Content Type of image/png detected.
2020-05-22 03:36:47 thumbor:DEBUG [BWE] Rendering png
2020-05-22 03:36:47 thumbor:DEBUG [IM] read: png 87
2020-05-22 03:36:47 thumbor:DEBUG [IM] Chroma subsampling: '4:2:0'
2020-05-22 03:36:47 thumbor:DEBUG [IM] Generating image with quality 95
2020-05-22 03:36:47 thumbor:DEBUG [IM] Queued operators: ['-resize', '417x107^', '-gravity', 'center', '-extent', '417x107', '-background', 'none', '-quality', '95', '-sampling-factor', '4:2:0']
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/convert', '-define', 'tiff:exif-properties=no', '-resize', '417x107^', '-gravity', 'center', '-extent', '417x107', '-background', 'none', '-quality', '95', '-sampling-factor', '4:2:0', '/tmp/tmp2lrs62[0]', u'png:-']
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Stdout: <too long to display (246589 bytes)>
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Stderr: 
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Duration: 215.41
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: engine.processing_time.wikimedia_thumbor.engine.vips:25521
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: engine.processing_utime.wikimedia_thumbor.engine.vips:21956
2020-05-22 03:36:47 tornado.access:INFO 200 GET /thumbor/unsafe/417x/filters:format(png)/https://upload.wikimedia.org/wikipedia/commons/3/37/Montreal_Skyline_from_Mont_Royal_raw.png (172.17.0.1) 129541.88ms
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: response.time:129541
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: response.time.200:129541
2020-05-22 03:36:47 thumbor:DEBUG METRICS: inc: response.status.200:1
2020-05-22 03:36:47 thumbor:DEBUG METRICS: inc: response.format.png:1
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: response.time.png:129541
2020-05-22 03:36:47 thumbor:DEBUG METRICS: inc: response.bytes.png:246589
2020-05-22 03:36:47 thumbor:DEBUG [HTTPS] cleanup_temp_file: /tmp/tmpmqtTpp

1:44 is spent downloading the 1.1GB file, then another 0:21 is spent scaling the image with Vips. While the time it takes to download a file over the internet is not representative of the time it takes to copy a file between two servers in the same datacenter, I think this is indicative of at least part of the problem. With the mitigation for "invalid" ICC profiles in place, thumbnails can be generated for this file. Someone with production log access is needed to find the true cause.

Event Timeline

Trying all those again,

SizeResultNote
320pxWorksCache hit Last modified 15 May 2020
417pxWorksCache hit Last modified 22 May 2020
640pxWorksCache hit Last modified 15 May 2020
799pxWorksCache hit Last modified 15 May 2020
1024pxWorksCache hit Last Modified 22 May 2020 02:29 UTC
1280pxWorksCache hit, last modified 22 May 2020 02:29 UTC
4560pxFailed 503

Trying 4560px locally timed out loading the original after 5 minutes. The other files seem to have fixed themselves when I filed this task.