https://commons.wikimedia.org/wiki/File:Montreal_Skyline_from_Mont_Royal_raw.png was not really related to the other files in that task, so I'm splitting it out here.
This is a 1.1GB, 228 MP PNG image. It is over 10 MP, so it is rendered by the Vips engine. Thumbnail generation works at some sizes, but fails at others.
| Size | Result | Note |
|---|---|---|
| 320px | Works | Cache hit Last modified 15 May 2020 |
| 417px | Failed 503 | |
| 640px | Works | Cache hit Last modified 15 May 2020 |
| 799px | Works | Cache hit Last modified 15 May 2020 |
| 1024px | Failed -> Works | Newly generated, cache pass at 03:00 UTC, Last Modified 22 May 2020 02:29 UTC, previously 503'd at 02:27 UTC |
| 1280px | Works | Cache hit, last modified 22 May 2020 02:29 UTC |
| 4560px | Failed 503 | |
Request from *** via cp1078 frontend, Varnish XID 829235194 Error: 503, Backend fetch failed at Fri, 22 May 2020 03:15:07 GMT HTTP/2 503 Service Unavailable date: Fri, 22 May 2020 03:15:07 GMT server: Varnish content-type: text/html; charset=utf-8 age: 0 x-cache: cp1078 int x-cache-status: int-front server-timing: cache;desc="int-front" strict-transport-security: max-age=106384710; includeSubDomains; preload x-client-ip: *** access-control-allow-origin: * access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache timing-allow-origin: * content-length: 1793 X-Firefox-Spdy: h2
Request from *** via cp1076 frontend, Varnish XID 645500130 Error: 503, Backend fetch failed at Fri, 22 May 2020 02:27:32 GMT HTTP/2 503 Service Unavailable date: Fri, 22 May 2020 02:27:32 GMT server: Varnish content-type: text/html; charset=utf-8 age: 0 x-cache: cp1076 int x-cache-status: int-front server-timing: cache;desc="int-front" strict-transport-security: max-age=106384710; includeSubDomains; preload x-client-ip: *** access-control-allow-origin: * access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache timing-allow-origin: * content-length: 1820 X-Firefox-Spdy: h2
HTTP/2 200 OK date: Fri, 22 May 2020 03:03:55 GMT content-type: image/png content-length: 1449590 accept-ranges: bytes last-modified: Fri, 22 May 2020 02:29:56 GMT etag: b5a670127a58f5bdf30f19c9be3060ab x-timestamp: 1590114595.61997 server: ATS/8.0.7 age: 0 x-cache: cp1078 miss, cp1078 pass x-cache-status: pass server-timing: cache;desc="pass" strict-transport-security: max-age=106384710; includeSubDomains; preload x-client-ip: *** access-control-allow-origin: * access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache timing-allow-origin: * X-Firefox-Spdy: h2
Request from *** via cp1078 frontend, Varnish XID 828001589 Error: 503, Backend fetch failed at Fri, 22 May 2020 03:24:10 GMT HTTP/2 503 Service Unavailable date: Fri, 22 May 2020 03:24:10 GMT server: Varnish content-type: text/html; charset=utf-8 age: 0 x-cache: cp1078 int x-cache-status: int-front server-timing: cache;desc="int-front" strict-transport-security: max-age=106384710; includeSubDomains; preload x-client-ip: *** access-control-allow-origin: * access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache timing-allow-origin: * content-length: 1793 X-Firefox-Spdy: h2
The failures don't seem to be related to output size and aren't very consistant over time.
Testing locally, I noticed that the file was affected by T219569 and takes a considerable time to generate a thumbnail.
2020-05-22 03:34:37 thumbor:DEBUG METRICS: inc: response.count:1
2020-05-22 03:34:37 thumbor:DEBUG Format specified: png
2020-05-22 03:34:37 thumbor:DEBUG METRICS: inc: storage.miss:1
2020-05-22 03:34:37 thumbor:DEBUG [HTTPS] load_sync: https%3A//upload.wikimedia.org/wikipedia/commons/3/37/Montreal_Skyline_from_Mont_Royal_raw.png
2020-05-22 03:34:37 thumbor:DEBUG [HTTPS] Loading normalized URL: https://upload.wikimedia.org/wikipedia/commons/3/37/Montreal_Skyline_from_Mont_Royal_raw.png
2020-05-22 03:36:21 thumbor:DEBUG [HTTPS] return_contents: /tmp/tmpmqtTpp
2020-05-22 03:36:21 thumbor:DEBUG METRICS: inc: original_image.status.200:1
2020-05-22 03:36:21 thumbor:DEBUG METRICS: inc: original_image.response_bytes:4096
2020-05-22 03:36:21 thumbor:DEBUG [Proxy] Looking for a png engine
2020-05-22 03:36:21 thumbor:DEBUG [ExiftoolRunner] command: ['/usr/bin/exiftool', '-ImageSize', '-s', '-s', '-s', '/tmp/tmpmqtTpp']
2020-05-22 03:36:21 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/exiftool', '-ImageSize', '-s', '-s', '-s', '/tmp/tmpmqtTpp']
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Stdout: 29684x7620
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Stderr:
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Duration: 3296.35
2020-05-22 03:36:25 thumbor:DEBUG [VIPS] Shrinking with command
2020-05-22 03:36:25 thumbor:DEBUG [BWE] Found source file in context
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/vips', 'shrink', '/tmp/tmpmqtTpp', '/tmp/tmpiNR44f/vips_result.png', '35', '35']
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Stdout:
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Stderr: vipspng: iCCP: profile 'ProPhoto RGB': 0h: PCS illuminant is not D50
vipspng: iCCP: profile 'ProPhoto RGB': 0h: PCS illuminant is not D50
vipspng: profile 'icc': 0h: PCS illuminant is not D50
vips2png: unable to write "/tmp/tmpiNR44f/vips_result.png"
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Return code: 1
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Duration: 169.056
2020-05-22 03:36:25 thumbor:DEBUG [VIPS] Forcing TinyRGB profile
2020-05-22 03:36:25 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/vips', 'shrink', '/tmp/tmpmqtTpp', '/tmp/tmpZ1AwC1/vips_result.png[profile=/srv/thumbor-plugins/tests/integration/tinyrgb.icc]', '35', '35']
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Stdout:
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Stderr:
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Duration: 21336.573
2020-05-22 03:36:46 thumbor:DEBUG [IM] Dumping buffer into temp file
2020-05-22 03:36:46 thumbor:DEBUG [ExiftoolRunner] command: ['/usr/bin/exiftool', '-s', '-s', '-ImageSize', '-ProfileDescription', '-ColorType', '-FileType', '-Transparency', '-Artist', '-Copyright', '-ImageDescription', '/tmp/tmp2lrs62']
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/exiftool', '-s', '-s', '-ImageSize', '-ProfileDescription', '-ColorType', '-FileType', '-Transparency', '-Artist', '-Copyright', '-ImageDescription', '/tmp/tmp2lrs62']
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Stdout: ImageSize: 848x217
ProfileDescription: c2
ColorType: RGB with Alpha
FileType: PNG
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Stderr:
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Duration: 237.22899999999998
2020-05-22 03:36:46 thumbor:DEBUG [IM] EXIF: {'ProfileDescription': 'c2', 'FileType': 'PNG', 'ColorType': 'RGB with Alpha', 'ImageSize': '848x217'}
2020-05-22 03:36:46 thumbor:DEBUG [IM] File has non-sRGB profile
2020-05-22 03:36:46 thumbor:DEBUG [ExiftoolRunner] command: ['/usr/bin/exiftool', '-icc_profile', '-b', '-m', '/tmp/tmp2lrs62']
2020-05-22 03:36:46 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/exiftool', '-icc_profile', '-b', '-m', '/tmp/tmp2lrs62']
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Stdout:
lcmsmntrRGB XYZ �)9acspAPPL���-lcms
desc�^cprt\
wtpthbkpt|rXYZ�gXYZ�bXYZ�rTRC�@gTRC�@bTRC�@descc2textFBXYZ ���-XYZ 3�XYZ o�8��XYZ b����XYZ $����curv���ck
�?Q!�)�2;�FQw]�kpz���|�i�}���0��
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Stderr:
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Duration: 168.149
2020-05-22 03:36:47 thumbor:DEBUG [IM] reorientate
2020-05-22 03:36:47 thumbor:DEBUG [IM] resize: 417.0 107.0
2020-05-22 03:36:47 thumbor:DEBUG [IM] Queued operators: ['-resize', '417x107^', '-gravity', 'center', '-extent', '417x107', '-background', 'none']
2020-05-22 03:36:47 thumbor:DEBUG Image format specified as .png.
2020-05-22 03:36:47 thumbor:DEBUG Content Type of image/png detected.
2020-05-22 03:36:47 thumbor:DEBUG [BWE] Rendering png
2020-05-22 03:36:47 thumbor:DEBUG [IM] read: png 87
2020-05-22 03:36:47 thumbor:DEBUG [IM] Chroma subsampling: '4:2:0'
2020-05-22 03:36:47 thumbor:DEBUG [IM] Generating image with quality 95
2020-05-22 03:36:47 thumbor:DEBUG [IM] Queued operators: ['-resize', '417x107^', '-gravity', 'center', '-extent', '417x107', '-background', 'none', '-quality', '95', '-sampling-factor', '4:2:0']
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Command: ['/usr/bin/timeout', '--foreground', '59', '/usr/bin/convert', '-define', 'tiff:exif-properties=no', '-resize', '417x107^', '-gravity', 'center', '-extent', '417x107', '-background', 'none', '-quality', '95', '-sampling-factor', '4:2:0', '/tmp/tmp2lrs62[0]', u'png:-']
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Stdout: <too long to display (246589 bytes)>
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Stderr:
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Return code: 0
2020-05-22 03:36:47 thumbor:DEBUG [ShellRunner] Duration: 215.41
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: engine.processing_time.wikimedia_thumbor.engine.vips:25521
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: engine.processing_utime.wikimedia_thumbor.engine.vips:21956
2020-05-22 03:36:47 tornado.access:INFO 200 GET /thumbor/unsafe/417x/filters:format(png)/https://upload.wikimedia.org/wikipedia/commons/3/37/Montreal_Skyline_from_Mont_Royal_raw.png (172.17.0.1) 129541.88ms
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: response.time:129541
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: response.time.200:129541
2020-05-22 03:36:47 thumbor:DEBUG METRICS: inc: response.status.200:1
2020-05-22 03:36:47 thumbor:DEBUG METRICS: inc: response.format.png:1
2020-05-22 03:36:47 thumbor:DEBUG METRICS: timing: response.time.png:129541
2020-05-22 03:36:47 thumbor:DEBUG METRICS: inc: response.bytes.png:246589
2020-05-22 03:36:47 thumbor:DEBUG [HTTPS] cleanup_temp_file: /tmp/tmpmqtTpp1:44 is spent downloading the 1.1GB file, then another 0:21 is spent scaling the image with Vips. While the time it takes to download a file over the internet is not representative of the time it takes to copy a file between two servers in the same datacenter, I think this is indicative of at least part of the problem. With the mitigation for "invalid" ICC profiles in place, thumbnails can be generated for this file. Someone with production log access is needed to find the true cause.