23:53 <Urbanecm> Hello everyone, I get constant 429 with https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Mimo%C5%99%C3%A1dn%C3%A9_opat%C5%99en%C3%AD_-_z%C3%A1kaz_v%C3%BDvozu_desinfekce_rukou.pdf/page2-636px-Mimo%C5%99%C3%A1dn%C3%A9_opat%C5%99en%C3%AD_-_z%C3%A1kaz_v%C3%BDvozu_desinfekce_rukou.pdf.jpg, even when I try from my server with a dedicated public IP. What is happening?
00:00 <bd808> Urbanecm: not just you. Probably worth a phab task. It kind of looks to me from the response like thumbor is barfing on fulfilling the request and varnish gave up for a while.
Description
Related Objects
- Mentioned In
- T251059: 18KOZ.pdf cannot be rendered: error: 'Warning: Skipped unknown 111 byte header
- Mentioned Here
- T50007: Error creating PDF on Commons: "convert: no decode delegate for this image format" (fixed in GS 9.07)
- T236240: Ghostscript outputs errors to stdout despite -q, preventing Thumbor from generating some thumbnails properly
- T223357: HTTP 500 for thumbnails of damaged PDF file File:Mueller_letter_to_Barr_2019-03-27.pdf
- T239510: No preview thumbnail generated for PDF on Commons: "Error: 429, Too Many Requests"
Event Timeline
Many of the most recent PDFs by Janbery seem to fail (always on the last page); cf. https://commons.wikimedia.org/wiki/Special:Contributions/Janbery.
Hmm, it seems there are actually two issues: a) something returns 429 on a server-side error, and b) our thumbnailing logic fails to generate a thumbnail.
https://logstash.wikimedia.org/goto/8516bf501bea410ecd2df2ed2f597044 is my logstash query. This suggests that something is wrong inside ImageMagick itself.
Running the PDF through GhostScript locally generates an error on the second page:
$ gs -sDEVICE=jpeg -dJPEG=90 -r150 -DBATCH -dNOPAUSE -dSAFER -sOutputFile=Mimořádné_opatření_-_zákaz_vývozu_desinfekce_rukou%d.jpg Mimořádné_opatření_-_zákaz_vývozu_desinfekce_rukou.pdf
GPL Ghostscript 9.50 (2019-10-15)
Copyright (C) 2019 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 2.
Page 1
Page 2
   **** Error: Invalid (0 scaling) text matrix for Tm
               Output may be incorrect.
   **** This file had errors that were repaired or ignored.
   **** The file was produced by:
   **** >>>> Microsoft® Word 2013 (GORDIC PDF Normalizer 4.0.11.30) <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.
   **** The rendered output from this file may be incorrect.
Outputting to a file works fine, but when I try outputting to STDOUT and piping to imagemagick, it can't detect a proper JPEG file.
$ gs -sDEVICE=jpeg -dJPEG=90 -sOutputFile=%stdout -dFirstPage=2 -dLastPage=2 -r150 -dBATCH -dNOPAUSE -dSAFER -q -f Mimořádné_opatření_-_zákaz_vývozu_desinfekce_rukou.pdf | identify -verbose jpeg:-
identify: Not a JPEG file: starts with 0x20 0x20 `/tmp/magick-1378681m3sUwOksSNuy' @ error/jpeg.c/JPEGErrorHandler/342.
I can't see what the logstash entries are, so I don't know if this is the exact same error. Since there is an error in the thumbnailer, the 429 behavior is expected. The caching layer will give Thumbor a few tries to generate a failing thumbnail, then respond with 429s for a while.
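The "a few tries, then 429s for a while" behavior can be pictured with a small Python sketch. This is not the actual caching-layer configuration: the class name, the failure limit and the cooldown length are all hypothetical, chosen only to illustrate the pattern.

```python
import time


class FailureThrottle:
    """Illustrative only: allow a few failed renders per key, then answer 429 for a cooldown.

    max_failures and cooldown_seconds are assumed values, not production settings.
    """

    def __init__(self, max_failures=3, cooldown_seconds=3600.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._failures = {}  # key -> (failure count, time of first failure)

    def status_for(self, key, render):
        """Return an HTTP-like status code for one request for `key`."""
        now = self.clock()
        count, since = self._failures.get(key, (0, now))
        if count and now - since >= self.cooldown_seconds:
            # The cooldown window has elapsed: forget earlier failures.
            self._failures.pop(key, None)
            count, since = 0, now
        if count >= self.max_failures:
            return 429  # too many recent failures: do not bother the renderer
        try:
            render(key)
        except Exception:
            self._failures[key] = (count + 1, since)
            return 500
        self._failures.pop(key, None)
        return 200


def always_fails(key):
    # Stand-in for a thumbnailer that cannot render this file.
    raise RuntimeError("render failed")


throttle = FailureThrottle(max_failures=2, cooldown_seconds=60.0)
codes = [throttle.status_for("page2.jpg", always_fails) for _ in range(4)]
print(codes)
```

Under this model a client keeps seeing 429 until the window expires, no matter which IP it comes from, because the counter is keyed on the thumbnail rather than on the requester.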
$ gs -sDEVICE=jpeg -dJPEG=90 -sOutputFile=%stdout -dFirstPage=2 -dLastPage=2 -r150 -dBATCH -dNOPAUSE -dSAFER -q -f Mimořádné_opatření_-_zákaz_vývozu_desinfekce_rukou.pdf | head
   **** Error: Invalid (0 scaling) text matrix for Tm
               Output may be incorrect.
����JFIF���� ICC_PROFILE mntrRGB XYZ acspAPPL���- desc�|cprtx(wtpt�bkpt�rXYZ�gXYZ�bXYZ�rTRC gTRC bTRC desc"Artifex Software sRGB ICC Profile"Artifex Software sRGB ICC ProfiletextCopyright Artifex Software 2011XYZ �Q�XYZ XYZ o�8��XYZ b����XYZ $����curv %+28>ELRY`gnu|������������������������������ &/8AKT]gqz������������ +:IXgw��������'7HYj{�������+=Oat������� !-8COZfr~���������� -;HUcq~��������� � % : O d y � � � �� � ' =
Nice, Ghostscript is writing its error to stdout before the start of the JPEG file (the JFIF line). This is covered in T236240 and there's a patch awaiting review in T50007#5594891. I've tested and confirmed that the added parameter to redirect the error to stderr fixes the problem, since the Ghostscript error is otherwise non-fatal.
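For anyone wanting to check a captured stream for the same symptom, here is a minimal Python sketch that looks for text in front of the JPEG start-of-image marker (bytes FF D8 FF). The function name and the sample bytes are made up for illustration; the sample is not real Ghostscript output. It is a diagnostic aid only, and the actual fix remains redirecting the Ghostscript messages to stderr as in the patch mentioned above.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker plus the lead byte of the next marker


def split_leading_noise(data):
    """Return (noise, jpeg): the bytes before the first JPEG marker and the rest.

    If no marker is found, everything is treated as noise and jpeg is empty.
    """
    offset = data.find(JPEG_SOI)
    if offset == -1:
        return data, b""
    return data[:offset], data[offset:]


# Hypothetical sample: a warning line followed by the first bytes of a JFIF header.
sample = (
    b"   **** Error: Invalid (0 scaling) text matrix for Tm\n"
    b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
)

noise, jpeg = split_leading_noise(sample)
if noise:
    print("non-JPEG bytes before the image:", len(noise))
    print(noise.decode("ascii", errors="replace").strip())
```

A clean stream gives an empty `noise`. A stream like the one above starts with spaces (0x20 0x20), which matches what identify complained about.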
@AntiCompositeNumber This is the traceback
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/handler/images/images.py", line 577, in _load_results
    results, content_type = BaseHandler._load_results(self, context)
  File "/usr/lib/python2.7/dist-packages/thumbor/handlers/__init__.py", line 334, in _load_results
    results = context.request.engine.read(image_extension, quality)
  File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/proxy/proxy.py", line 133, in read
    ret = self.__getattr__('read')(extension, quality)
  File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/__init__.py", line 40, in read
    return super(BaseWikimediaEngine, self).read(extension, quality)
  File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 323, in read
    raise ImageMagickException('Failed to convert image %s' % stderr)  # pragma: no cover
ImageMagickException: Failed to convert image convert: no decode delegate for this image format `' @ error/constitute.c/ReadImage/504.
convert: no images defined `jpg:-' @ error/convert.c/ConvertImageCommand/3258.
The other message that gets thrown is [ExiftoolRunner] error: 'Warning: Skipped unknown 99 byte header - /srv/thumbor/tmp/thumbor@8801/tmp6FizLj\n'