
Thumbnailing page 2 of c:File:Mimořádné opatření - zákaz vývozu desinfekce rukou.pdf generates a non-fatal Ghostscript error that is piped to imagemagick
Closed, Duplicate · Public

Description

23:53 <Urbanecm> Hello everyone, I get constant 429 with https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Mimo%C5%99%C3%A1dn%C3%A9_opat%C5%99en%C3%AD_-_z%C3%A1kaz_v%C3%BDvozu_desinfekce_rukou.pdf/page2-636px-Mimo%C5%99%C3%A1dn%C3%A9_opat%C5%99en%C3%AD_-_z%C3%A1kaz_v%C3%BDvozu_desinfekce_rukou.pdf.jpg, even when I try from my server with a dedicated public IP. What is happening?
00:00 <bd808> Urbanecm: not just you. Probably worth a phab task. It kind of looks to me from the response like thumbor is barfing on fulfilling the request and varnish gave up for a while.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 11 2020, 11:00 PM

Many of the latest PDFs by Janbery seem to fail (always on the last page), cf. https://commons.wikimedia.org/wiki/Special:Contributions/Janbery.

Urbanecm updated the task description. · Mar 11 2020, 11:12 PM

Hmm, it seems there are actually two issues: a) something returns 429 on a server-side error, and b) our thumbnailing logic fails to generate a thumbnail.

https://logstash.wikimedia.org/goto/8516bf501bea410ecd2df2ed2f597044 is my logstash query. It suggests that something is going wrong inside ImageMagick itself.

Running the PDF through Ghostscript locally generates an error on the second page:

$ gs -sDEVICE=jpeg -dJPEG=90 -r150 -DBATCH -dNOPAUSE -dSAFER -sOutputFile=Mimořádné_opatření_-_zákaz_vývozu_desinfekce_rukou%d.jpg Mimořádné_opatření_-_zákaz_vývozu_desinfekce_rukou.pdf
GPL Ghostscript 9.50 (2019-10-15)
Copyright (C) 2019 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 2.
Page 1
Page 2
   **** Error: Invalid (0 scaling) text matrix for Tm ****
               Output may be incorrect.

   **** This file had errors that were repaired or ignored.
   **** The file was produced by: 
   **** >>>> Microsoft® Word 2013 (GORDIC PDF Normalizer 4.0.11.30) <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

   **** The rendered output from this file may be incorrect.

Outputting to a file works fine, but when I try outputting to STDOUT and piping to imagemagick, it can't detect a proper JPEG file.

$ gs -sDEVICE=jpeg -dJPEG=90 -sOutputFile=%stdout -dFirstPage=2 -dLastPage=2 -r150 -dBATCH -dNOPAUSE -dSAFER -q -f Mimořádné_opatření_-_zákaz_vývozu_desinfekce_rukou.pdf | identify -verbose jpeg:-
identify: Not a JPEG file: starts with 0x20 0x20 `/tmp/magick-1378681m3sUwOksSNuy' @ error/jpeg.c/JPEGErrorHandler/342.

I can't see what the logstash entries are, so I don't know if this is the exact same error. Since there is an error in the thumbnailer, the 429 behavior is expected. The caching layer will give Thumbor a few tries to generate a failing thumbnail, then respond with 429s for a while.
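To make the "a few tries, then 429s for a while" behavior concrete, here is a minimal Python sketch of a failure throttle. It is not the actual Varnish or Thumbor configuration: the class name `FailureThrottle`, the `max_failures` and `cooldown` values, and the 500 status used for a failed render are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict


class FailureThrottle:
    """Hypothetical model: after `max_failures` failed renders of a URL,
    answer 429 for `cooldown` seconds without contacting the backend."""

    def __init__(self, max_failures: int = 4, cooldown: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures  # assumed threshold
        self.cooldown = cooldown          # assumed duration in seconds
        self.clock = clock
        self._failures: Dict[str, int] = {}
        self._blocked_until: Dict[str, float] = {}

    def status_for(self, url: str, render: Callable[[str], bool]) -> int:
        now = self.clock()
        blocked_until = self._blocked_until.get(url)
        if blocked_until is not None and now < blocked_until:
            return 429  # throttled: the backend is not asked again
        if render(url):
            # A successful render clears any failure history for this URL.
            self._failures.pop(url, None)
            self._blocked_until.pop(url, None)
            return 200
        count = self._failures.get(url, 0) + 1
        if count >= self.max_failures:
            # Too many failures: start the cooldown period.
            self._failures.pop(url, None)
            self._blocked_until[url] = now + self.cooldown
        else:
            self._failures[url] = count
        return 500  # assumed status for a failed render
```

Under this model a client never sees the underlying render error once the cooldown starts, which would match what was reported on IRC: only 429s, regardless of the requesting IP.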

AntiCompositeNumber added a comment. · Edited · Mar 12 2020, 12:18 AM
$ gs -sDEVICE=jpeg -dJPEG=90 -sOutputFile=%stdout -dFirstPage=2 -dLastPage=2 -r150 -dBATCH -dNOPAUSE -dSAFER -q -f Mimořádné_opatření_-_zákaz_vývozu_desinfekce_rukou.pdf | head
   **** Error: Invalid (0 scaling) text matrix for Tm ****
               Output may be incorrect.
����JFIF����
 ICC_PROFILE
mntrRGB XYZ acspAPPL���-
desc�|cprtx(wtpt�bkpt�rXYZ�gXYZ�bXYZ�rTRC
                                        gTRC
                                           bTRC
                                              desc"Artifex Software sRGB ICC Profile"Artifex Software sRGB ICC ProfiletextCopyright Artifex Software 2011XYZ �Q�XYZ XYZ o�8��XYZ b����XYZ $����curv
%+28>ELRY`gnu|������������������������������
                              &/8AKT]gqz������������
+:IXgw��������'7HYj{�������+=Oat�������             !-8COZfr~���������� -;HUcq~���������
                                    �		%	:	O	d	y	�	�	�	��	�

'
=

Nice, Ghostscript is writing its error to stdout before the start of the JPEG file (the JFIF line). This is covered in T236240 and there's a patch awaiting review in T50007#5594891. I've tested and confirmed that the added parameter to redirect the error to stderr fixes the problem, since the Ghostscript error is otherwise non-fatal.
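A rough Python sketch of what such a redirect looks like when Ghostscript is invoked from a script. `-sstdout=%stderr` is a documented Ghostscript switch that sends interpreter messages to stderr so they do not mix with device output on stdout; that it is the exact parameter added by the patch referenced above is an assumption. The function names `gs_jpeg_page_argv` and `render_page` are hypothetical, and the remaining flags are a subset of the command shown earlier.

```python
import subprocess
from typing import List, Tuple


def gs_jpeg_page_argv(pdf_path: str, page: int, dpi: int = 150) -> List[str]:
    """Build a Ghostscript command that renders one PDF page as JPEG on stdout."""
    return [
        "gs",
        "-sDEVICE=jpeg",
        "-sstdout=%stderr",      # keep warnings out of the image stream
        "-sOutputFile=%stdout",  # device output (the JPEG) goes to stdout
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-r{dpi}",
        "-dBATCH", "-dNOPAUSE", "-dSAFER", "-q",
        "-f", pdf_path,
    ]


def render_page(pdf_path: str, page: int) -> Tuple[bytes, str]:
    """Run Ghostscript; return (JPEG bytes, warning text from stderr)."""
    proc = subprocess.run(
        gs_jpeg_page_argv(pdf_path, page),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return proc.stdout, proc.stderr.decode("utf-8", errors="replace")
```

With the streams separated, the non-fatal "Invalid (0 scaling) text matrix" warning can still be logged from the second return value while the first one starts directly with the JPEG data.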

AntiCompositeNumber renamed this task from Getting constant 429 for a thumbnail to Thumbnailing page 2 of c:File:Mimořádné opatření - zákaz vývozu desinfekce rukou.pdf generates a non-fatal Ghostscript error that is piped to imagemagick. · Mar 12 2020, 12:23 AM

@AntiCompositeNumber This is the traceback

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/handler/images/images.py", line 577, in _load_results
    results, content_type = BaseHandler._load_results(self, context)
  File "/usr/lib/python2.7/dist-packages/thumbor/handlers/__init__.py", line 334, in _load_results
    results = context.request.engine.read(image_extension, quality)
  File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/proxy/proxy.py", line 133, in read
    ret = self.__getattr__('read')(extension, quality)
  File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/__init__.py", line 40, in read
    return super(BaseWikimediaEngine, self).read(extension, quality)
  File "/usr/lib/python2.7/dist-packages/wikimedia_thumbor/engine/imagemagick/imagemagick.py", line 323, in read
    raise ImageMagickException('Failed to convert image %s' % stderr)  # pragma: no cover
ImageMagickException: Failed to convert image convert: no decode delegate for this image format `' @ error/constitute.c/ReadImage/504.
convert: no images defined `jpg:-' @ error/convert.c/ConvertImageCommand/3258.

Another message that is logged is [ExiftoolRunner] error: 'Warning: Skipped unknown 99 byte header - /srv/thumbor/tmp/thumbor@8801/tmp6FizLj\n'
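The 99-byte figure appears consistent with the Ghostscript warning seen earlier. A small Python check, assuming the whitespace in the terminal output is exact (three leading spaces on the first line, fifteen on the second, each line ending in a newline); `GS_WARNING` and `jpeg_offset` are names made up for this sketch:

```python
from typing import Optional

# First three bytes of a JPEG stream: the SOI marker plus the lead byte
# of the next marker.
JPEG_SOI = b"\xff\xd8\xff"

# The warning text as it appears in the `head` output above (whitespace assumed).
GS_WARNING = (
    b"   **** Error: Invalid (0 scaling) text matrix for Tm ****\n"
    + b" " * 15 + b"Output may be incorrect.\n"
)


def jpeg_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first JPEG start marker, or None if absent."""
    idx = data.find(JPEG_SOI)
    return None if idx < 0 else idx


if __name__ == "__main__":
    stream = GS_WARNING + JPEG_SOI + b"\xe0"  # warning followed by a JFIF start
    print(len(GS_WARNING), jpeg_offset(stream))
```

If that whitespace assumption holds, the warning is exactly 99 bytes long, so the header exiftool skips would be the warning text itself. It also fits the `identify` complaint that the stream starts with 0x20 0x20, i.e. two spaces.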

MoritzMuehlenhoff triaged this task as Medium priority. · Apr 6 2020, 2:27 PM