
Ghostscript outputs errors to stdout despite -q, preventing Thumbor from generating some thumbnails properly
Closed, Resolved · Public

Description

I got

Request from 176.207.117.69 via cp3038 frontend, Varnish XID 415308351
Upstream caches: cp3038 int
Error: 429, Too Many Requests at Wed, 16 Oct 2019 14:06:59 GMT

while trying to access other resolutions for https://commons.wikimedia.org/wiki/File:A_Universal_Code_of_Conduct_-_with_input_from_the_CEE_communities.pdf for pages from 7 on, which fail to render in any resolution, but are correctly displayed in the PDF when you download it.

Event Timeline

Elitre created this task. Oct 23 2019, 10:44 AM
Restricted Application added a subscriber: Aklapper. Oct 23 2019, 10:44 AM

This issue is easily reproduced by requesting pages 7 and 14. The request first returns a 500 and then a 429.

I could not find any related logs from MediaWiki or from Varnish.

Anecdotally, I've heard there are problems with pdfrenderer, but I see no evidence to show that there is a problem here.

colewhite triaged this task as Medium priority. Oct 23 2019, 7:14 PM

It is indeed unusual for this to affect only specific pages of a small PDF, even more so for a PDF automatically generated by Google (which means it was probably produced by a Linux FLOSS software stack).

Thumbor, which generates those "thumbnails", isn't related to pdfrenderer.

It seems like the ghostscript command used by Thumbor outputs some errors to stdout, and they end up in the generated JPG, making it invalid. ImageMagick is then unable to resize that JPG and turn it into the desired thumbnail.

gs -sDEVICE=jpeg -dJPEG=90 -sOutputFile=%stdout -dFirstPage=7 -r150 -dBATCH -dNOPAUSE -dSAFER -q -fA_Universal_Code_of_Conduct_-_with_input_from_the_CEE_communities.pdf > A_Universal_Code_of_Conduct_-_with_input_from_the_CEE_communities.pdf.jpg

head -50 A_Universal_Code_of_Conduct_-_with_input_from_the_CEE_communities.pdf.jpg
   **** Error: Unknown operator: '0.00-60', processed as number, value: 0.0
                Output may be incorrect.
   **** Error: Unknown operator: '0.00-60', processed as number, value: 0.0
                Output may be incorrect.
   **** Error: Unknown operator: '0.00-60', processed as number, value: 0.0
                Output may be incorrect.
   **** Error: Unknown operator: '0.00-60', processed as number, value: 0.0
                Output may be incorrect.
   **** Error: Unknown operator: '0.00-60', processed as number, value: 0.0
                Output may be incorrect.
????JFIF????
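
The corruption shown above can be checked for programmatically. What follows is only a diagnostic sketch in Python, not the fix adopted in this task: it relies on the fact that a JPEG stream begins with the two-byte Start Of Image marker 0xFF 0xD8, and treats anything before the first such marker as stray text. The function name split_leading_noise is hypothetical. It would not catch messages injected in the middle of the stream.

```python
from typing import Tuple

# Every JPEG stream starts with the Start Of Image (SOI) marker.
JPEG_SOI = b"\xff\xd8"


def split_leading_noise(data: bytes) -> Tuple[bytes, bytes]:
    """Split captured output into (noise, jpeg).

    `noise` is whatever precedes the first SOI marker (for example
    Ghostscript error text); `jpeg` is everything from the marker on.
    If no marker is found, all of the data is returned as noise.
    """
    start = data.find(JPEG_SOI)
    if start == -1:
        return data, b""
    return data[:start], data[start:]


if __name__ == "__main__":
    # Hypothetical sample mimicking the `head` output in this task.
    sample = b"   **** Error: Unknown operator\n" + b"\xff\xd8\xff\xe0\x00\x10JFIF"
    noise, jpeg = split_leading_noise(sample)
    print(len(noise), "bytes of text before the JPEG data")
```

A non-empty noise value means something other than image data was written to the same stream.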

The use of -q means that Ghostscript isn't supposed to write messages to standard output, though. In short, this is ghostscript misbehaving.

Thumbor shouldn't rely on sending the output file to stdout, since -q can't be trusted. In fact, it seems there is precedent for it misbehaving:

https://github.com/wikimedia/operations-software-thumbor-plugins/blob/5a447405f817dbfd033b983daf0d4307b80e0dff/wikimedia_thumbor/engine/ghostscript/ghostscript.py#L37-L39

When generating directly to a file instead of stdout, it generates that JPG fine (and we do see the errors/warnings showing up in stdout).
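
As a rough illustration of the file-based approach, here is a Python sketch that builds the same gs invocation with -sOutputFile pointing at a path instead of %stdout. The function names and file paths are hypothetical, and this is not the Thumbor plugin's code. The flags are copied from the command earlier in this task, except -dLastPage, which is an addition assumed here so that only the requested page is written to the single output file.

```python
import subprocess
from typing import List


def gs_file_command(pdf_path: str, page: int, output_path: str) -> List[str]:
    """Build a gs command that renders one page to a JPEG file on disk."""
    return [
        "gs",
        "-sDEVICE=jpeg",
        "-dJPEG=90",                     # copied verbatim from the thread
        f"-sOutputFile={output_path}",   # a file, so stdout carries no image data
        f"-dFirstPage={page}",
        f"-dLastPage={page}",            # assumption: not in the thread's command
        "-r150",
        "-dBATCH",
        "-dNOPAUSE",
        "-dSAFER",
        "-q",
        f"-f{pdf_path}",
    ]


def render_page_to_file(pdf_path: str, page: int, output_path: str) -> str:
    """Run gs and return any text it printed on stdout (diagnostics only)."""
    result = subprocess.run(
        gs_file_command(pdf_path, page, output_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the command; call render_page_to_file() where gs is installed.
    print(" ".join(gs_file_command("input.pdf", 7, "page7.jpg")))
```

With this arrangement, any messages Ghostscript prints despite -q land in the returned string and cannot corrupt the image.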

Fixing this should probably fix a lot of other PDFs that were either not getting thumbnails generated for specific pages or getting visually broken ones due to random ghostscript errors being thrown in the middle of the JPG stream.

Gilles renamed this task from Error: 429, Too Many Requests while trying to access other resolutions for a PDF file to Ghostscript outputs errors to stdout despite -q, preventing Thumbor from generating some thumbnails properly. Oct 24 2019, 8:25 AM
Gilles claimed this task.

I will try looking at this in my spare time, but can't promise anything. We need to figure out who's going to maintain Thumbor going forward.

In the meantime, you have all my appreciation.


Using the following command would fix this problem:

gs -sDEVICE=jpeg -dJPEG=90 -sstdout=%stderr -sOutputFile=- -dFirstPage=7 -r150 -dBATCH -dNOPAUSE -dSAFER -q -fA_Universal_Code_of_Conduct_-_with_input_from_the_CEE_communities.pdf > A_Universal_Code_of_Conduct_-_with_input_from_the_CEE_communities.pdf.jpg

Indeed, nice find! Adding -sstdout=%stderr fixes the issue.
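
For readers who want to see how a caller might consume the two streams separately, here is a minimal Python sketch. It assumes only the flags quoted in this task; the function names are hypothetical, and the actual change is the Gerrit patch referenced later in this task, not this code.

```python
import subprocess
from typing import List, Tuple


def gs_stdout_command(pdf_path: str, page: int) -> List[str]:
    """Build a gs command that sends image data to stdout and messages to stderr."""
    return [
        "gs",
        "-sDEVICE=jpeg",
        "-dJPEG=90",          # copied verbatim from the thread
        "-sstdout=%stderr",   # redirect Ghostscript's own messages to stderr
        "-sOutputFile=-",     # write the rendered JPEG to standard output
        f"-dFirstPage={page}",
        "-r150",
        "-dBATCH",
        "-dNOPAUSE",
        "-dSAFER",
        "-q",
        f"-f{pdf_path}",
    ]


def render_page(pdf_path: str, page: int) -> Tuple[bytes, str]:
    """Run gs and return (jpeg_bytes, diagnostics_text)."""
    result = subprocess.run(
        gs_stdout_command(pdf_path, page),
        stdout=subprocess.PIPE,  # image bytes only
        stderr=subprocess.PIPE,  # errors and warnings
        check=True,
    )
    return result.stdout, result.stderr.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the command; call render_page() where gs is installed.
    print(" ".join(gs_stdout_command("input.pdf", 7)))
```

Keeping the two pipes separate lets the caller log the warnings while passing clean JPEG bytes on for resizing.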

ema moved this task from Triage to Watching on the Traffic board. Oct 30 2019, 2:43 PM

T50007#5594891 describes the same solution and includes a patch by Seb35 for PdfHandler, for which review is welcome.

Gilles lowered the priority of this task from Medium to Low. Jan 6 2020, 1:16 PM

Change 593358 had a related patch set uploaded (by AntiCompositeNumber; owner: AntiCompositeNumber):
[operations/software/thumbor-plugins@master] engine.ghostscript: use -sstdout=%stderr with gs

https://gerrit.wikimedia.org/r/593358

Change 593358 merged by Gilles:
[operations/software/thumbor-plugins@master] engine.ghostscript: use -sstdout=%stderr with gs

https://gerrit.wikimedia.org/r/593358

Change 595891 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/debs/python-thumbor-wikimedia@master] Upgrade to 2.7

https://gerrit.wikimedia.org/r/595891

Change 595891 merged by Ema:
[operations/debs/python-thumbor-wikimedia@master] Upgrade to 2.7

https://gerrit.wikimedia.org/r/595891

Mentioned in SAL (#wikimedia-operations) [2020-05-12T13:50:55Z] <ema> thumbor2001: upgrade python-thumbor-wikimedia to 2.7-1+deb10u1 T252509 T219569 T236240

Mentioned in SAL (#wikimedia-operations) [2020-05-12T13:54:21Z] <ema> thumbor2001: pool thumbor 2.7-1+deb10u1 for prod traffic T252509 T219569 T236240

Mentioned in SAL (#wikimedia-operations) [2020-05-12T14:00:48Z] <ema> thumbor2001: depool due to minor bug in 2.7-1+deb10u1 T252509 T219569 T236240

Mentioned in SAL (#wikimedia-operations) [2020-05-12T14:33:11Z] <ema> thumbor2001: upgrade python-thumbor-wikimedia to 2.8-1+deb10u1 T252509 T219569 T236240

Mentioned in SAL (#wikimedia-operations) [2020-05-12T14:39:57Z] <ema> rolling thumbor upgrade to 2.8-1+deb10u1 T252509 T219569 T236240

Gilles closed this task as Resolved. Tue, May 12, 2:56 PM

Fix confirmed on https://commons.wikimedia.org/w/index.php?title=File:A_Universal_Code_of_Conduct_-_with_input_from_the_CEE_communities.pdf&page=7

Just purge affected files and they should generate thumbnails fine from now on.

Thank you @AntiCompositeNumber for taking the time to write this bugfix and associated tests!

Thanks all.

4nn1l2 removed a subscriber: 4nn1l2. Tue, May 12, 6:01 PM