
HTTP 500 for thumbnails of damaged PDF file File:Mueller_letter_to_Barr_2019-03-27.pdf
Closed, Duplicate · Public · Bug Report

Description

All thumbnail requests for File:Mueller_letter_to_Barr_2019-03-27.pdf fail with HTTP 500 or HTTP 429 (Too Many Requests). The file itself is a PDF document, version 1.3, without obvious anomalies.

Specifically, all links of the form //upload.wikimedia.org/wikipedia/commons/thumb/4/47/Mueller_letter_to_Barr_2019-03-27.pdf/pageₖ-ₙₙₙpx-Mueller_letter_to_Barr_2019-03-27.pdf.¤¤g fail, for either page number ₖ = 1, 2, for various widths ₙₙₙ, and for either format ¤¤g (png or jpg). Because the error details are hidden, not much information is available to me.

Such links are expected to serve thumbnails in image/png or image/jpeg depending on the extension.
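For anyone who wants to reproduce the failing requests, the URL pattern above can be sketched as a small helper. This is only an illustration: the function name `thumb_url` and its parameters are hypothetical, and the hash directory `4/47` is copied from the links in this report rather than computed.

```python
def thumb_url(filename: str, page: int, width: int, fmt: str = "jpg",
              hash_dir: str = "4/47") -> str:
    """Build a thumbnail URL following the pattern quoted in this report.

    hash_dir is taken verbatim from the report; it is an assumption that
    must be changed for any other file.
    """
    if fmt not in ("jpg", "png"):
        raise ValueError("fmt must be 'jpg' or 'png'")
    base = "https://upload.wikimedia.org/wikipedia/commons/thumb"
    # Pattern: <base>/<hash_dir>/<file>/page<k>-<nnn>px-<file>.<fmt>
    return f"{base}/{hash_dir}/{filename}/page{page}-{width}px-{filename}.{fmt}"


if __name__ == "__main__":
    name = "Mueller_letter_to_Barr_2019-03-27.pdf"
    for page in (1, 2):
        for fmt in ("jpg", "png"):
            print(thumb_url(name, page, 120, fmt))
```

The width of 120 in the usage loop is an arbitrary example value; the report only says that various widths fail.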

Event Timeline

Well, that PDF file is damaged. Not much that PdfHandler can do about it I'd say.

$:acko\> qpdf --check Mueller_letter_to_Barr_2019-03-27.pdf 
WARNING: Mueller_letter_to_Barr_2019-03-27.pdf: reported number of objects (10) inconsistent with actual number of objects (11)
WARNING: Mueller_letter_to_Barr_2019-03-27.pdf: file is damaged
WARNING: Mueller_letter_to_Barr_2019-03-27.pdf (object 8 0, offset 535780): expected 8 0 obj
WARNING: Mueller_letter_to_Barr_2019-03-27.pdf: Attempting to reconstruct cross-reference table
checking Mueller_letter_to_Barr_2019-03-27.pdf
PDF Version: 1.3
File is not encrypted
WARNING: Mueller_letter_to_Barr_2019-03-27.pdf (object 1 0, offset 176): stream keyword followed by carriage return only
File is not linearized
WARNING: Mueller_letter_to_Barr_2019-03-27.pdf (object 2 0, offset 427577): stream keyword followed by carriage return only
WARNING: Mueller_letter_to_Barr_2019-03-27.pdf (object 5 0, offset 427995): stream keyword followed by carriage return only
WARNING: Mueller_letter_to_Barr_2019-03-27.pdf (object 6 0, offset 535716): stream keyword followed by carriage return only
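To triage many files at once, output like the above could be summarized with a short script. The sketch below parses text that has already been captured from `qpdf --check`; the function name `summarize_qpdf_check` and the returned dictionary layout are hypothetical, and the regular expression is based only on the warning lines shown in this thread, so other qpdf versions may format messages differently.

```python
import re

# Matches located warnings as they appear in this thread, e.g.
# "WARNING: file.pdf (object 8 0, offset 535780): expected 8 0 obj"
_LOCATED = re.compile(
    r"^WARNING: .+? \(object (\d+) (\d+), offset (\d+)\): (.+)$"
)


def summarize_qpdf_check(output: str) -> dict:
    """Summarize captured `qpdf --check` output (format assumed from this thread)."""
    warnings = [ln for ln in output.splitlines() if ln.startswith("WARNING: ")]
    located = []
    for ln in warnings:
        m = _LOCATED.match(ln)
        if m:
            # (object number, byte offset, message)
            located.append((int(m.group(1)), int(m.group(3)), m.group(4)))
    return {
        "warning_count": len(warnings),
        "damaged": any(ln.endswith(": file is damaged") for ln in warnings),
        "located": located,
    }
```

Working on captured text rather than calling qpdf directly keeps the sketch independent of how the tool is invoked.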

Thanks for a prompt reply, but the way PdfHandler treats this error is disrespectful. Can a sensible error message be generated, please?

> Thanks for a prompt reply, but the way PdfHandler treats this error is disrespectful.

No, because https://commons.wikimedia.org/wiki/File:Mueller_letter_to_Barr_2019-03-27.pdf already says:
This PDF file is broken or corrupt. Description: File does not render when displayed as an image.
What else do you expect?

I manually added that message about the file being "broken or corrupt" when I noticed the issue, but there is nothing wrong with the file when viewing it traditionally.
Ideally, PdfHandler would be able to gracefully fail and at least show some information in the thumbnail rather than simply breaking. Why is PdfHandler less capable than any browser's built-in PDF viewer? See how it loads fine at https://upload.wikimedia.org/wikipedia/commons/4/47/Mueller_letter_to_Barr_2019-03-27.pdf?

I've tried to upload a new version of the file but the software will not allow it because the file already exists. See some additional (prior) discussion on the file here.

That makes sense but is another topic.
Feel free to file a separate "When a thumbnail is not created due to PDF file corruption, display an error message instead of silently failing" task or such.

On a somewhat related note, please add Commons under "Tags" and not under "Subscribers". Not everyone becomes a member of the "Commons" project in order to receive bugmail for every ticket. Thank you!

> That makes sense but is another topic.
> Feel free to file a separate "When a thumbnail is not created due to PDF file corruption, display an error message instead of silently failing" task or such.

It seems there are several issues here (and, hopefully, several potential solutions):

  1. Silent failure when thumbnail is not created due to PDF file corruption -> Display an error message instead of silently failing
  2. Minor PDF file corruption (note how the file renders without any issues by the browser's built-in PDF viewer) results in broken thumbnails -> Render thumbnails based on available information but default to the error message in "1." above when the corruption is too great
  3. Software will not allow re-upload if the file already exists, even if the file is broken -> Allow overwriting of files when there is a rendering problem

Would each of these require (and/or qualify for) a separate task here? (Sorry for my inexperience; this is my first exposure to Phabricator.)

$ qpdf --check Mueller_letter_to_Barr_2019-03-27.pdf
checking Mueller_letter_to_Barr_2019-03-27.pdf
PDF Version: 1.3
File is not encrypted
File is not linearized
No syntax or stream encoding errors found; the file may still contain
errors that qpdf cannot detect

Now, qpdf does not detect errors, but thumbnails are still not generated.

OK, after some delay the file works.

The fix was:

$ qpdf Mueller_letter_to_Barr_2019-03-27.pdf fixed.pdf

followed by uploading fixed.pdf as a new version.
Thanks to @Aklapper for the hint.
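If the same repair has to be applied to several files, the command above can be wrapped in a script. The following is a minimal sketch with hypothetical helper names (`repair_command`, `output_usable`, `repair_pdf`). It relies on qpdf's documented exit codes (0 for success, 2 for errors, 3 for warnings without errors) and assumes that a warnings-only run still writes a usable output file, which matches the outcome reported here but is worth verifying per file.

```python
import subprocess

# qpdf exit codes as documented by qpdf: 0 = success, 2 = errors,
# 3 = warnings were found but no errors.
QPDF_OK, QPDF_ERRORS, QPDF_WARNINGS = 0, 2, 3


def repair_command(src: str, dst: str) -> list:
    """Same invocation as in this thread: qpdf <input> <output>."""
    return ["qpdf", src, dst]


def output_usable(returncode: int) -> bool:
    """Treat success and warnings-only runs as having produced output."""
    return returncode in (QPDF_OK, QPDF_WARNINGS)


def repair_pdf(src: str, dst: str) -> bool:
    """Rewrite src to dst with qpdf; requires qpdf to be on PATH."""
    result = subprocess.run(repair_command(src, dst),
                            capture_output=True, text=True)
    return output_usable(result.returncode)
```

A call such as `repair_pdf("Mueller_letter_to_Barr_2019-03-27.pdf", "fixed.pdf")` would reproduce the manual step; uploading the result as a new version remains a separate action.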

Aklapper renamed this task from "HTTP 500 for thumbnails of File:Mueller_letter_to_Barr_2019-03-27.pdf" to "HTTP 500 for thumbnails of damaged PDF file File:Mueller_letter_to_Barr_2019-03-27.pdf". Sep 10 2019, 9:36 AM

> Why is PdfHandler less capable than any browser's built-in PDF viewer?

Ghostscript (which your browser may also be using) is completely capable of opening the file. However, it prints both the image and a non-fatal error message to STDOUT, corrupting the image. That causes ImageMagick to throw a fatal error, and we get no thumbnail. This isn't a PdfHandler issue; it's a Thumbor issue.
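The failure mode described above, diagnostic text sharing a stream with binary image data, can be illustrated with a simple signature check. This is a sketch only, not how Thumbor or ImageMagick handle the stream: the function name is hypothetical, and it only catches stray text that arrives before the image data, whereas the thread does not say where in the stream the message appears.

```python
# Standard file signatures: an 8-byte PNG header and the JPEG start-of-image
# marker followed by the first byte of the next marker.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def starts_with_image_signature(data: bytes) -> bool:
    """Return True if the captured bytes begin with a PNG or JPEG signature."""
    return data.startswith(PNG_MAGIC) or data.startswith(JPEG_MAGIC)


if __name__ == "__main__":
    clean = PNG_MAGIC + b"...image bytes..."
    # Placeholder text standing in for a diagnostic message; not real output.
    polluted = b"some diagnostic text\n" + clean
    print(starts_with_image_signature(clean))
    print(starts_with_image_signature(polluted))
```

A check like this could at least turn a downstream decoding failure into an explicit "output stream was not a clean image" error.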