
In Commons, some PDFs are failing to render thumbnails.
Closed, Duplicate · Public

Description

The initial upload of https://commons.wikimedia.org/wiki/File:Philosophical_Transactions_-_Volume_054.pdf fails to render any thumbnails. The file can be downloaded to a PC successfully, and PDF viewers show the content without issue (tried with Nuance Power PDF and Acrobat Pro). Acrobat reports that it is a PDF/A standard document; removing the /A attribute made no difference. The only solution was to export the 550 pages as separate TIFFs and rebuild the document. I went for a fairly standard compression, as I did not want to corrupt the content (a second lossy compression is never ideal). This generates a much bigger PDF, so the original was evidently highly compressed. This was initially raised on the help desk: https://commons.wikimedia.org/wiki/Commons:Help_desk#Problem_with_a_pdf_file.

Event Timeline

Ronhjones created this task. · Sep 3 2018, 3:36 PM
Restricted Application added a subscriber: Aklapper. · Sep 3 2018, 3:36 PM
Aklapper changed the task status from Open to Stalled. · Sep 3 2018, 3:37 PM

Thumbnails are displayed for me. Which specific page is not working?
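To pin down which pages fail, it can help to request the per-page thumbnail URLs directly. The sketch below builds such a URL, assuming the usual MediaWiki hashed upload layout (MD5 of the underscore-separated file name) and the `pageN-WIDTHpx-` prefix used for multipage files. The function name `pdf_page_thumb_url` and the default width of 220 are illustrative choices, not anything taken from this task.

```python
import hashlib
from urllib.parse import quote


def pdf_page_thumb_url(filename: str, page: int, width: int = 220) -> str:
    """Build a Commons thumbnail URL for one page of a PDF.

    Assumption: the file lives on Commons and follows the standard
    hashed directory layout (first one and first two hex digits of the
    MD5 of the file name, with spaces replaced by underscores).
    """
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    encoded = quote(name)  # percent-encode any non-ASCII characters
    return (
        "https://upload.wikimedia.org/wikipedia/commons/thumb/"
        f"{digest[0]}/{digest[:2]}/{encoded}/"
        f"page{page}-{width}px-{encoded}.jpg"
    )


# Example: candidate URLs for the first three pages of one of the files
# mentioned in this task. Fetching them is left to the reader.
for page in range(1, 4):
    print(pdf_page_thumb_url("Czech Folk Tales.pdf", page))
```

Opening a generated URL in a browser should show either the rendered page or an error message from the thumbnailer, which is usually more informative than a blank thumbnail on the file page.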

Jan.Kamenicek changed the task status from Stalled to Open. · Edited · Sep 3 2018, 4:32 PM
Jan.Kamenicek added a subscriber: Jan.Kamenicek.

The problem is quite well described in the discussion that Ronhjones linked to above. As Ronhjones has written, the problem was with the initially uploaded file, but he remade the document and so the thumbnails now render correctly. The question is why the file had to be remade. I had exactly the same problem with the initial upload of https://commons.wikimedia.org/wiki/File:Czech_Folk_Tales.pdf . Both PDF files were downloaded from archive.org (e.g. the latter one from https://archive.org/stream/czechfolktales00bauduoft#page/n7 ), and both looked fine before uploading to Commons, but did not display thumbnails after upload. I guess other PDF files from archive.org would suffer from the same problem.

Paladox added a subscriber: Paladox. · Sep 3 2018, 4:44 PM

Could this be related to <icinga-wm> PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds ?

Paladox removed a subscriber: Paladox.

@Jan.Kamenicek: Pardon! I obviously hadn't read the initial description closely enough.

One more thing: I thought it might be just a problem of displaying the thumbnails and that the files could otherwise work well, so I tried to use them at Wikisource. However, the individual pages of the PDF files did not display there either, in either the Index namespace or the Page namespace. Although the scans were not displayed there, the proofreading extension was able to read the text layer of the files!

Maybe a time / space thing? The 550 uncompressed TIFFs took a while to extract and take up a total of 4.17 GB of disk space.

MoritzMuehlenhoff triaged this task as Normal priority. · Sep 26 2018, 11:51 AM
MoritzMuehlenhoff added a subscriber: Gilles.

This is a duplicate of T196961.