The file https://commons.wikimedia.org/wiki/File:Montazem_Naseri.pdf was uploaded with high compression, at 25.5 MB. The file did not display anywhere (no jpg preview, no thumbnail, no next-page option). I then decompressed it to 178.68 MB and overwrote the previous file; now it displays. The thumbnail of the previous version is still not visible.
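For anyone who wants to try the same workaround: the comment above doesn't say which tool was used for the decompression, so this is only a minimal sketch of one possible equivalent, assuming qpdf is installed:

```python
import subprocess

# Rewrite the PDF with its object streams uncompressed. qpdf's
# --stream-data=uncompress does this; the tool actually used for the
# re-upload above is not stated, so treat this as illustrative only.
subprocess.run(
    ["qpdf", "--stream-data=uncompress",
     "Montazem_Naseri.pdf", "Montazem_Naseri_uncompressed.pdf"],
    check=True,
)
```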
| Status | Assigned To | Task |
|--------|-------------|------|
| Open | None | T43371 Thumbnail/imagescaler (tracking) |
| Resolved | jijiki | T170817 Upgrade Thumbor servers to Stretch |
| Resolved | Gilles | T196961 Non-rendering of thumbnail of compressed pdf in Commons |
The first version uploaded did not display thumbnails for any page (I tried and confirmed this at random pages), whether viewed directly on the File: page at Commons or through the various views available at the Wikisources.
I have a similar problem with this file: https://commons.wikimedia.org/wiki/File:An_Anglo-Chinese_Vocabulary_of_the_Ningpo_Dialect.pdf
The thumbnail is not displayed even though the file is readable after being downloaded.
I'm pretty sure that the file is a valid PDF because it works on my laptop. It also previewed well at the source site: https://archive.org/details/wilhelmgesenius00gese .
It seems this problem affects many (or all?) pdf files downloaded from archive.org, so it would really help if it were solved. It prevents such files from being proofread at Wikisource, as the ProofreadPage extension is not able to render the pdf pages from Commons either.
That horseradish culture PDF is timing out when processed with ghostscript, which means it's taking more than one minute on our production servers. That's an unreasonable amount of time for any thumbnail. The question is what's special about a 393 KB PDF that it takes more than a minute to extract a thumbnail from it.
On my own machine (macOS, gs 9.23) it's fast. I've verified on a production Thumbor machine, thumbor1001, that it's excruciatingly slow there (Debian Jessie, gs 9.06). On a Debian Stretch WMCS machine (gs 9.20) it feels fast as well.
This is most likely a ghostscript bug making processing of that kind of file very slow on the version of ghostscript we're stuck with on Debian Jessie. This should be revisited once the Thumbor cluster has been upgraded to Debian Stretch, which ships a much newer version of ghostscript.
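For reference, a minimal sketch of how the slowness can be reproduced and timed on any host, assuming gs is on PATH and the PDF has been downloaded locally (the file name here is illustrative). It renders only the first page, which is roughly what the thumbnailer asks ghostscript to do:

```python
import subprocess
import time

# Render just the first page to a JPEG and time it; kill ghostscript
# after 60 s to mirror the production thumbnail timeout described above.
start = time.monotonic()
try:
    subprocess.run(
        ["gs", "-dBATCH", "-dNOPAUSE", "-dSAFER",
         "-sDEVICE=jpeg", "-r150",
         "-dFirstPage=1", "-dLastPage=1",
         "-sOutputFile=thumb.jpg",
         "horseradish_culture.pdf"],  # illustrative local copy of the PDF
        check=True,
        timeout=60,
    )
    print(f"rendered in {time.monotonic() - start:.1f} s")
except subprocess.TimeoutExpired:
    print("ghostscript exceeded the 60 s thumbnail timeout")
```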
Until now I had experienced this problem only with files downloaded from archive.org; now, for the first time, I have the same problem with a file downloaded from the HathiTrust Digital Library, see https://commons.wikimedia.org/wiki/File:The_voice_of_an_oppressed_people.pdf . Meanwhile, more people have been discussing the problem at en.wikisource, e.g. here: https://en.wikisource.org/w/index.php?title=Wikisource%3AScriptorium%2FHelp&type=revision&diff=8890604&oldid=8890506 or here: https://en.wikisource.org/w/index.php?title=Wikisource%3AScriptorium%2FHelp&type=revision&diff=8935250&oldid=8924491 . It would be nice if someone managed to find a solution. Could this task receive a higher priority?
I made a couple of attempts at re-processing the https://commons.wikimedia.org/wiki/File:The_voice_of_an_oppressed_people.pdf file:
(1) Export all pages as TIFFs (66 of them), then use Acrobat Pro XI to make a new file: a slightly smaller file, and it works OK.
(2) Take the original PDF and "Save as Optimized PDF" in Acrobat (it downsamples the images): the file size is now one third, and it still all works and reads fine.
I assume it must be an issue with the PDF agent used to create the original file, and/or the images were scanned at too high a bit rate (each TIFF was 4 MB in size).
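For editors without Acrobat, a rough command-line equivalent of workaround (2), assuming ghostscript is available: the pdfwrite device with -dPDFSETTINGS=/ebook downsamples embedded images to about 150 dpi, which is in the same spirit as Acrobat's optimizer, though the output will not be identical:

```python
import subprocess

# Re-write the PDF through ghostscript's pdfwrite device; /ebook
# downsamples images to ~150 dpi. A rough stand-in for Acrobat's
# "Save as Optimized PDF", not an exact reproduction of it.
subprocess.run(
    ["gs", "-dBATCH", "-dNOPAUSE", "-dSAFER",
     "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/ebook",
     "-sOutputFile=optimized.pdf",
     "The_voice_of_an_oppressed_people.pdf"],
    check=True,
)
```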
Meanwhile I tried to reprocess it as DjVu, which sometimes (not always) helped in the past, but this time it did not: https://commons.wikimedia.org/wiki/File:The_voice_of_an_oppressed_people.djvu (I will nominate it for deletion after a while).
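The comment doesn't say which converter was used for that DjVu attempt; pdf2djvu is one common choice, and a minimal sketch of that route would be:

```python
import subprocess

# Convert the PDF to DjVu with pdf2djvu. This is only one of several
# possible converters; the one actually used above is not stated.
subprocess.run(
    ["pdf2djvu", "-o", "The_voice_of_an_oppressed_people.djvu",
     "The_voice_of_an_oppressed_people.pdf"],
    check=True,
)
```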
P.S. I am grateful that Ronhjones fixed this particular file, but nobody knows how many other users, not experienced in reprocessing files or complaining on Phabricator, are discouraged from uploading PDF books to Commons and Wikisource. A real solution to the problem is still needed.
It has been found that the complicated workaround of exporting the files to TIFF and then back to PDF flattens the layers and causes considerable loss of picture quality. Also, not everybody has software capable of such a workaround. May I ask if there is any progress?