Non-rendering of thumbnail of compressed pdf in Commons
Open, Needs Triage · Public

Description

The file https://commons.wikimedia.org/wiki/File:Montazem_Naseri.pdf was uploaded with high compression, at a size of 25.5 MB. The file did not display anywhere: no JPG preview, no thumbnail, no next-page option. I then decompressed it to 178.68 MB and overwrote the previous file, and now it displays. The thumbnail of the previous version is still not visible.

Restricted Application added a subscriber: Aklapper. · Jun 12 2018, 1:20 AM

Cannot reproduce; the images are displayed for me. Which specific ones are missing? Links welcome. :)

The preview is visible now after my overwrite. But please see the upload log. The thumbnail of the original version (compressed pdf) is not visible.

Restricted Application added a project: Multimedia. · Jun 12 2018, 10:02 AM

The first uploaded version did not display page thumbnails. I tried several pages at random and confirmed that none of them rendered, whether viewed directly on the File: page at Commons or through some of the views available at the Wikisources.

I have a similar problem with this file: https://commons.wikimedia.org/wiki/File:An_Anglo-Chinese_Vocabulary_of_the_Ningpo_Dialect.pdf
The thumbnail is not displayed even though the file is readable after being downloaded.

Vvjjkkii renamed this task from Non-rendering of thumbnail of compressed pdf in Commons to 67aaaaaaaa. · Jul 1 2018, 1:04 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot renamed this task from 67aaaaaaaa to Non-rendering of thumbnail of compressed pdf in Commons.
CommunityTechBot added a subscriber: Aklapper.

Is it the same problem with https://commons.wikimedia.org/w/index.php?title=File%3AWilhelm_Gesenius_Hebr%C3%A4ische_Grammatik_(umgearbeitet_von_Emil_Kautzsch).pdf&page=328 ?

I'm pretty sure that the file is a valid PDF because it works on my laptop. It also previewed well at the source site: https://archive.org/details/wilhelmgesenius00gese .

I'm not entirely sure whether this is Thumbor territory, but since we get several tasks about this (e.g. T203402 might be a duplicate), it would be good to get some attention here.

I have just experienced the same problem again, see https://commons.wikimedia.org/wiki/File:Horse-radish_culture_in_Bohemia.pdf

I really don't know the reason, but that file loads very slowly.

It seems this problem is connected with many (or all?) PDF files downloaded from archive.org, so it would really help if it were solved. It prevents such files from being proofread at Wikisource, because the proofread extension cannot render the PDF pages from Commons either.

That horse-radish culture PDF is timing out when processed with ghostscript, which means it takes more than one minute on our production servers. That is an unreasonable amount of time for any thumbnail. The question is what is special about a 393 KB PDF that makes extracting a thumbnail from it take more than a minute.

On my own machine (macOS, gs 9.23) it's fast. I've verified on a production Thumbor machine, thumbor1001 (Debian Jessie, gs 9.06), that it's excruciatingly slow there. On a Debian Stretch WMCS machine (gs 9.20) it feels fast as well.

This is most likely a ghostscript bug making processing of that kind of file very slow on the version of ghostscript we're stuck with on Debian Jessie. This should be revisited once the Thumbor cluster has been updated to Debian Stretch and a much newer version of ghostscript.
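To compare ghostscript versions on the same file, a rough timing harness along these lines might help. This is only a sketch, not the command Thumbor actually runs: the helper names `build_gs_command` and `time_command`, the 150 dpi resolution, and the JPEG output device are assumptions chosen for illustration. The 60-second default mirrors the one-minute limit mentioned above.

```python
import subprocess
import sys
import time


def build_gs_command(pdf_path, out_path, page=1, dpi=150, gs_binary="gs"):
    """Build a ghostscript command that renders a single page to JPEG.

    Hypothetical helper: the flags are standard ghostscript options, but the
    exact set used in production is not known from this task.
    """
    return [
        gs_binary,
        "-dBATCH", "-dNOPAUSE", "-dSAFER", "-q",
        "-sDEVICE=jpeg",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-r{dpi}",
        f"-sOutputFile={out_path}",
        pdf_path,
    ]


def time_command(cmd, timeout_s=60.0):
    """Run cmd and return (status, elapsed_seconds).

    status is one of "ok", "error", "timeout", or "missing".
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        status = "ok" if proc.returncode == 0 else "error"
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        status = "timeout"
    except FileNotFoundError:
        # The binary (e.g. gs) is not installed or not on PATH.
        status = "missing"
    return status, time.monotonic() - start


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: time_gs.py INPUT.pdf OUTPUT.jpg")
    status, elapsed = time_command(build_gs_command(sys.argv[1], sys.argv[2]))
    print(f"{status} after {elapsed:.1f}s")
```

Running the same script against the same PDF on a Jessie host and a Stretch host would give a like-for-like comparison, with `gs --version` recorded alongside each result.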

I have just experienced the same problem with a djvu file as well: https://commons.wikimedia.org/wiki/File:Modernczechpoetr00selvialab.djvu

I uploaded another version of the same djvu file, which is fine. The original problematic version (which worked well on my computer but not in Commons) can be found in the file history.