Maniphest T196961

Non-rendering of thumbnail of compressed pdf in Commons
Closed, Resolved, Public

Authored By

	Hrishikes
	Jun 12 2018, 1:20 AM

Description

The file https://commons.wikimedia.org/wiki/File:Montazem_Naseri.pdf was uploaded with high compression, size 25.5 MB. The file did not display (jpg preview, thumbnail, next page option -- nowhere). Then I decompressed it to 178.68 MB and overwrote the previous file. Now it displays. The thumbnail of the previous file is still not visible.

Related Objects

Status	Assigned	Task
Open	None	T43371 Thumbnail/imagescaler (tracking)
Resolved	jijiki	T170817 Upgrade Thumbor servers to Stretch
Resolved	• Gilles	T196961 Non-rendering of thumbnail of compressed pdf in Commons

Event Timeline

Hrishikes created this task. Jun 12 2018, 1:20 AM

Restricted Application added a subscriber: Aklapper. Jun 12 2018, 1:20 AM

Cannot reproduce; the images are displayed for me. Which specific ones are missing? Links welcome. :)

Liuxinyu970226 added a project: Commons. Jun 12 2018, 9:52 AM

The preview is visible now after my overwrite. But please see the upload log. The thumbnail of the original version (compressed pdf) is not visible.

Aklapper added projects: MediaWiki-extensions-PdfHandler, MediaWiki-File-management. Jun 12 2018, 10:02 AM

Restricted Application added a project: Multimedia. Jun 12 2018, 10:02 AM

The first uploaded version did not display thumbnails for any page (I tried several pages at random and confirmed this), either directly on the File: page at Commons or when viewed through some of the variations available at the Wikisources.

I have a similar problem with this file: https://commons.wikimedia.org/wiki/File:An_Anglo-Chinese_Vocabulary_of_the_Ningpo_Dialect.pdf
The thumbnail is not displayed even though the file is readable after being downloaded.

• Vvjjkkii renamed this task from Non-rendering of thumbnail of compressed pdf in Commons to 67aaaaaaaa. Jul 1 2018, 1:04 AM

• Vvjjkkii triaged this task as High priority.

• Vvjjkkii added projects: CheckUser, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), Tamil-Sites, Gamepress, Hashtags, Jade, KartoEditor, Language-2018-Apr-June, New-Editor-Experiences, Mail, TCB-Team (now WMDE-TechWish).

• Vvjjkkii updated the task description.

• Vvjjkkii removed a subscriber: Aklapper.

CommunityTechBot renamed this task from 67aaaaaaaa to Non-rendering of thumbnail of compressed pdf in Commons. Jul 2 2018, 2:09 PM

CommunityTechBot raised the priority of this task from High to Needs Triage.

CommunityTechBot updated the task description.

CommunityTechBot removed projects: TCB-Team (now WMDE-TechWish), Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, Jade, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.

CommunityTechBot added a subscriber: Aklapper.

Is it the same problem with https://commons.wikimedia.org/w/index.php?title=File%3AWilhelm_Gesenius_Hebr%C3%A4ische_Grammatik_(umgearbeitet_von_Emil_Kautzsch).pdf&page=328 ?

I'm pretty sure that the file is a valid PDF because it works on my laptop. It also previewed well at the source site: https://archive.org/details/wilhelmgesenius00gese .

Rachmat04 subscribed. Sep 16 2018, 8:09 AM

Hrishikes mentioned this in T203402: in Commons, some PDFs are failing to render thumbnails. Oct 7 2018, 5:32 PM

Not entirely sure if this is Thumbor territory, but as we get some tasks about this (e.g. T203402 might be a dup), it could be good to get some attention here.

Aklapper merged a task: T203402: in Commons, some PDFs are failing to render thumbnails. Oct 13 2018, 4:48 PM

Aklapper added subscribers: Ronhjones, • Gilles, Jan.Kamenicek.

I have just experienced the same problem again, see https://commons.wikimedia.org/wiki/File:Horse-radish_culture_in_Bohemia.pdf

In T196961#4663923, @Jan.Kamenicek wrote:

I have just experienced the same problem again, see https://commons.wikimedia.org/wiki/File:Horse-radish_culture_in_Bohemia.pdf

I really don't know the reason, but that file loads very slowly.

It seems this problem is connected with many (or all?) PDF files downloaded from archive.org, so it would really help if it were solved. It prevents such files from being proofread at Wikisource, as the proofread extension is not able to render the PDF pages from Commons either.

That horse radish culture PDF is timing out when processed with ghostscript. Which means it's taking more than one minute on our production servers. It's an unreasonable amount of time for any thumbnail. The question is what's special about a 393KB PDF that it would take 1+ minute to extract a thumbnail from it.

On my own machine (MacOS, gs 9.23) it's fast. I've verified on a production Thumbor machine, thumbor1001, that it's excruciatingly slow there (Debian Jessie, gs 9.06). On a Debian Stretch WMCS machine (gs 9.20) it feels fast as well.

This is most likely a ghostscript bug making processing of that kind of file very slow on the version of ghostscript we're stuck with on Debian Jessie. This should be revisited once the Thumbor cluster has been updated to Debian Stretch and a much newer version of ghostscript.
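
A minimal sketch of how the slowdown described above could be checked on a given machine: render one page with Ghostscript under a time limit and record how long it takes. The one-minute limit comes from the comment above; the helper names `build_gs_command` and `run_with_timeout` are hypothetical, and the Ghostscript options shown are standard ones, not necessarily those the production Thumbor setup uses.

```python
import subprocess
import time


def build_gs_command(pdf_path, out_path, page=1, dpi=72):
    """Build a Ghostscript command that renders a single PDF page to JPEG.

    Assumption: these are generic Ghostscript options chosen for the
    sketch, not the exact command line used in production.
    """
    return [
        "gs",
        "-dSAFER", "-dBATCH", "-dNOPAUSE", "-q",
        "-sDEVICE=jpeg",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-r{dpi}",
        f"-sOutputFile={out_path}",
        pdf_path,
    ]


def run_with_timeout(cmd, timeout_s=60.0):
    """Run cmd and return (status, elapsed_seconds).

    status is "ok" (exit code 0), "failed" (non-zero exit code) or
    "timeout" (the process was killed after timeout_s seconds).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        status = "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    return status, time.monotonic() - start
```

For example, `run_with_timeout(build_gs_command("Horse-radish_culture_in_Bohemia.pdf", "page1.jpg"))` could be run on machines with different Ghostscript versions to compare timings; it requires a `gs` binary on the PATH and a local copy of the file.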

• Gilles added a parent task: T170817: Upgrade Thumbor servers to Stretch. Oct 15 2018, 12:58 PM

I have just experienced the same problem also with a djvu file: https://commons.wikimedia.org/wiki/File:Modernczechpoetr00selvialab.djvu

I uploaded another version of the same djvu file, which is fine. The original problematic version (which worked well on my computer but not in Commons) can be found in the file history.

Until now I had experienced this problem only with files downloaded from archive.org; now, for the first time, I have the same problem with a file downloaded from Hathi Trust Digital Library, see https://commons.wikimedia.org/wiki/File:The_voice_of_an_oppressed_people.pdf . Meanwhile, more people have been discussing the problem at en.wikisource, e.g. here: https://en.wikisource.org/w/index.php?title=Wikisource%3AScriptorium%2FHelp&type=revision&diff=8890604&oldid=8890506 or here: https://en.wikisource.org/w/index.php?title=Wikisource%3AScriptorium%2FHelp&type=revision&diff=8935250&oldid=8924491 . It would be nice if someone managed to find a solution. Could this task receive a higher priority?

I made a couple of attempts at re-processing the https://commons.wikimedia.org/wiki/File:The_voice_of_an_oppressed_people.pdf file.
(1) Export all 66 pages as TIFFs, then use Acrobat Pro XI to make a new file: the result is slightly smaller and works OK.
(2) Take the original PDF and "Save as Optimized PDF" in Acrobat (it downsamples the images): the file size is now one third, and it still works and reads fine.
I assume it must be an issue with the PDF agent used to create the original file, and/or the images were scanned at too high a bit rate (each TIFF was 4MB in size).

P.S. On the second trial Acrobat did throw a warning of "The PDF document contained image masks that were not downsampled." No idea what that means.

Meanwhile I tried to reprocess it as djvu, which sometimes (not always) helped in the past, but this time it did not: https://commons.wikimedia.org/wiki/File:The_voice_of_an_oppressed_people.djvu (I will nominate it for deletion after a while).

P. S. I am grateful that Ronhjones solved this particular file, but nobody knows how many other users, who are not experienced in both reprocessing files and complaining at Phabricator, are discouraged from uploading PDF books to Commons and Wikisource. A real solution to the problem is needed.

It has been found that the complicated workaround of exporting the files to TIFF and then back to PDF flattens the layers and causes a considerable loss of picture quality. Also, not everybody has software able to do such a workaround. May I ask if there has been any progress?

Fixed by the ghostscript update (at least for the image in the task description). Try purging affected files and clearing your browser cache.
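
A minimal sketch of purging several affected file pages in one go through the MediaWiki Action API (`action=purge`), as an alternative to purging each page by hand. The helper names and the User-Agent string are hypothetical, and whether an API purge refreshes the cached thumbnails in exactly the way meant in the comment above is an assumption.

```python
import json
import urllib.parse
import urllib.request

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_purge_request(titles, api_url=COMMONS_API):
    """Build (but do not send) a POST request for action=purge.

    titles is a list of page titles such as "File:Example.pdf";
    the API accepts several titles separated by "|".
    """
    params = {
        "action": "purge",
        "titles": "|".join(titles),
        "format": "json",
    }
    data = urllib.parse.urlencode(params).encode("utf-8")
    # Placeholder User-Agent for this sketch; replace with a real contact string.
    headers = {"User-Agent": "thumbnail-purge-sketch/0.1 (example)"}
    # Passing data makes urllib issue a POST, which action=purge requires.
    return urllib.request.Request(api_url, data=data, headers=headers)


def send(request, timeout_s=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(build_purge_request(["File:Montazem_Naseri.pdf"]))` would perform the purge over the network; clearing the browser cache afterwards is still a manual step.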

Rachmat04 unsubscribed. Feb 15 2019, 2:26 PM
