Thumbnail missing for PDF file (when in PDF/A format?)
Closed, Resolved · Public

Description

On enwikisource, the following PDF files do not render thumbnails.
Sample pages:

https://ko.wikisource.org/w/index.php?title=%ED%8C%8C%EC%9D%BC:%E9%87%8D%E5%88%8A%E8%80%81%E4%B9%9E%E5%A4%A7%E8%AB%BA%E8%A7%A3_001.pdf&page=5

https://en.wikisource.org/wiki/Page:The_New_Testament_of_Iesvs_Christ_faithfvlly_translated_into_English,_ovt_of_the_authentical_Latin,_diligently_conferred_with_the_Greek,_%26_other_Editions_in_diuers_languages.pdf/169
curl https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/The_New_Testament_of_Iesvsrist_faithfvlly_translated_into_English%2C_ovt_of_the_authentical_Latin%2C_diligently_conferred_with_the_Greek%2C_%26_other_Editions_in_diuers_languages.pdf/page169-1024px-thumbnail.pdf.jpg 

429 Too Many Requests

Event Timeline

@Reedy It is still abnormally slow. Start at https://commons.wikimedia.org/w/index.php?title=File%3AThe_New_Testament_of_Iesvs_Christ_faithfvlly_translated_into_English%2C_ovt_of_the_authentical_Latin%2C_diligently_conferred_with_the_Greek%2C_%26_other_Editions_in_diuers_languages.pdf and, from the page dropdown on the right, pick any of the pages later in the work.

I have seen a similar issue with other PDF files when prodding around at enWS. To me, thumbnail generation for these scan-based PDFs has been playing up for at least a week. [I could dig for filenames if it helps, but it has happened wherever I have been working, rather than with one particular file.]

MauricioGenta subscribed.

Maybe I found the problem, after trying different options in ABBYY FineReader.

https://commons.wikimedia.org/wiki/File:Verahiftoria_Admirandae_Cvivs.pdf

Only when I uncheck the "Create documents PDF/A" option does everything work fine.

With PDF/A, only files containing images without OCR work okay. With ordinary PDF, both (with and without OCR) work okay. PDF/A is an ISO-standardized version of PDF for archiving.

Aklapper renamed this task from Thumbnail missing for PDF file to Thumbnail missing for PDF file (when in PDF/A format?). · Jun 19 2018, 9:01 PM

For wikisource:ko:Page:論語諺解 001.pdf/5 and wikisource:ko:重刊老乞大諺解 001.pdf/5 as well, the thumbnail is not loading: Error: 500, Internal Server Error; Error: 429, Too Many Requests. Is this the same issue?

Restricted Application added a subscriber: revi. · Oct 8 2019, 5:50 PM
AntiCompositeNumber edited projects, added Upstream; removed SRE-swift-storage.

wikisource:ko:파일:重刊老乞大諺解 001.pdf and c:File:The New Testament of Iesvs Christ faithfvlly translated into English, ovt of the authentical Latin, diligently conferred with the Greek, & other Editions in diuers languages.pdf, and wikisource:ko:페이지:論語諺解 001.pdf/5 appear to be working correctly.

It is expected that the first view of a thumbnail for a specific file, page, and resolution will be slow, as the thumbnail has not been cached yet and must be created.

The original version of https://commons.wikimedia.org/wiki/File:Verahiftoria_Admirandae_Cvivs.pdf works only partially. I've uploaded it to https://test.wikipedia.org/wiki/File:Verahiftoria_Admirandae_Cvivs-r1.pdf to make it easier to see the different pages. Most pages fail to render, outputting only a blank white page. The second page renders without error, making me think that this is not a general PDF/A format issue, but something else about the file.

Ghostscript 9.26 outputs this error on the first page:

$ gs -sstdout=%stderr -sDEVICE=jpeg -dJPEG=90 -sOutputFile=%stdout -dFirstPage=1 -dLastPage=1 -r150 -dBATCH -dNOPAUSE -dSAFER -q -f Verahiftoria_Admirandae_Cvivs-r1.pdf > test.jpg
Can't find CMap Identity-UTF16-H building a CIDDecoding resource. 
Warning: falling back to Identity ordering
Can't find CMap Identity-UTF16-H building a CIDDecoding resource. 
   **** Error: can't process embedded font stream,
        attempting to load the font using its name.
               Output may be incorrect.
Can't find CMap Identity-UTF16-H building a CIDDecoding resource. 
Warning: falling back to Identity ordering
Can't find CMap Identity-UTF16-H building a CIDDecoding resource. 
   **** Error reading a content stream. The page may be incomplete.
               Output may be incorrect.
   **** Error: File did not complete the page properly and may be damaged.
               Output may be incorrect.

No errors are emitted for the second page. Ghostscript 9.52 successfully generates thumbnails for both pages without error. Nothing jumped out at me in the GS changelogs that would obviously fix this problem, so I don't know what specific version <=9.52 is required to make it work.
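To compare pages or Ghostscript versions more systematically, the diagnostics quoted above can be captured per page and reduced to the list of CMaps that Ghostscript could not find. The following is only a sketch: the function names `render_page` and `missing_cmaps` are made up for illustration, the `gs` flags are copied from the command in this comment (the JPEG quality setting is left out), and the parsing assumes the message wording shown in the 9.26 output above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of MediaWiki or Ghostscript.

# Render one page to JPEG and save Ghostscript's diagnostics to a log file.
# Flags are taken from the command quoted in this task.
# Usage: render_page FILE.pdf PAGE OUT.jpg LOG.txt
render_page() {
    gs -sstdout=%stderr -sDEVICE=jpeg -sOutputFile=%stdout \
       -dFirstPage="$2" -dLastPage="$2" -r150 -dBATCH -dNOPAUSE -dSAFER -q \
       -f "$1" > "$3" 2> "$4"
}

# Read captured Ghostscript diagnostics on standard input and print each
# distinct CMap name reported as missing, one per line.
# Assumes the "Can't find CMap NAME building ..." wording seen in GS 9.26.
missing_cmaps() {
    sed -n "s/^Can't find CMap \([^ ]*\) building.*/\1/p" | sort -u
}

# Example (requires gs and the PDF to be present):
#   render_page Verahiftoria_Admirandae_Cvivs-r1.pdf 1 page1.jpg page1.log
#   missing_cmaps < page1.log
```

An empty result for a page that still renders blank would suggest the failure has a different cause than a missing CMap.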

The problem with PDF/A has been solved for a while now. I have no idea when exactly, but the files started working correctly around 2019. I still get the thumbnail problem from time to time, seemingly at random and almost never on the first page, so it is difficult to track.
My digitization process is always the same: photo with a Nikon D5300 (JPG) -> Scan Tailor -> ABBYY FineReader. I have not found a reason why one PDF fails and others do not.

AntiCompositeNumber moved this task from Backlog to Patch merged upstream on the Upstream board.

The PDF is encoded in such a way that the Identity-UTF16-H CMap is required for Ghostscript to process the text. Ghostscript normally provides that file, but in Debian it is provided by the poppler-data package. The poppler-data package in Debian stretch does not include the Identity-UTF16-H file, but the one in Debian buster does (Debian bug #861363).
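One way to check whether a given machine has the CMap file at all is to search the resource directories for it by name. This is a rough sketch: `find_cmap` is a hypothetical helper, and the directories in the usage comment are assumptions about where Debian's ghostscript and poppler-data packages install their resources, so adjust them for the system at hand.

```shell
#!/bin/sh
# Hypothetical helper: search the given directories for a CMap resource
# file (or symlink) by name and print every match.
# Returns non-zero when nothing is found.
# Usage: find_cmap NAME DIR...
find_cmap() {
    name=$1
    shift
    # Errors for missing directories are silenced on purpose.
    matches=$(find "$@" -name "$name" 2>/dev/null)
    [ -n "$matches" ] && printf '%s\n' "$matches"
}

# Example (directory locations are assumptions, not confirmed in this task):
#   find_cmap Identity-UTF16-H /usr/share/ghostscript /usr/share/poppler \
#       || echo "Identity-UTF16-H not found"
```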

If your PDF creation software has a setting to embed fonts, make sure it's turned on.

Jdforrester-WMF subscribed.

Do we consider this fixed?

I think so. https://commons.wikimedia.org/w/index.php?title=File%3A%E9%87%8D%E5%88%8A%E8%80%81%E4%B9%9E%E5%A4%A7%E8%AB%BA%E8%A7%A3_001.pdf&page=5 certainly now renders a thumbnail.

Please re-open if things are still broken in this way.

LibrErli subscribed.

About 21 hours ago I uploaded the following PDF to Commons for a deWikisource project: https://commons.wikimedia.org/wiki/File:KB_AR_Rechsteiner_Chronik_Ms_401_mit_Register.pdf Up to now, no page images have been rendered. Maybe this is related to this task. Could someone with more insight take a look at this file?
Thanks in advance!

@LibrErli: Please do not change the assignee of a task and please file a new ticket using the bug report form - this has been closed for six months. Thanks a lot! :)