
PDF and DjVu files on Commons fail to be processed (no thumbnails, zero pages) but are otherwise valid
Open, Needs TriagePublic


I went through a Wikimedia Commons dump, checked for all invalid PDF and DjVu files (those with no thumbnails, 0x0 size, and zero pages), and tested them. Those that were really invalid I tried to replace with a fixed version; where I could not find a fixed version, I marked them for speedy deletion.
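The selection criterion described here (0x0 size and zero pages, for PDF and DjVu files only) could be sketched roughly as follows. This is not the script actually used for the dump; the record layout and the field names `title`, `width`, `height`, and `pagecount` are assumptions made for illustration.

```python
def looks_unprocessed(record):
    """Return True when width, height and page count are all zero or missing.

    The field names are hypothetical; adapt them to the dump format in use.
    """
    return not any(record.get(key) for key in ("width", "height", "pagecount"))


def select_suspects(records, extensions=(".pdf", ".djvu")):
    """Return titles of PDF/DjVu records that look unprocessed."""
    return [
        record["title"]
        for record in records
        if record["title"].lower().endswith(extensions)
        and looks_unprocessed(record)
    ]


if __name__ == "__main__":
    sample = [
        {"title": "Good.pdf", "width": 612, "height": 792, "pagecount": 10},
        {"title": "Broken.djvu", "width": 0, "height": 0, "pagecount": 0},
        {"title": "Photo.jpg", "width": 0, "height": 0},
    ]
    print(select_suspects(sample))
```

A file flagged this way is only a candidate: as the rest of this task shows, it still has to be opened in a real viewer to tell a genuinely broken file from one that merely failed processing.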

But I have found some files that look invalid on Commons yet seem to load fine (at least in Firefox for PDF, and ddjvu for DjVu files). Maybe there is some issue with how they are processed on the backend?

Here is the list (processing of thumbnails started, but then it died):

文選樓叢書_疇人傳:卷十二.djvu
清代学术丛书·第一集·颜氏学记:卷七至卷八.djvu
Кирилова_книга_часть_8.djvu
Русский_биографический_словарь._Том_15_(1910)_—_с._24-25.djvu
Томские_губернские_ведомости,_1900_№_38_(28_сентября).djvu
Указатель_статей_морского_сборника_1848_-_1872_г._1875(2).djvu

See also (and possibly duplicate with): T297942, T298417, T299521

Event Timeline

What is this? Why change links to that?

So this list is exhaustive. I went through all PDF and DjVu files on Wikimedia Commons as of last week, not just a random sample. If we fix these, then all of them will be fixed. :-)

No, this one seems to be just a slightly broken PDF. I just fixed it.

That's odd, I saved the PDF file starting from a Word document. (OK, on second thought that's not odd at all :-) ) Thanks!

So I fixed it using mutool clean. But the ones I listed above cannot be fixed this way, and this is what I am reporting: mutool clean does not fix them, the MediaBox values show reasonable page sizes (including the first page), and even the metadata shows that the page size is available. Example for the first file above:

    "name": "pdf-PageSize",
    "value": [
        {
            "name": 0,
            "value": "612 x 792 pts (letter)"
        },
        {
            "name": 1,
            "value": "697 x 855 pts"
        }

But MediaWiki does not show width and height. So something is wrong.
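To illustrate that the dimensions really are recoverable from this metadata, here is a minimal sketch that parses `pdf-PageSize` strings of the form shown above into numeric width and height. It assumes the entries are dicts with `name` (page index) and `value` (size string) keys, as in the fragment; it is not how MediaWiki itself derives the dimensions.

```python
import re

# Matches strings such as "612 x 792 pts (letter)" or "697 x 855 pts".
PAGE_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)\s*pts\b")


def parse_page_size(value):
    """Return (width, height) in points, or None if the string does not match."""
    match = PAGE_SIZE_RE.match(value)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def first_page_size(entries):
    """Return the parsed size of the lowest-indexed page, or None if unavailable."""
    if not entries:
        return None
    first = min(entries, key=lambda entry: entry["name"])
    return parse_page_size(first["value"])


if __name__ == "__main__":
    entries = [
        {"name": 0, "value": "612 x 792 pts (letter)"},
        {"name": 1, "value": "697 x 855 pts"},
    ]
    print(first_page_size(entries))
```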

@mau If you made this PDF yourself, could I recommend removing the first blank page? Because otherwise the first thumbnail does not show anything.

@Mitar probably it's even better to substitute the first page with the actual cover for the book, indeed. I'll proceed :-)

Mitar updated the task description.

I ran into the same problem. I don't know if this can be considered a solution, because these steps have to be done on the server side, but I solved my problem:

  1. step – repair the thumbnails of the files with core MediaWiki
php maintenance/refreshImageMetadata.php --verbose --mime image/vnd.djvu --force
  2. step – do a null edit of the index pages of Extension:Proofread_Page (needed to update the page count information shown on the special page)
php maintenance/refreshLinks.php --namespace 252
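
For anyone who wants to run the two maintenance steps together, a small wrapper could look like the sketch below. It only assembles the exact commands quoted in this comment and runs them in order, stopping at the first failure. The working directory (the root of the MediaWiki installation) and the default namespace number 252 are assumptions taken from this comment and may differ on other wikis.

```python
import subprocess


def build_commands(mime="image/vnd.djvu", index_namespace=252):
    """Return the two maintenance commands as argument lists, in run order."""
    return [
        ["php", "maintenance/refreshImageMetadata.php",
         "--verbose", "--mime", mime, "--force"],
        ["php", "maintenance/refreshLinks.php",
         "--namespace", str(index_namespace)],
    ]


def run_all(commands, mediawiki_root):
    """Run each command from the MediaWiki root; raises on the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=mediawiki_root, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_all() on the server.
    for argv in build_commands():
        print(" ".join(argv))
```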