
Thumbnails for specific PDF file rendered half of their proper width on File: page (thumbnail file itself is correct)
Closed, Resolved | Public | Bug Report

Description

The file with this issue: https://commons.wikimedia.org/wiki/File:Current_Intelligence_Bulletin_63_-_Occupational_Exposure_to_Titanium_Dioxide.pdf
Now: https://commons.wikimedia.org/wiki/User:TheDJ/T167420

The thumbnails of the first and last pages are distorted: they are squished to half their proper width. The PDF files themselves, and the thumbnails for all the other pages, seem to be fine. (Reported on Commons COM:GL/ILL by John P. Sadowski (NIOSH))

The thumbnails themselves are created correctly, but the dimensions used on file pages do not match the aspect ratio of the page selected for display.


See Also: T72734

Event Timeline

SamanthaNguyen subscribed.

@Perhelion: MediaWiki-extensions-Comments is an extension for registering <comments> as a parser tag and is unrelated to this (Social-Tools also just covers social extensions, which MediaWiki-extensions-Comments falls under). Thus I'm removing both of those and adding the MediaWiki-extensions-PdfHandler project tag.

Aklapper renamed this task from Thumbnails for specific PDF file (on Commons) renders ugly to Thumbnails for specific PDF file (on Commons) rendered half of their proper width. Jun 8 2017, 3:31 PM
Aklapper updated the task description.
Perhelion renamed this task from Thumbnails for specific PDF file (on Commons) rendered half of their proper width to Thumbnails for specific PDF file rendered half of their proper width. Jan 30 2018, 10:39 PM

This isn't an issue with thumbnailing but with the layout of the file page for PDFs whose page aspect ratio changes from page to page and differs from what MediaWiki considers to be the "default" aspect ratio for that PDF.

You can see that the thumbnail that appears distorted on the page actually renders with the right aspect ratio when you open it directly: https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Current_Intelligence_Bulletin_63_-_Occupational_Exposure_to_Titanium_Dioxide.pdf/page1-463px-Current_Intelligence_Bulletin_63_-_Occupational_Exposure_to_Titanium_Dioxide.pdf.jpg

MediaWiki sets a fixed width and height on the file page's <img> element, and those dimensions do not match the aspect ratio of these particular pages.

Aklapper renamed this task from Thumbnails for specific PDF file rendered half of their proper width to Thumbnails for specific PDF file rendered half of their proper width on File: page (thumbnail file itself is correct). Apr 8 2019, 11:11 AM
Aklapper edited projects, added MediaWiki-File-management; removed SRE-swift-storage.

What happens here is that the first and last pages are largely not made of individual PDF elements; they are really just one big image. The left half of the image is the last page, and the right half is the first page. The image therefore lies half outside the drawing area of each of those two PDF pages.

In this screenshot inside a PDF editor, I have moved the image to be centered on the first page and you can see the content of the last page.

[Screenshot 2022-06-14 at 21.51.11.png, 509 KB]

While Ghostscript does seem to 'cut off' those non-visible elements when drawing, it also seems to alter the drawing size somehow, which squashes the image?

pdfinfo (with the -box option) shows the following page boxes for the first page:
Page 1 MediaBox: 0.00 0.00 1224.00 792.00
Page 1 CropBox: 612.00 0.00 1224.00 792.00
Page 1 BleedBox: 612.00 0.00 1224.00 792.00
Page 1 TrimBox: 612.00 0.00 1224.00 792.00
Page 1 ArtBox: 612.00 0.00 1224.00 792.00
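To make the geometry concrete, here is a minimal Python sketch that parses the pdfinfo lines above and compares the boxes. The parsing regex, the `Box` class and the `scaled_height` helper are hypothetical illustrations, not part of PdfHandler. It shows that the MediaBox is exactly twice as wide as the CropBox (1224 vs 612 points, same 792-point height), which is consistent with the "half of their proper width" symptom if one box's dimensions are applied to an image rendered from the other.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# The page 1 box lines printed by `pdfinfo -box`, copied from the output above.
PDFINFO_BOX_OUTPUT = """\
Page 1 MediaBox: 0.00 0.00 1224.00 792.00
Page 1 CropBox: 612.00 0.00 1224.00 792.00
Page 1 BleedBox: 612.00 0.00 1224.00 792.00
Page 1 TrimBox: 612.00 0.00 1224.00 792.00
Page 1 ArtBox: 612.00 0.00 1224.00 792.00
"""

# Matches "Page <n> <Name>Box: x0 y0 x1 y1" (hypothetical parser for this sketch).
BOX_LINE = re.compile(
    r"Page\s+(\d+)\s+(\w+Box):\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)


@dataclass
class Box:
    """A PDF page box in points: lower-left (x0, y0) to upper-right (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0

    @property
    def aspect(self) -> float:
        return self.width / self.height


def parse_boxes(text: str) -> dict[str, Box]:
    """Return a mapping of box name (e.g. "CropBox") to its Box."""
    boxes: dict[str, Box] = {}
    for m in BOX_LINE.finditer(text):
        x0, y0, x1, y1 = (float(v) for v in m.groups()[2:])
        boxes[m.group(2)] = Box(x0, y0, x1, y1)
    return boxes


def scaled_height(box: Box, target_width: int) -> int:
    """Height in pixels that preserves the box's aspect ratio at target_width."""
    return round(target_width / box.aspect)


if __name__ == "__main__":
    boxes = parse_boxes(PDFINFO_BOX_OUTPUT)
    media, crop = boxes["MediaBox"], boxes["CropBox"]
    print("MediaBox:", media.width, "x", media.height)
    print("CropBox: ", crop.width, "x", crop.height)
    print("width ratio:", media.width / crop.width)
    # 463 px is the thumbnail width used in the upload.wikimedia.org URL in this task.
    print("463px wide, CropBox ratio: ", scaled_height(crop, 463))
    print("463px wide, MediaBox ratio:", scaled_height(media, 463))
```

For a 463 px wide thumbnail, the CropBox aspect ratio gives a height of 599 px, while the MediaBox aspect ratio gives only 300 px. Two pixel boxes that disagree by a factor of two in width for the same height is exactly the kind of mismatch seen on the file page.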

I can confirm that setting any of the following gs options fixes the rendering:
-dUseBleedBox, -dUseTrimBox, -dUseArtBox, -dUseCropBox
https://www.ghostscript.com/doc/current/Use.htm
See also: https://wiki.scribus.net/canvas/PDF_Boxes_:_mediabox,_cropbox,_bleedbox,_trimbox,_artbox

According to https://www.prepressure.com/pdf/basics/page-boxes, the CropBox defines the region that the PDF viewer application is expected to display or print.
I have checked the pdfinfo source code, and it also uses the CropBox for the page sizes it reports, which are the sizes we use. We should therefore add that option to our Ghostscript invocation.
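As a rough illustration of what "add that option" means, the Python sketch below builds a Ghostscript argument list that renders a single page to JPEG with `-dUseCropBox`, so the rendered area matches the CropBox sizes pdfinfo reports. The helper name `build_gs_command` and the chosen device and resolution are assumptions for this example; this is not the actual command line used by PdfHandler or the Thumbor plugin. The flags themselves (`-dSAFER`, `-dBATCH`, `-dNOPAUSE`, `-sDEVICE`, `-r`, `-dFirstPage`, `-dLastPage`, `-dUseCropBox`, `-sOutputFile`) are standard Ghostscript options.

```python
from __future__ import annotations

import shlex


def build_gs_command(pdf_path: str, page: int, output_path: str,
                     resolution: int = 150, use_cropbox: bool = True) -> list[str]:
    """Hypothetical helper: argv for rendering one PDF page to a JPEG with gs."""
    if page < 1:
        raise ValueError("PDF page numbers are 1-based")
    cmd = [
        "gs",
        "-q",               # suppress startup messages
        "-dSAFER",          # restrict file operations from the PDF
        "-dBATCH",          # exit when done instead of waiting for input
        "-dNOPAUSE",        # do not pause between pages
        "-sDEVICE=jpeg",    # output device (assumed for this sketch)
        f"-r{resolution}",  # rendering resolution in DPI (assumed default)
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
    ]
    if use_cropbox:
        # Render the CropBox region instead of the full MediaBox.
        cmd.append("-dUseCropBox")
    cmd += [f"-sOutputFile={output_path}", pdf_path]
    return cmd


if __name__ == "__main__":
    # Print the command as a shell-quoted string for inspection.
    print(shlex.join(build_gs_command("document.pdf", 1, "page1.jpg")))
```

Returning an argument list (rather than a single shell string) keeps file names from being interpreted by a shell; the list can be passed directly to `subprocess.run`.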

Similarly, for pdftotext we should specify the -cropbox command-line argument. (Edit: it turns out this option is too new, and it also only applies to the -bbox output mode, which we do not use.)

Change 805474 had a related patch set uploaded (by TheDJ; author: TheDJ):

[mediawiki/extensions/PdfHandler@master] Use the PDF cropbox for rendering

https://gerrit.wikimedia.org/r/805474

Change 805476 had a related patch set uploaded (by TheDJ; author: TheDJ):

[operations/software/thumbor-plugins@master] Use the PDF cropbox for rendering

https://gerrit.wikimedia.org/r/805476

TheDJ triaged this task as Low priority.
TheDJ edited projects, added Thumbor; removed Multimedia, MediaWiki-File-management.
TheDJ changed the subtype of this task from "Task" to "Bug Report".

Change 805474 merged by jenkins-bot:

[mediawiki/extensions/PdfHandler@master] Use the PDF cropbox for rendering

https://gerrit.wikimedia.org/r/805474

The Thumbor fix for this is still pending, and its deployment is likely blocked on T216815.

Change 805476 merged by jenkins-bot:

[operations/software/thumbor-plugins@master] Use the PDF cropbox for rendering

https://gerrit.wikimedia.org/r/805476

Looks to me like these pages are now fixed thanks to your fix being pushed out @TheDJ - thank you!