Page MenuHomePhabricator

Thumbnails for PDF files not found when width is not an Integer in URL link
Closed, ResolvedPublic

Description

See https://commons.wikimedia.org/wiki/File:Annie_Besant_-_The_Story_of_Afghanistan.pdf

Click link for 677 × 1,031 pixels in:
Size of this JPG preview of this PDF file: 393 × 599 pixels. Other resolutions: 157 × 240 pixels | 315 × 480 pixels | 677 × 1,031 pixels.
(the correponding Url is https://upload.wikimedia.org/wikipedia/commons/thumbhttps://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Annie_Besant_-_The_Story_of_Afghanistan.pdf/page1-677.08333333333px-Annie_Besant_-_The_Story_of_Afghanistan.pdf.jpg

Result:
404: Not Found

It is also weird that page size is represented as float.

It has been changed recently by this commit:
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/PdfHandler/+/560127/2/includes/PdfImage.php
with no comments.

Has it been done intentionally?

This creates problems, e.g. see https://en.wikisource.org/wiki/Page:Annie_Besant_-_The_Story_of_Afghanistan.pdf/15, where the image thumb url points to a not existing page (https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Annie_Besant_-_The_Story_of_Afghanistan.pdf/page15-677.08333333333px-Annie_Besant_-_The_Story_of_Afghanistan.pdf.jpg).

Details

Related Gerrit Patches:
mediawiki/extensions/PdfHandler : masterFix PDF image width / height
mediawiki/extensions/PdfHandler : masterRevert explict casts and use implict casts as before

Event Timeline

Mpaa created this task.Jan 11 2020, 5:18 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 11 2020, 5:18 PM
Mpaa added a subscriber: Legoktm.EditedJan 11 2020, 5:20 PM

@Legoktm, @Umherirrender could you please take a look at this commit?
Is it intentional that the computation has been changed?
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/PdfHandler/+/560127/2/includes/PdfImage.php

Mpaa updated the task description. (Show Details)Jan 11 2020, 5:24 PM
Mpaa updated the task description. (Show Details)Jan 11 2020, 5:29 PM
Mpaa updated the task description. (Show Details)Jan 11 2020, 5:36 PM

Change 563702 had a related patch set uploaded (by Mpaa; owner: Mpaa):
[mediawiki/extensions/PdfHandler@master] Fix PDF image width / height

https://gerrit.wikimedia.org/r/563702

Xover added a comment.Jan 12 2020, 7:59 AM

@Mpaa By my assessment it cannot have been done deliberately, as the modified code will never work in the context it is evidently being used. When combined with the misleading commit message referencing Phan, my guess is that this was an attempt to fix a warning generated by the static analyzer that just failed to spot the need for armoring there.

Change 563705 had a related patch set uploaded (by Umherirrender; owner: Umherirrender):
[mediawiki/extensions/PdfHandler@master] Revert explict casts and use implict casts as before

https://gerrit.wikimedia.org/r/563705

I have reverted the cast changes on phan upgrade and used a phan config to make phan still happy on that extension.

That code now depends on the implict casts as before.

Image width and height are always integer, but it seems that some files are now uploaded with float values

I have no idea if the the url should be fixed to get a integer or if the float is okay.
But with the revert there is time to discuss or to fix all places as core and some extensions showing the float

Change 563705 merged by jenkins-bot:
[mediawiki/extensions/PdfHandler@master] Revert explict casts and use implict casts as before

https://gerrit.wikimedia.org/r/563705

Change 563702 abandoned by Mpaa:
Fix PDF image width / height

Reason:
See I60adf4aa64586d457a32cb220b1fcd7518d32a5e

https://gerrit.wikimedia.org/r/563702

Mpaa added a comment.EditedJan 12 2020, 9:57 AM

Image width and height are always integer, but it seems that some files are now uploaded with float values

I think width and height are evaluated at run time (possibly cached), so I assume when this will be deployed, things will get back to normal.

Image width and height are always integer, but it seems that some files are now uploaded with float values

I think width and height are evaluated at run time (possibly cached), so I assume when this will be deployed, things will get back to normal.
When will this be deployed?

I am not familar with the process of hot deployment - some information are at https://wikitech.wikimedia.org/wiki/SWAT_deploys
Hopefully at monday

The revert/fix is available on beta wiki and could be tested there if that is enough to fix the bug - https://commons.wikimedia.beta.wmflabs.org/wiki/Main_Page

@Umherirrender Many thanks!

This should be possible to test by downloading https://commons.wikimedia.org/wiki/File:Sand_-_Œuvres_illustrées_de_George_Sand,_vol_9,_1856.pdf from production (note the non-integer dimensions on the info page, and the 404 link for the largest thumbnail) and uploading it to the beta wiki. If the image dimensions show as integers there, and none of the thumbnail links are broken, then this issue is almost certainly resolved (as best I can tell, anyway). I don't have an account on the beta cluster and didn't want to start fiddling with that just now. I don't think we have a functioning Wikisource setup on the beta cluster (we should probably remedy that at some point), so I don't think we can meaningfully test that this feeds ProofreadPage what it needs, but it seems very likely to me.

Incidentally if I'm correct that the planned deploy window for 1.35.0-wmf.15 is Tuesday–Thursday (the Wikisourcen getting it on Wednesday) then SWAT is probably not needed. My guesstimate is that this affects 1–2% (manual checking of a very very small sample size) of all PDF files so it doesn't shut down all the Wikisourcen, it just hampers work on some individual books/works. Annoying enough for those affected, and it would be nice to get it fixed "yesterday", but with weekly regular deploys it's probably better to save SWAT for when the natural organic fertiliser really intersects the rotary gaseous medium agitator.

Looks good to me!

Mpaa closed this task as Resolved.Jan 15 2020, 9:22 PM
Mpaa claimed this task.

Seems OK to me.