PDF file has 0x0 image size in Commons after uploading a new version while the page number is correct
Open, Needs Triage, Public, BUG REPORT

Description

After uploading a new version of this file to Commons:
https://commons.wikimedia.org/wiki/File:PL_Hudson_Jej_naga_stopa.pdf
it has a 0x0 image size. (The initial version reported a non-zero size at the time.)

Reverting to the previous version fixed the problem for only a few minutes; then its image size was reset to zero as well, and the correct size reappeared after an hour.

No problem with this file after upload to test Wikipedia:
https://test.wikipedia.org/wiki/File:PL_Hudson_Jej_naga_stopa-test.pdf

Note: this does not seem to be a duplicate of T297942 because, unlike
https://commons.wikimedia.org/wiki/File:Guinault_-_Sergent_!_(1881).pdf
this file has correct page numbers.

Event Timeline

The page number is properly set, so this problem seems to be different from T298417.

Ankry renamed this task from PDF file has 0x0 image size in Commons after uploading a new version to PDF file has 0x0 image size in Commons after uploading a new version while page number is corect. Jan 19 2022, 2:49 PM
Ankry renamed this task from PDF file has 0x0 image size in Commons after uploading a new version while page number is corect to PDF file has 0x0 image size in Commons after uploading a new version while the page number is correct.
Ankry reopened this task as Open.
Ankry updated the task description.

Finally, I tried deleting all versions except the last one, but it didn't work. So I deleted everything and reuploaded the file, and it shows fine. The bug still exists, though, and the file doesn't show properly on Wikisource.

I'm getting a similar problem with this PDF:

(Currently it is broken only on la.ws, as I reverted to an old version.)

An administrator can try to upload this file locally. This sometimes works as a workaround, but hides the problem.

<snip>
An administrator can try to upload this file locally. This sometimes works as a workaround, but hides the problem.

Are you able to give me some instructions or tips on how this is done so our administrators can assess if it is easy to try or not?

Download the file from Commons. Visit Special:Upload on your wiki and upload the file, telling it to continue anyway (I think it will complain that the file already exists on Commons).

Just as a general comment... While it's helpful if you report these issues, if you revert these uploads several times, while only linking to the general File: page, it makes it hard for anyone to try and debug the issue, as they can't necessarily find which is the "broken" version of the PDF...

@Reedy Yes I see that. I'm afraid I was trying to brute force solve this problem, but all of the attempts failed. I'll remember to link to file versions in future.

As it stands, the current version is still broken on LA WS, and I won't be moving it again, as I am hoping an administrator will move a copy onto the wiki as per your instructions.

I went through a similar issue while overwriting this PDF file: https://commons.wikimedia.org/wiki/File:%E8%A8%93%E8%92%99%E5%AD%97%E6%9C%83.pdf
It works fine on Commons, but is displayed inconsistently on other wikis.
For example: Wikipedia-en and Wikisource-ko

I've stepped through the logic a bit with some of the reported files, and pretty much the only way this can happen is if pdfinfo is not defined, not executable, or incorrect, or if $wgPdfHandlerDpi is 0 or undefined.

Both would be silent errors when they occur.

Another option is that the file that boxedcommand creates with the output of the metadata is not available to the MediaWiki app server for some reason. This too would be a silent error.
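
To illustrate how either of the first two conditions could end in a 0x0 size, here is a minimal Python sketch. It is not PdfHandler's actual code: the function names are hypothetical, the regular expression only handles the plain `Page size: W x H pts` line that pdfinfo prints, and the points-to-pixels conversion (72 points per inch, scaled by the DPI setting) is an assumption about how the pixel size is derived.

```python
import re
import shutil

POINTS_PER_INCH = 72  # PDF page sizes are reported in points


def pdfinfo_available(binary="pdfinfo"):
    """Return True if the pdfinfo binary can be found on PATH."""
    return shutil.which(binary) is not None


def page_size_px(pdfinfo_output, dpi):
    """Convert the 'Page size' line of pdfinfo output to pixels.

    Returns (0, 0) when the line is missing (for example because
    pdfinfo produced no output) or when dpi is 0 or None. Both cases
    fail silently here, mirroring the behaviour described above.
    """
    match = re.search(
        r"^Page size:\s+([\d.]+) x ([\d.]+) pts",
        pdfinfo_output,
        re.MULTILINE,
    )
    if match is None or not dpi:
        return (0, 0)
    width_pts, height_pts = (float(group) for group in match.groups())
    return (
        int(width_pts / POINTS_PER_INCH * dpi),
        int(height_pts / POINTS_PER_INCH * dpi),
    )
```

With an A4 page and a DPI of 150 this gives a non-zero size, while empty output or a DPI of 0 both collapse to (0, 0) without raising anything, which is why extra logging is needed to tell the cases apart.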

D6283's comment indicates that this is still happening. If all pdfinfo work is now a boxedcommand and running in kubernetes, then this indicates that some hosts either don't have pdfinfo, or that there are occasional issues with the output of the command. A race condition with the file flush?

This generally happens on large PDF files. Maybe there is a memory/time/other limit that pdfinfo execution exceeds?

@Ankry That points more to the second possibility: a race condition in which the file is not yet available to the process trying to read the metadata.

The file gets uploaded, written to the main file server, then the metadata reading starts and the file is not there at all, or not yet completely written to the location where the metadata is trying to read that file from (a replica server).
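
If this race is the cause, one possible mitigation is to wait until the file is fully visible before reading metadata, and to report a failure rather than store empty metadata. The sketch below only illustrates that idea; `wait_for_complete_file` is a hypothetical helper, and it assumes the expected byte size is known from the upload. It does not reflect how MediaWiki or its file backend actually works.

```python
import os
import time


def wait_for_complete_file(path, expected_size, timeout=5.0, interval=0.1):
    """Poll until `path` exists and has reached `expected_size` bytes.

    Returns True if the file became complete within `timeout` seconds
    and False otherwise, so the caller can log the failure instead of
    silently recording a 0x0 size.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.path.getsize(path) == expected_size:
                return True
        except OSError:
            pass  # file is not visible on this host yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run the metadata extraction only when this returns True, and log the path and expected size when it returns False.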

Change 1011371 had a related patch set uploaded (by TheDJ; author: TheDJ):

[mediawiki/extensions/PdfHandler@master] Improve logging for Pdf's retrieveMetadata.sh

https://gerrit.wikimedia.org/r/1011371

The patch should make any errors more verbose so that we can collect more information about these failures.

Change 1011371 merged by jenkins-bot:

[mediawiki/extensions/PdfHandler@master] Improve logging for Pdf's retrieveMetadata.sh

https://gerrit.wikimedia.org/r/1011371

&action=purge seems to solve this issue. Can we search for other PDF files that have this issue?
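
As a starting point for such a search, the following Python sketch checks a batch of known file titles through the MediaWiki `action=query` API with `prop=imageinfo` and flags PDFs whose reported width or height is 0. The helper names are hypothetical, the sketch only builds the request URL and inspects an already-decoded JSON response (it performs no network access), and it does not enumerate all PDFs on the wiki.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def imageinfo_url(titles):
    """Build a query URL asking for the size and MIME type of each title."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "size|mime",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urlencode(params)


def zero_sized_pdfs(response):
    """Return titles of PDF files reported with a zero width or height.

    `response` is the decoded JSON of a formatversion=2 imageinfo
    query. Pages without imageinfo (for example missing files) are
    skipped.
    """
    broken = []
    for page in response.get("query", {}).get("pages", []):
        for info in page.get("imageinfo", []):
            is_pdf = info.get("mime") == "application/pdf"
            is_zero = info.get("width") == 0 or info.get("height") == 0
            if is_pdf and is_zero:
                broken.append(page["title"])
    return broken
```

The titles returned could then be purged, either by hand with &action=purge or in bulk through the API's `action=purge` module.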