
PDF download generates invalid PDF files
Closed, Duplicate · Public · BUG REPORT

Description

Steps to Reproduce:

  1. Go to https://en.wikipedia.org/wiki/Rail_transport_modelling
  2. Press "Download as PDF" in the left menu
  3. Press "Download"

Note: I am also able to reproduce this with Lewis_Hamilton at enwiki and with Funkcionalismus at cswiki, so it doesn't seem to be limited to a single wiki or page.

Actual Results:

A PDF file is downloaded to your computer, but Firefox's internal viewer, Chrome's internal viewer, and Adobe Acrobat all report it as invalid when I try to open it.

Expected Results:

The Download as PDF feature gives me a valid PDF that contains the article itself.

Note:

This may or may not be related to T266373: Connection closed while downloading PDF of articles, which is said to be relevant at wikis with the Desktop Improvements project installed. Given the files differ in size:

image.png (165×568 px, 12 KB)

it might be a symptom of the task mentioned?

Attachments:

Example invalid PDFs:

Event Timeline

I couldn't reproduce this with Chrome 86, nor with Mac's Preview. I am not sure whether this might be caused by an upstream change in Chrome (I'd need to know the versions you used, though). I think there was a similar corruption once, and it was traced to an upstream Chromium change.

Screen Shot 2020-10-27 at 4.59.18 PM.png (677×1 px, 378 KB)

@Ammarpad Thanks for your comment. I wasn't able to open any of those PDFs with any PDF viewer I have installed (all the browsers I mentioned, plus Adobe Acrobat).

Weirdly enough, I'm able to download PDFs without any problems when I connect via mwdebug1001. That might indicate it's in fact not an issue inside Proton, but a Traffic issue? Given the PDFs have variable length, I think something doesn't send the full PDF (so the PDF is truncated), and hence not openable in any PDF viewer. Adding Traffic.
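For anyone who wants to test the truncation theory on their own downloads, here is a rough sketch in Python. It is only a heuristic and is not part of Proton or any Wikimedia tooling: the function names `looks_truncated` and `check_file` are made up for this example. It assumes that a complete PDF has a `%PDF-` header near the start and an `%%EOF` marker near the end, which is what PDF readers generally look for. A file that passes can still be damaged in other ways.

```python
from pathlib import Path


def looks_truncated(data: bytes) -> bool:
    """Heuristic check: return True if the bytes do not look like a complete PDF."""
    # A PDF normally begins with a "%PDF-<version>" header. Some readers
    # tolerate leading junk, so search the first 1024 bytes instead of byte 0.
    if b"%PDF-" not in data[:1024]:
        return True
    # A complete PDF ends with an "%%EOF" marker, possibly followed by
    # whitespace. If the download was cut short, the marker is usually missing.
    return b"%%EOF" not in data[-1024:]


def check_file(path: str) -> str:
    """Read a downloaded file and summarise its size and apparent completeness."""
    data = Path(path).read_bytes()
    status = "possibly truncated" if looks_truncated(data) else "looks complete"
    return f"{path}: {len(data)} bytes, {status}"
```

Calling `check_file` on a copy downloaded normally and on one downloaded via mwdebug1001 would show whether the broken copy is shorter and lacks the end marker, which would be consistent with the response being cut off in transit.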

@Hrishikes couldn't reproduce the problem and downloaded the normal pdf. I am sharing the file here as per his request.

And here is the Czech Wiki file.

I am not getting invalid pdfs.

LGoto triaged this task as Low priority. Nov 4 2020, 4:42 PM

The Physics page I downloaded yesterday (and today) was corrupted as well. When I updated Chrome I managed to get an uncorrupted version, but the font looks weird.

Physics.png (747×1 px, 434 KB)
In fact all the pages I downloaded yesterday that weren't corrupted have this weird font... (and headers and footers are missing).
Also, even though I just updated Chrome, pages that were edited recently (within about a week) still get corrupted*. Set (mathematics), for example, was last edited yesterday. I tried Firefox as well, but there's an error saying: "C:\Users\*\AppData\Local\Temp\gq2sovrf.pdf.part could not be saved, because the source file could not be read".

*Out of the 5-7 pages I downloaded yesterday that were edited within the last few days only one of them wasn't corrupted.

For the record, I just answered a user report of this issue sent to OTRS.

@LGoto I disagree with this being "Low" priority, as there are several users confirming this in this very task, and due to the user report. Can you please explain it, or make it higher? Thanks!

@LGoto After some debugging on T266373 it looks like the issue is more complicated and affects more users than we assessed. The reason we prioritized this as low was that the metrics didn't show any sort of service outage, and I thought it was a one-off issue.

@sdkim Should I bump this to high priority? According to https://phabricator.wikimedia.org/T266373#6608804 a large amount of user-facing traffic is affected.