PDF download generates invalid PDF files
Closed, Duplicate · Public · BUG REPORT
Authored By

	Urbanecm
	Oct 27 2020, 12:59 PM

Description

Steps to Reproduce:

Go to https://en.wikipedia.org/wiki/Rail_transport_modelling
Press "Download as PDF" in the left menu
Press "Download"

Note: I am able to reproduce this with Lewis_Hamilton at enwiki as well, as well as Funkcionalismus at cswiki. It doesn't seem to be a single wiki/page issue.

Actual Results:

A PDF file is downloaded to your computer, but opening it via Firefox's internal viewer, Chrome's internal viewer, or Adobe Acrobat tells me it is invalid.

Expected Results:

The Download as PDF feature gives me a valid PDF that contains the article itself.

Note:

This may or may not be related to T266373: Connection closed while downloading PDF of articles, which is said to be relevant on wikis with the Desktop Improvements project installed. Given that the downloaded files differ in size, this might be a symptom of that task.

Attachments:

Example invalid PDFs:

Rail_transport_modelling (2).pdf (438 KB)

Funkcionalismus.pdf (645 KB)

Rail_transport_modelling (1).pdf (561 KB)

Lewis_Hamilton (1).pdf (572 KB)

Rail_transport_modelling.pdf (803 KB)

Lewis_Hamilton.pdf (1 MB)

Related Objects

Mentioned In: T265217: Regression: New Vector adds all sidebar when exporting to PDF
Mentioned Here: T266373: Connection closed while downloading PDF of articles

Event Timeline

Urbanecm created this task. (Oct 27 2020, 12:59 PM)

Restricted Application added a project: Product-Infrastructure-Team-Backlog-Deprecated. (Oct 27 2020, 12:59 PM)

Restricted Application added a subscriber: Aklapper.

Jdlrobson mentioned this in T265217: Regression: New Vector adds all sidebar when exporting to PDF. (Oct 27 2020, 3:14 PM)

I couldn't reproduce this with Chrome 86, nor with Mac's Preview. I am not sure whether this might be caused by an upstream change in Chrome (I would need to know the versions you used, though). I think there was a similar corruption once, and it was traced to an upstream Chromium change.

Screen Shot 2020-10-27 at 4.59.18 PM.png (677×1 px, 378 KB)

Ammarpad added a project: Browser-support-print-media. (Oct 27 2020, 4:11 PM)

@Ammarpad Thanks for your comment. I wasn't able to open any of those PDFs with any PDF viewer I have installed (all the browsers I mentioned, plus Adobe Acrobat).

Weirdly enough, I'm able to download PDFs without any problems when I connect via mwdebug1001. That might indicate it's in fact not an issue inside Proton, but a Traffic issue? Given the PDFs have variable length, I think something doesn't send the full PDF (so the PDF is truncated), and hence not openable in any PDF viewer. Adding Traffic.
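The truncation hypothesis above can be checked locally without a PDF viewer. The following is a minimal sketch, not part of the original report: it assumes that a complete PDF starts with the `%PDF-` header and carries an `%%EOF` marker near the end of the file, so a download that was cut off mid-transfer would usually lack the trailer. The function name `pdf_looks_complete` and the 1024-byte tail window are illustrative choices.

```python
def pdf_looks_complete(data: bytes, tail_window: int = 1024) -> bool:
    """Heuristic check for a truncated PDF download.

    Returns True only if the data starts with the PDF header and an
    end-of-file marker appears within the last `tail_window` bytes.
    This does not validate the PDF structure; it only detects the
    most common symptom of a transfer that stopped early.
    """
    has_header = data.startswith(b"%PDF-")
    has_trailer = b"%%EOF" in data[-tail_window:]
    return has_header and has_trailer


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py Rail_transport_modelling.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            status = "complete" if pdf_looks_complete(fh.read()) else "possibly truncated"
        print(f"{path}: {status}")
```

If the attached files report "possibly truncated" while files downloaded via mwdebug1001 report "complete", that would be consistent with the response being cut off somewhere between the service and the client.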

Restricted Application added a project: SRE. (Oct 27 2020, 4:51 PM)

I get ERR_CONNECTION_CLOSED on https://en.wikipedia.org/api/rest_v1/page/pdf/Rail_transport_modelling

Chrome on iOS
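For anyone trying to reproduce the closed connection outside a browser, here is a rough sketch using only the Python standard library. It relies on `http.client` raising `IncompleteRead` when the server closes the connection before the announced body has fully arrived. The helper names `read_body` and `fetch_pdf` and the User-Agent string are made up for illustration, and other failure modes (for example a connection reset) would surface as different exceptions that this sketch does not handle.

```python
import http.client
import urllib.request
from typing import Tuple


def read_body(resp) -> Tuple[bytes, bool]:
    """Read a response body; return (data, complete).

    `complete` is False when the connection closed before the whole
    body was received, in which case `data` holds the partial bytes.
    """
    try:
        return resp.read(), True
    except http.client.IncompleteRead as exc:
        return exc.partial, False


def fetch_pdf(url: str, timeout: float = 60.0) -> Tuple[bytes, bool]:
    """Download `url` and report whether the body arrived in full."""
    req = urllib.request.Request(
        url,
        # Hypothetical User-Agent for this example only.
        headers={"User-Agent": "pdf-truncation-check/0.1 (example)"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return read_body(resp)


if __name__ == "__main__":
    url = "https://en.wikipedia.org/api/rest_v1/page/pdf/Rail_transport_modelling"
    body, complete = fetch_pdf(url)
    print(f"received {len(body)} bytes, complete={complete}")
```

Running this several times and comparing the received byte counts would show whether the cut-off point varies between requests, which is what the differing sizes of the attached files suggest.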

LGoto assigned this task to Jgiannelos. (Oct 28 2020, 3:39 PM)

LGoto moved this task from Needs triage to Needs investigation on the Product-Infrastructure-Team-Backlog-Deprecated board.

@Hrishikes couldn't reproduce the problem and downloaded a normal PDF. I am sharing the file here at his request.

Rail_transport_modelling.pdf (1 MB)

And here is the Czech Wiki file.

Funkcionalismus.pdf (732 KB)

I am not getting invalid PDFs.

Lewis_Hamilton.pdf (1 MB)

LGoto triaged this task as Low priority. (Nov 4 2020, 4:42 PM)

The Physics page I downloaded yesterday (and today) was corrupted as well. When I updated Chrome I managed to get an uncorrupted version, but the font looks weird.

In fact, all the pages I downloaded yesterday that weren't corrupted have this weird font (and the headers and footers are missing).
Also, even though I just updated Chrome, pages that were edited recently (within the last week or so) still get corrupted* (for example Set (mathematics), which was last edited yesterday). I tried Firefox as well, but it shows an error: "C:\Users\*\AppData\Local\Temp\gq2sovrf.pdf.part could not be saved, because the source file could not be read".

*Out of the 5-7 pages I downloaded yesterday that were edited within the last few days, only one wasn't corrupted.

For the record, I just answered a user report about this issue that was sent to OTRS.

@LGoto I disagree with this being "Low" priority, as several users are confirming the problem in this very task, and there is also the user report. Can you please explain the triage, or raise the priority? Thanks!

@LGoto After some debugging on T266373, it looks like the issue is more complicated and affects more users than we had assessed. We prioritized this as low because the metrics didn't show any kind of service outage, and I thought it was a one-off issue.

@sdkim Should I bump this to high priority? According to https://phabricator.wikimedia.org/T266373#6608804, a large amount of user-facing traffic is affected.

Urbanecm closed this task as a duplicate of T266373: Connection closed while downloading PDF of articles. (Nov 10 2020, 8:55 PM)

	F32416575: Screen Shot 2020-10-27 at 4.59.18 PM.png
	Oct 27 2020, 4:09 PM

	F32416474: Rail_transport_modelling (1).pdf
	Oct 27 2020, 12:59 PM

	F32416476: Rail_transport_modelling.pdf
	Oct 27 2020, 12:59 PM

	F32431379: Physics.png
	Nov 4 2020, 6:13 PM

	F32421295: Lewis_Hamilton.pdf
	Nov 1 2020, 1:45 PM

	F32421293: Funkcionalismus.pdf
	Nov 1 2020, 1:43 PM
