Provide better PDF viewer based on pdf.js project
Open, Low, Public, Feature

Description

Author: tomerc+bugzilla.wikimedia.org

Description:
pdf.js is a PDF viewer implemented in JavaScript that is included by default in Mozilla Firefox (and Chromium, as far as I know).

The viewer can be embedded in any web page, and I suggest replacing the default PDF viewer with it, as it doesn't require additional processing on the server to convert PDF pages to images, and it provides a user interface similar to desktop PDF viewers.

Demo: http://mozilla.github.io/pdf.js/web/viewer.html

Source code: https://github.com/mozilla/pdf.js/


Version: unspecified
Severity: enhancement

Details

Reference
bz52881

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 1:47 AM
bzimport set Reference to bz52881.
bzimport added a subscriber: Unknown Object (MLST).

pdf.js might be a solution to some problem that I don't see yet.
Could you elaborate, please?

What do you mean by the default pdf viewer?

WMF wikis have http://www.mediawiki.org/wiki/Extension:PdfHandler which vastly improves default MediaWiki handling of PDF files

tomerc+bugzilla.wikimedia.org wrote:

(In reply to comment #2)

What do you mean by the default pdf viewer?

WMF wikis have http://www.mediawiki.org/wiki/Extension:PdfHandler which vastly improves default MediaWiki handling of PDF files

The extension above converts PDF files to a set of images: each page in the document is represented by a page image that links to the pages before and after it, so reading each page requires loading a new web page.

With a desktop PDF viewer (or a browser plugin), the user sees the whole document, which makes it easier to navigate between pages. pdf.js tries to do the same in the browser, so users get a better user interface when reading PDF documents online.

While the current implementation requires some server-side processing, pdf.js loads the original PDF file and renders it on a browser canvas, so adopting it won't require additional changes on the server, and it can work side by side with the current PDF extension, which I feel most users dislike.
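
As a rough illustration of the embedding idea, the sketch below builds a link to a hosted copy of the pdf.js viewer that opens the original, unconverted PDF. The stock viewer page reads the document location from a `file` query parameter; the function name and both example URLs are hypothetical.

```typescript
// Build a URL that opens `pdfUrl` in a hosted copy of the pdf.js viewer.
// The viewer page reads the document location from its `file` query parameter.
// `buildViewerUrl` and the example URLs below are hypothetical, for illustration only.
function buildViewerUrl(viewerPage: string, pdfUrl: string): string {
  // Use "&" if the viewer address already carries a query string.
  const separator = viewerPage.indexOf("?") === -1 ? "?" : "&";
  // Encode the PDF address so its own ":", "/", "?" and "&" characters
  // are not mistaken for parts of the viewer URL.
  return viewerPage + separator + "file=" + encodeURIComponent(pdfUrl);
}

const viewerUrl = buildViewerUrl(
  "https://wiki.example.org/pdfjs/web/viewer.html",
  "https://wiki.example.org/files/Multilingual-commons.pdf"
);
console.log(viewerUrl);
```

A link or iframe pointing at the resulting URL could sit next to the existing page-image view, so both options stay available.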

Personal opinion: this might be good as a "both" type solution. PdfHandler for the current situation that some like (and can be incrementally improved upon, UI-wise) in addition to a "view full PDF in-browser" or somesuch.

Really, since I run Fx, it's all the same to me. When I click on a .pdf "the right thing" happens (pdf.js loads it in-browser).

This is a bad idea. pdf.js takes ages to load (it's simply a lot of code) and even longer to actually render anything (even optimized JS is not fast enough).

Not everybody in the world has access to the same technology, and what is fine in San Francisco might not be appropriate in Eastern Europe or Africa.

(Although I admit, once it finally loaded after a few minutes I was pleasantly surprised by its responsivity in the demo you linked.)

tomerc+bugzilla.wikimedia.org wrote:

(In reply to comment #5)

This is a bad idea. pdf.js takes ages to load (it's simply a lot of code) and even longer to actually render anything (even optimized JS is not fast enough).

I'm not sure what you're talking about. Here it loads quite well. Note also that if Wikipedia uses better caching directives than the limited caching available on GitHub, the viewer will load its own resources from the local browser cache, so only the first document will be delayed by downloading the viewer code.
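
To make the caching argument concrete, here is a minimal sketch of the kind of response header a server could send for the viewer's static files so that browsers reuse them instead of downloading them again for every document. `Cache-Control` with `max-age` is standard HTTP; the function name and the 30-day lifetime are assumptions, not a description of any actual Wikimedia configuration.

```typescript
// Produce a Cache-Control header value for long-lived static assets
// such as the viewer's scripts and stylesheets.
// `staticAssetCacheControl` and the 30-day lifetime are hypothetical choices.
function staticAssetCacheControl(maxAgeDays: number): string {
  // max-age is expressed in seconds.
  const maxAgeSeconds = Math.round(maxAgeDays * 24 * 60 * 60);
  return "public, max-age=" + maxAgeSeconds;
}

const headerValue = staticAssetCacheControl(30);
console.log("Cache-Control: " + headerValue);
```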

Not everybody in the world has access to the same technology, and what is fine in San Francisco might not be appropriate in Eastern Europe or Africa.

Only if they have an old, outdated browser or very limited Internet access. Also note that pdf.js should work well on mobile devices, which don't always have a native PDF viewer installed, making reading PDF documents very challenging (and Google's online PDF viewer, which they link to from their own applications, sucks).

(Although I admit, once it finally loaded after a few minutes I was pleasantly surprised by its responsivity in the demo you linked.)

Please also remember that since it is HTML-based, it should be possible to customize the viewer's user interface so that its look and feel matches the MediaWiki theme.

(In reply to comment #4)

Personal opinion: this might be good as a "both" type solution. PdfHandler for the current situation that some like (and can be incrementally improved upon, UI-wise) in addition to a "view full PDF in-browser" or somesuch.

Really, since I run Fx, it's all the same to me. When I click on a .pdf "the right thing" happens (pdf.js loads it in-browser).

Demo of current PDF extension implementation over Wikipedia: https://commons.wikimedia.org/w/index.php?title=File%3AMultilingual-commons.pdf

The download link is located below the page image and labeled "Full resolution" instead of something more meaningful.

Steps to reproduce:
a. Click on the following link: https://commons.wikimedia.org/w/index.php?title=File%3AMultilingual-commons.pdf
b. Try to read the few slides in the presentation.

Actual result:
Note that you have to load another page in order to read each next slide, and each slide contains only a few words.

Expected result:
Read all of the slides without page reloads, and view them in a full-screen/presentation mode for best results.

Try to open the PDF file in Firefox. Make sure that Firefox is set to preview PDF files in the browser (Options/Preferences → Applications → Search: PDF → Preview in Firefox).

(In reply to comment #6)

(In reply to comment #5)

This is a bad idea. pdf.js takes ages to load (it's simply a lot of code) and even longer to actually render anything (even optimized JS is not fast enough).

I'm not sure what you're talking about. Here it loads quite well. Note also that if Wikipedia uses better caching directives than the limited caching available on GitHub, the viewer will load its own resources from the local browser cache, so only the first document will be delayed by downloading the viewer code.

Define "here"? I live in Poland and use a 2005 laptop on a 1 Mbps connection. The performance of the demo was not awful, but not smooth either, and I'm certainly not at the very end of the spectrum.
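
For a sense of scale, the time needed for the first download of the viewer code on such a connection can be estimated with simple arithmetic, sketched below. The 2 MB viewer size is a made-up placeholder rather than a measured figure, the function name is hypothetical, and the estimate ignores latency and protocol overhead.

```typescript
// Estimate how long a download takes on a link of the given speed.
// `transferSeconds` is a hypothetical helper; it ignores latency and overhead.
function transferSeconds(sizeBytes: number, linkMbps: number): number {
  const bits = sizeBytes * 8;            // bytes to bits
  const bitsPerSecond = linkMbps * 1e6;  // megabits per second to bits per second
  return bits / bitsPerSecond;
}

// Placeholder size for the viewer code: an assumption, not a measurement.
const assumedViewerBytes = 2000000;
console.log(transferSeconds(assumedViewerBytes, 1) + " seconds on a 1 Mbps link");
```

With caching in place this cost would be paid only on the first visit, but it does not address the rendering speed on older hardware.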

Demo of current PDF extension implementation over Wikipedia: https://commons.wikimedia.org/w/index.php?title=File%3AMultilingual-commons.pdf

The download link is located below the page image and labeled "Full resolution" instead of something more meaningful.

Yeah, good point. I filed this separately as bug 53017.

Steps to reproduce:
a. Click on the following link: https://commons.wikimedia.org/w/index.php?title=File%3AMultilingual-commons.pdf
b. Try to read the few slides in the presentation.

Actual result:
Note that you have to load another page in order to read each next slide, and each slide contains only a few words.

Expected result:
Read all of the slides without page reloads, and view them in a full-screen/presentation mode for best results.

This has actually just been fixed (and will be deployed soon). Page reload is no longer necessary, but the interface isn't perfect yet. See bug 40207.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:13 AM