
ProofreadPage does not use image's full resolution when zooming in
Open, Medium, Public

Description

Author: mcdevitd

Description:
The zoom function in ProofreadPage does not zoom in to the image's full resolution. In the ProofreadPage side-by-side view, the image becomes blurred and unreadable before it reaches actual size. Compare http://en.wikisource.org/w/index.php?title=Page:Historical_Sketch_of_the_Rebellion._Published_at_the_Office_of_the_U.S._Coast_Survey,_A._D._Bache,_-_NARA_-_305799.jpg&action=edit after zooming in a few times with the image at full size: http://upload.wikimedia.org/wikipedia/commons/0/05/Historical_Sketch_of_the_Rebellion._Published_at_the_Office_of_the_U.S._Coast_Survey%2C_A._D._Bache%2C_-_NARA_-_305799.jpg

ProofreadPage should not be using lower-resolution images since its purpose is to allow users to transcribe often difficult texts, where zooming in on a part of the image at the fullest resolution is often necessary.


Version: unspecified
Severity: minor

Details

Reference
bz41614

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:01 AM
bzimport added a project: ProofreadPage.
bzimport set Reference to bz41614.
bzimport added a subscriber: Unknown Object (MLST).

I think this was fixed with the Proofread page refactoring, but I am not sure. Can anyone confirm?

Hmm. As I recall, PRP uses a hard-coded 1024px size for the "thumbnail" it requests. I assume this value was picked as a compromise between full fidelity for the user and various optimization concerns.

Provided that assumption is correct… It occurs to me that this may be a poor tradeoff: thumbnail generation from PDF files requires a trip through Ghostscript (10-20s execution time regardless of size), and even for DjVu the extraction from the multipage format is likely to be the most time-consuming step. Even the largest scans are not much bigger than a typical photo on Commons. In other words, requesting the full size of the page should not take significantly longer than requesting a scaled-down version, and the result should rarely consume a disproportionate amount of client resources.

That would give the zoom function some actually higher resolution data to work with, and would resolve a few pathological cases where the downscaling produces unreadable results (typically due to way too aggressive compression settings in the original).
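To make the tradeoff above concrete, here is a rough sketch of how far a viewer can zoom before it runs out of real pixel data. The function name, the 512px panel width, and the 4096px "full size" width are illustrative assumptions, not values taken from ProofreadPage; only the 1024px figure comes from the comment above.

```python
def max_sharp_zoom(source_width_px: int, displayed_width_px: int) -> float:
    """Return the zoom factor at which one source pixel maps to one screen pixel.

    Beyond this factor the viewer has to upscale, which is where blurring starts.
    """
    if source_width_px <= 0 or displayed_width_px <= 0:
        raise ValueError("widths must be positive")
    return source_width_px / displayed_width_px


# Assumption: the image panel in the side-by-side view is 512px wide.
PANEL_WIDTH_PX = 512

# 1024px is the thumbnail size recalled in the thread; 4096px is a
# hypothetical full-size scan width used only for comparison.
for label, width in [("1024px thumbnail", 1024), ("hypothetical 4096px original", 4096)]:
    print(f"{label}: sharp up to {max_sharp_zoom(width, PANEL_WIDTH_PX):.1f}x zoom")
```

Under these assumed numbers, the fixed-size thumbnail stops adding detail after a couple of zoom steps, while the full-size image keeps resolving detail for several more.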

The new OSD (OpenSeadragon) viewer includes a tile layer with a 2x resolution image (2048px).

Further fidelity is probably limited due to MW only rendering PDFs at 150dpi, regardless of the actual image DPI, but that's tracked at T224355.

Does that resolve this issue?
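For reference, the effect of the 150dpi PDF rendering limit mentioned above can be estimated with simple arithmetic: pixels = inches × dpi. The sketch below assumes a US Letter page (8.5 × 11 inches) and a 300dpi original scan purely for illustration; neither figure comes from this task.

```python
def rendered_size_px(width_in: float, height_in: float, dpi: int = 150) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page rendered at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumption: a US Letter page; the real page size depends on the scanned work.
page_w_in, page_h_in = 8.5, 11.0

at_150 = rendered_size_px(page_w_in, page_h_in)            # rendering at 150dpi
at_300 = rendered_size_px(page_w_in, page_h_in, dpi=300)   # hypothetical 300dpi scan

print(f"150dpi render: {at_150[0]} x {at_150[1]} px")
print(f"300dpi scan:   {at_300[0]} x {at_300[1]} px")
```

With these assumed inputs, a 150dpi render has half the linear resolution of a 300dpi scan, so a larger requested size would mostly be upscaled data for PDFs until the rendering limit itself changes.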