In the Wikisource Page namespace, PDF book scans uploaded to Wikimedia Commons are displayed at artificially degraded quality.
Examples:
Related Objects
- Mentioned In
  - T257025: Provide a way of serving high quality scans on a per-page basis at Wikisource (such as those hosted at external source)
  - T256848: Images from PDF displayed with degraded quality (when background sized smaller than mask and other layers?).
- Mentioned Here
  - T184867: Unexpectedly low scan resolution in Page namespace for some DjVu books
Event Timeline
Are they worse in the original file (in which case there is almost nothing we can do about this), or only in the thumbnails being displayed?
It may be related to T184867. Does the first page of the PDF have a smaller size or resolution than the other pages?
Yes.
- First example PDF: first page 347x524 (22.4 kB), other pages about 377x529 (42.3 kB)
- Second example PDF: first page 163x253 (2.49 kB), other pages about 166x255 (3.90 kB)
The title page is usually smaller because it has less text.
PS: It's not only that the resolution is lower in ProofreadPage; the text itself is also fuzzier than on Commons.
High-resolution thumbnails of the file, such as:
https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/%D0%9F%D1%83%D1%88%D0%BA%D0%B8%D0%BD._%D0%95%D0%B2%D0%B3%D0%B5%D0%BD%D0%B8%D0%B9_%D0%9E%D0%BD%D0%B5%D0%B3%D0%B8%D0%BD_(1837).pdf/page7-1834px-%D0%9F%D1%83%D1%88%D0%BA%D0%B8%D0%BD._%D0%95%D0%B2%D0%B3%D0%B5%D0%BD%D0%B8%D0%B9_%D0%9E%D0%BD%D0%B5%D0%B3%D0%B8%D0%BD_(1837).pdf.jpg
look poor and show artifacts, likely the result of upscaling a JPEG image with lossy compression.
By contrast, the PBM images extracted from this PDF with the pdfimages program from the xpdf package have the same resolution as that thumbnail:

  $ pdfimages -f 7 -l 7 Пушкин._Евгений_Онегин_\(1837\).pdf x
  $ identify x-000.pbm
  x-000.pbm PBM 1834x2829 1834x2829+0+0 1-bit Bilevel Gray 651KB 0.010u 0:00.010

but are of much higher quality.
It seems that, for some reason, the software used here extracts lower-than-maximum-quality images from the PDF. This is unrelated to the size of the first page.
PDF thumbnailing works in two steps. First, Ghostscript renders a page to a JPEG at the page's original resolution. Then ImageMagick scales that JPEG to the requested size. The pages in this document vary in size around 340x530 ± 5 px, so some of the quality loss comes from the images being upscaled. (ProofreadPage assumes every page in the file is 339 × 527.)
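To make the upscaling in step two concrete, here is a minimal sketch of the size arithmetic only, using the 338x527 page size reported for this PDF and the 1834 px width requested in the thumbnail URL above. The function name `thumbnail_scale` is hypothetical; this does not reproduce the actual PdfHandler or ImageMagick code, it just shows how far a page rendered at that size has to be stretched.

```python
def thumbnail_scale(rendered_w: int, rendered_h: int, requested_w: int) -> tuple[int, int, float]:
    """Hypothetical helper: size and scale factor for step two (resize to requested width).

    rendered_w, rendered_h: size of the JPEG produced in step one.
    requested_w: thumbnail width asked for (e.g. the "1834px" in the URL).
    """
    scale = requested_w / rendered_w
    # Preserve the aspect ratio; round the height to the nearest whole pixel.
    target_h = round(rendered_h * scale)
    return requested_w, target_h, scale


w, h, s = thumbnail_scale(338, 527, 1834)
# A scale factor above 1 means the JPEG is being enlarged, not reduced.
print(f"{w}x{h}, scale {s:.2f}, upscaled={s > 1}")  # 1834x2860, scale 5.43, upscaled=True
```

Under these numbers every output pixel is interpolated from roughly a 5.4x smaller lossy JPEG, which is consistent with the blurring and artifacts described earlier in this thread.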
The PDF says the page is 338x527 px, so Ghostscript writes a JPEG at 338x527. That doesn't sound like a bug in the thumbnailer to me; it sounds like a problem with the PDF.
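The gap between the declared page size and the embedded scan can be expressed as a rendering resolution. As an assumption not confirmed in this thread, suppose the 338x527 figure is the page size in PDF points (1/72 inch, the default PDF user-space unit) and that the 338x527 JPEG corresponds to rasterizing at 72 dpi. Then the 1834 px wide image that pdfimages extracts from page 7 implies an effective resolution of roughly 390 dpi. The helper names below are hypothetical, and this sketch says nothing about which resolution the Wikimedia thumbnailer actually passes to Ghostscript.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch (assumed to apply here)


def rendered_width_px(page_width_pt: float, dpi: float) -> float:
    """Pixel width of a page of page_width_pt points rasterized at dpi."""
    return page_width_pt * dpi / POINTS_PER_INCH


def dpi_for_native_width(page_width_pt: float, image_width_px: int) -> float:
    """Resolution at which the page would render exactly image_width_px wide."""
    return POINTS_PER_INCH * image_width_px / page_width_pt


dpi = dpi_for_native_width(338, 1834)      # page width from the PDF, scan width from pdfimages
print(f"{dpi:.1f} dpi")                     # 390.7 dpi
print(rendered_width_px(338, 72))           # 338.0 (the size observed for the JPEG)
```

Under this assumption, rendering at about 390 dpi would produce a page at the scan's native width, while rendering at 72 dpi discards most of the scan's detail before ImageMagick upscales it.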