
Wikisource Ebooks: Investigate cover page bug [8h]
Closed, Resolved, Public

Description

As a French Wikisource user, I want the cover page bug to be resolved, so that I can see the cover page properly displayed in all file formats.

Background: French Wikisource exports have a unique feature: they display an image of the book's cover on the first page (see the screenshot example below). This gives users an enriched experience. However, there is a bug: the first page is blank instead of showing the cover image. Users have reported this issue with PDF downloads. Overall, the purpose of this ticket is to try to fix the issue, so that all major file formats provided to users (PDF, EPUB, and MOBI, at the very least) display the cover image when a book is exported.

Acceptance Criteria:

  • Investigate issue of cover page images not appearing in PDF exports
  • If possible, propose a solution or implement the solution

Visual Examples when downloading Une Visite à Bedlam from French Wikisource:

Cover image displayed in MOBI export of book

[Screenshot: Screen Shot 2021-01-18 at 5.01.53 PM.png]

First few pages of the PDF: the cover page is blank in the PDF version of the same book

[Screenshot: Screen Shot 2021-01-18 at 5.31.52 PM.png]

Event Timeline

@dom_walden I was unable to reproduce the issue described, but users report that it's common. Have you seen this issue, or do you have any tips on how to reproduce it?

ifried renamed this task from "Ebook Exports: Investigate PDF" to "Ebook Exports: Investigate PDF cover page bug". (Jun 10 2020, 8:44 PM)
ifried renamed this task from "Ebook Exports: Investigate PDF cover page bug" to "Wikisource Ebooks: Investigate PDF cover page bug". (Jun 11 2020, 10:59 PM)
ifried updated the task description.
ifried renamed this task from "Wikisource Ebooks: Investigate PDF cover page bug" to "Wikisource Ebooks: Investigate cover page bug". (Jan 19 2021, 6:27 PM)

Hello ifried

I am copying here some information from French Wikisource user Denis Gagne52 that I hope will help.

It is easy to work around this bug: just replace the cover page with an image that is not encoded as 8-bit grayscale. I did this by simply changing the f-s on Commons to include a color cover.

(My comment: this is difficult for the average Wikisource contributor to do, and we would have to go back and fix too many books.)

If it worked before, that is because Adobe used to accept 8-bit images even when they were labeled as 24-bit or 32-bit color images. Adobe stopped doing this not long ago, and did so deliberately. As a result, thousands of documents produced in the past with Office, AutoCAD, Calibre, etc. contain images that will no longer be displayed. Adobe argues that it does not have to be backward compatible, because these PDF files were not produced by an Adobe product and the images are mislabeled.

I think Calibre's faulty code is here: https://github.com/kovidgoyal/calibre/blob/e91ebda5e862ea7e6ae60dfda5fbf74d6e8b5b7a/src/calibre/ebooks/pdf/render/serialize.py#L262

at lines 436, 437, and 438. Calibre assumes the 8-bit image has been converted to 32-bit without checking whether that conversion actually succeeded.
The problem can be corrected in ws-export by converting grayscale images in advance into 24-bit images, which Calibre supports.
This PHP code works for me:

<code>
<?php

// Source and destination files
$originalFileName    = 'D:\Outils\Script\Scribe_08bits.jpg';
$destinationFileName = 'D:\Outils\Script\Scribe_24bits.jpg';

$info = getimagesize($originalFileName);
// Only convert JPEG files that are 8-bit and single-channel (grayscale).
// getimagesize() reports the file type as an IMAGETYPE_* constant in index 2,
// and 'channels' is not guaranteed to be present, hence the ?? defaults.
if ($info !== false
    && $info[2] === IMAGETYPE_JPEG
    && ($info['bits'] ?? 0) === 8
    && ($info['channels'] ?? 0) === 1) {
    // Load the image into memory
    $sourceImage = imagecreatefromjpeg($originalFileName);
    if ($sourceImage === false) {
        exit("Could not read $originalFileName\n");
    }
    // Read its dimensions
    $imgWidth  = imagesx($sourceImage);
    $imgHeight = imagesy($sourceImage);
    // Create a truecolor (24-bit RGB) image with the same dimensions
    $destinationImage = imagecreatetruecolor($imgWidth, $imgHeight);
    // Copy the grayscale pixels into the new truecolor image
    imagecopy($destinationImage, $sourceImage, 0, 0, 0, 0, $imgWidth, $imgHeight);
    // Save the new image to disk as an RGB JPEG
    imagejpeg($destinationImage, $destinationFileName);
    // Free the temporary image buffers
    imagedestroy($destinationImage);
    imagedestroy($sourceImage);
}

?>
</code>

The resulting image Scribe_24bits.jpg displays correctly when exported as PDF from ws-export.
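A related check, sketched here as an assumption rather than anything ws-export or Calibre actually does: the "8 bits, 1 channel" condition that the script reads through getimagesize() is stored in the JPEG's start-of-frame (SOF) header, so affected covers can also be identified by scanning the raw bytes, for example when auditing a batch of Commons files. The helper name jpegSofInfo() is hypothetical, and the sketch stops at the first SOF marker it finds:

```php
<?php
// Hypothetical helper (not part of ws-export or Calibre): scans a JPEG byte
// string for its first start-of-frame (SOF) segment and returns the sample
// precision in bits and the number of color components.
// An 8-bit grayscale cover reports ['bits' => 8, 'components' => 1].
function jpegSofInfo(string $data): ?array
{
    $len = strlen($data);
    // Every JPEG starts with the SOI marker FF D8.
    if ($len < 4 || $data[0] !== "\xFF" || $data[1] !== "\xD8") {
        return null;
    }
    $pos = 2;
    while ($pos + 4 <= $len) {
        if ($data[$pos] !== "\xFF") {
            return null; // not positioned on a marker: malformed or unsupported file
        }
        $marker = ord($data[$pos + 1]);
        if ($marker === 0xFF) {
            $pos++; // fill byte before a marker
            continue;
        }
        if ($marker === 0x01 || ($marker >= 0xD0 && $marker <= 0xD9)) {
            $pos += 2; // standalone markers (TEM, RSTn, SOI, EOI) carry no length field
            continue;
        }
        // Segment length is big-endian and includes its own two bytes.
        $segLen = unpack('n', substr($data, $pos + 2, 2))[1];
        // C0..CF are SOF markers, except DHT (C4), JPG (C8) and DAC (CC).
        $isSof = $marker >= 0xC0 && $marker <= 0xCF
            && !in_array($marker, [0xC4, 0xC8, 0xCC], true);
        if ($isSof) {
            if ($segLen < 8 || $pos + 2 + $segLen > $len) {
                return null; // truncated SOF segment
            }
            // After the length: precision (1 byte), height (2), width (2), components (1).
            return [
                'bits'       => ord($data[$pos + 4]),
                'components' => ord($data[$pos + 9]),
            ];
        }
        $pos += 2 + $segLen; // skip APPn, DQT, DHT, COM, ... segments
    }
    return null;
}
```

For example, jpegSofInfo(file_get_contents($originalFileName)) returning 8 bits and 1 component flags a cover that would need the conversion above. In practice getimagesize() reads the same header fields and remains the simpler choice.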

Here is an example of the error message shown when such a PDF file is opened in Adobe Acrobat Reader.

Free translation: Insufficient data for an image.

The rest of the book is normal.

[Screenshot: Screen from Wikisource pdf exported file.PNG]

ARamirez_WMF renamed this task from "Wikisource Ebooks: Investigate cover page bug" to "Wikisource Ebooks: Investigate cover page bug [8h]". (Jan 22 2021, 12:32 AM)
ARamirez_WMF moved this task from Needs Discussion to Up Next (May 6-17) on the Community-Tech board.

Thank you so much, @Viticulum! This information provided helpful context surrounding both workarounds (for experienced editors) and the challenges of such workarounds (for less experienced editors). The background on Adobe no longer providing support for 8-bit images was helpful as well. Much appreciated!

The team is currently focusing on fixing bugs directly related to our reliability improvements work. I do not yet know if we will be able to fix this issue, but I'll update this ticket if we can. One final question: In your opinion, what is the overall priority or importance of this issue to French Wikisource users? Thank you in advance!

Hello @ifried. There have been many comments on this issue in our Scriptorium since it first came up. I believe the priority is quite high.
Also, visitors to Wikisource won't understand why an error message appears when they open the PDF book, and may think the book is incomplete.
Thank you for all the attention you pay to our problems.

This bug seems to have been fixed upstream in ImageMagick: https://github.com/ImageMagick/ImageMagick/issues/2070
We should upgrade it on the ws-export servers.

Thank you to @Denis_Gagne52 for the notification.

This bug affects images coming from Commons. An upgrade of Calibre on the ws-export server will fix the cover page bug with grayscale images. Can we expect this upgrade in the near future?

@Denis_Gagne52 The ImageMagick package on Wikimedia servers is the Debian one. So an "upgrade" requires updated Debian packages which include the fix...

@Aklapper The same update is needed on Ubuntu, Windows..., isn't it? As soon as I got the information 4 months ago, I upgraded Calibre to 6.7.1 and tested it with images downloaded from Commons. I informed @Tpt that a fix was available and that it solved the cover page bug. Now I know that someone at WMF has been notified: files downloaded from Commons/ws-export can be corrupted, and ImageMagick has released a fix.

> @Aklapper The same update is needed on Ubuntu, Windows...

Yes, and that's not relevant here, as Wikimedia servers run on Debian and thus require Debian packages.

Tested on 3 different books with 8-bit grayscale covers: the cover image is displayed in the exported PDF files.
Thank you very much, @Samwilson!

Samwilson claimed this task.

Oh great! This is thanks to the upgrade (T332450).