
Wikisource: footnote links do not work for large books
Open, Needs Triage, Public, 8 Estimated Story Points

Description

Steps to reproduce:

  1. Download epub of https://fr.wikisource.org/wiki/La_France_juive/Texte_entier/Tome_second
  2. Go to page 17 and click on footnote #4
  3. Notice that nothing happens: you are not taken to the relevant entry in the footnotes section

Acceptance Criteria:

  • Read comments below to understand potential root cause of issue
  • Implement a fix so that footnote links work for books with a large number of pages

Event Timeline

@ifried There are 6 works in that series that have a page 62. Can you please drill down a little further, at least on the wiki, so we can see how the same links work in HTML? Thanks.

I'm copy-pasting/translating some remarks from @Denis_Gagne52 (who is not familiar with Phabricator) on the fr.ws scriptorium.

Investigate how widespread this issue is:

Answer: This problem will occur whenever the size of a chapter exceeds 250 KB and ws-export, through its splitchapter function, tries to split it (see BookCleanerEpub.php). This function currently does not work, at least for chapters from French Wikisource: it only divides the chapter into two files, one containing all the text of the chapter and the other containing the footnotes grouped together at the end of the chapter.
As a result, the split achieves nothing: the chapter file stays approximately the same size, and the notes can no longer be reached because they have been moved to a different file, which breaks the links that pointed to them.
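To illustrate why the links break: a footnote link such as href="#cite_note-4" only resolves inside the file that contains it, so once the notes move to another file each link has to name that file as well. The following is a minimal sketch of that rewriting, not the code in BookCleanerEpub.php; the function name, the file names and the id are hypothetical, and a real fix would use an HTML parser rather than a regular expression.

```python
from __future__ import annotations

import re


def retarget_note_links(html: str, id_to_file: dict[str, str], current_file: str) -> str:
    """Rewrite href="#id" links so they name the file that now holds that id.

    id_to_file maps each anchor id to the (hypothetical) file it ended up in
    after the chapter was split.
    """
    def fix(match: re.Match) -> str:
        target_id = match.group(1)
        target_file = id_to_file.get(target_id, current_file)
        if target_file == current_file:
            # Target is still in this file: the fragment-only link keeps working.
            return match.group(0)
        # Target moved: prefix the fragment with the file that contains it.
        return f'href="{target_file}#{target_id}"'

    return re.sub(r'href="#([^"]+)"', fix, html)
```

Any splitter that moves anchors between files without a step like this leaves fragment-only links pointing at ids that no longer exist in the same file.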

Investigate possible solution(s):

Easy answer: simply remove splitchapters from ws-export, because it is the thorn in the side of our footnotes. In addition, this treatment is no longer necessary given what Calibre can currently do.
More technical answer: the --flow-size parameter of ebook-convert is described as follows: "Split all HTML files larger than this size (in KB). This is necessary because most EPUB players do not support large file sizes. By default 260 KB is the size required by Adobe Digital Editions. Set to 0 to disable division based on size." (Calibre Help page)
Example of using ebook-convert in command-line mode (no --flow-size is given, so the 260 KB default applies): "%programfiles%\Calibre2\ebook-convert" Temp.epub Final.epub
This example made it possible to split La France juive/Texte entier/Tome second into 8 parts, each smaller than 260 KB, without breaking the links to the 257 notes found inside this "fake chapter", which has almost 600 pages. Calibre handles this situation very well.
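The same epub-to-epub pass could be scripted. The sketch below only assumes that ebook-convert is on the PATH and uses the --flow-size option quoted above; the helper names are hypothetical and this is not how ws-export currently invokes Calibre.

```python
import shutil
import subprocess


def build_split_command(src, dst, flow_size_kb=260, executable="ebook-convert"):
    """Return the argument list for an epub-to-epub conversion that lets
    Calibre split HTML files larger than flow_size_kb."""
    return [executable, src, dst, "--flow-size", str(flow_size_kb)]


def split_epub(src, dst, flow_size_kb=260):
    """Run the conversion; raises if ebook-convert is missing or fails."""
    exe = shutil.which("ebook-convert")
    if exe is None:
        raise FileNotFoundError("ebook-convert not found on PATH")
    subprocess.run(build_split_command(src, dst, flow_size_kb, exe), check=True)
```

Calling split_epub("Temp.epub", "Final.epub") would correspond to the command line above, with the default size made explicit.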

Recommendation: remove splitchapters and use Calibre instead when the size of a chapter exceeds 260 KB. In this case, convert epub to epub at the very end of the process.
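Deciding whether the extra Calibre pass is needed comes down to checking the size of each content document in the generated epub. A rough sketch, assuming the chapters are stored as .xhtml/.html/.htm entries in the epub archive (the function name is hypothetical):

```python
import zipfile


def oversized_documents(epub_path, limit_kb=260):
    """List (name, uncompressed size in bytes) for every HTML/XHTML entry
    in the epub that is larger than limit_kb."""
    limit = limit_kb * 1024
    with zipfile.ZipFile(epub_path) as epub:
        return [
            (info.filename, info.file_size)
            for info in epub.infolist()
            if info.filename.lower().endswith((".xhtml", ".html", ".htm"))
            and info.file_size > limit
        ]
```

If the returned list is empty, the epub could be delivered as is; otherwise it would go through the epub-to-epub conversion.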

Note: ws-export produces an epub file and asks Calibre to convert it to the format chosen by the user. To do this, Calibre must first go through an interchange format, which is XHTML. If ws-export produced an XHTML file from the start, the processing would be lighter, and XHTML is also closer to what we find on Wikisource. Perhaps this should be analyzed during a possible redesign. A note to this effect could be placed on GitHub.

My 2 cents: I like the idea of using Calibre ePub to ePub to do the file split when files are too big. The current implementation in Wsexport is very bad. Another option would be to fix it inside Wsexport by looking at what Calibre actually does internally.

About using XHTML as input to Calibre, I don't know how to properly give Calibre the eBook structure and metadata in HTML. Calibre also seems to recommend using ePub instead of a plain HTML file.

ifried renamed this task from "Wikisource: footnote links do not work" to "Wikisource: footnote links do not work for large books". Jan 21 2021, 11:53 PM
ifried updated the task description.
Samwilson set the point value for this task to 8. Feb 2 2021, 1:31 AM
Samwilson subscribed.

In today's engineering discussion about this we decided that we'd first remove the chapter-splitting and run some tests to see how many books contain extra-large pages. We think that the issues with large epub chapters might be rare, and also that wiki pages that are too large can cause other issues for people (for example, Tome second from above is nearly 500 KB in a browser, and there are natural places where it could be split into more subpages). There are always books that we're not going to be able to export successfully (cf. T222690), and things that editors can do to improve the exports.

Removing the chapter splitting should fix this bug.

If it turns out that there are lots of validly huge pages, then we'll add epub-to-epub conversion with Calibre, to offload the chapter splitting to software that (presumably) will do it well.

Does that sound okay?

Sam,
I chose that large pseudo-book only to show that the limitation was not coming from Calibre.
The proposal sounds okay to me.