
Wikisource ebooks: Investigate using subpages from all pages, not just those with ws-summary
Open, Needs Triage · Public

Description

As a Wikisource user, I want the ability to use subpages from all pages (rather than just those with ws-summary), so that 1) ebook exports more fully and accurately reflect the material that I want to download, and 2) the Table of Contents fully represents what I see when I look at the book online.

Background: At the moment, WSExport traverses pages from the user-given starting page by looking for all links to subpages, and also all links to subpages on any subpage if the link is contained within a .ws-summary element. This means that unless subpage ToCs are specifically marked as such, their subpages don't get included in the exported ebook.
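To make the behaviour described above concrete, here is a minimal sketch in Python. It is not WSExport's actual code: the `Link` type, the `pages` mapping, and the `collect_pages` function are hypothetical names, and the sketch assumes that ws-summary links are followed recursively and that a subpage is any title starting with the root title plus a slash.

```python
from typing import Dict, List, NamedTuple


class Link(NamedTuple):
    target: str       # title of the linked page
    in_summary: bool  # True if the link sits inside a .ws-summary element


def is_subpage(title: str, root: str) -> bool:
    # Assumption: subpages are titles of the form "<root>/<anything>".
    return title.startswith(root + "/")


def collect_pages(pages: Dict[str, List[Link]], root: str) -> List[str]:
    """Collect pages the way the task description says the exporter does."""
    collected = [root]
    seen = {root}
    queue = []

    # On the starting page, every link to a subpage is followed.
    for link in pages.get(root, []):
        if is_subpage(link.target, root) and link.target not in seen:
            seen.add(link.target)
            collected.append(link.target)
            queue.append(link.target)

    # On subpages, only links inside a .ws-summary element are followed.
    while queue:
        current = queue.pop(0)
        for link in pages.get(current, []):
            if (link.in_summary and is_subpage(link.target, root)
                    and link.target not in seen):
                seen.add(link.target)
                collected.append(link.target)
                queue.append(link.target)

    return collected
```

With this model, a chapter linked from an unmarked ToC on "Book/Part 1" is silently left out, while a chapter linked from inside a ws-summary element is included.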

I think it makes sense (although I haven't fully looked into it) to just follow all subpage links, recursively and depth-first. This would make things easier for editors, who wouldn't have to do anything special, and for readers, who would be more likely to get the whole work. There might be problems with processing works that have a great many subpages, such as encyclopedias (e.g. 1911 Encyclopædia Britannica has 36,305 subpages; using WSExport on its top-level page does work at the moment, but doesn't give a very useful ebook). I wonder if for those we'd be better off saying "this is too big", making it obvious that, although they asked for it, we can't actually give the reader the entire work as an ebook.
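As a rough sketch of what that proposal could look like, the following Python follows every subpage link depth-first and gives up with an explicit error once a cap is reached. The `links` mapping, `collect_all_subpages`, `TooManyPagesError`, and the `max_pages` parameter are all hypothetical; no particular cap value is suggested in this task.

```python
from typing import Dict, List


class TooManyPagesError(Exception):
    """Raised when a work has more pages than the exporter is willing to fetch."""


def collect_all_subpages(links: Dict[str, List[str]], root: str,
                         max_pages: int) -> List[str]:
    """Depth-first traversal of all subpage links, regardless of ws-summary."""
    collected = []
    seen = set()
    stack = [root]  # explicit stack, so deep works cannot hit the recursion limit

    while stack:
        title = stack.pop()
        if title in seen:
            continue
        seen.add(title)

        if len(collected) >= max_pages:
            # The "this is too big" case: fail loudly instead of truncating.
            raise TooManyPagesError(
                f"{root!r} has more than {max_pages} pages and cannot be exported whole"
            )
        collected.append(title)

        # Push in reverse so that links are visited in their on-page order.
        for target in reversed(links.get(title, [])):
            if target.startswith(root + "/") and target not in seen:
                stack.append(target)

    return collected
```

The `seen` set is what keeps the traversal from looping when subpages link back to each other or to the top-level page. Raising an error rather than returning a partial list matches the idea of telling the reader that the whole work cannot be delivered.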

Acceptance Criteria:

  • Investigate the primary work that would need to be done in order to use subpages from all pages, not just those with ws-summary, in WSExport
  • Investigate the main challenges and risks associated with such work
  • Provide a general estimate/idea, if possible, of the potential impact it may have on ebook export reliability
  • Provide a general estimate/rough sense of the level of difficulty and effort required to do such work

Event Timeline

Restricted Application added a subscriber: Aklapper.
ifried renamed this task from "Use subpages from all pages, not just those with ws-summary" to "Wikisource ebooks: Investigate using subpages from all pages, not just those with ws-summary". Jun 11 2020, 10:57 PM
ifried updated the task description.

We talked about this in estimation today and concluded that it is not ready to estimate. We first need to determine whether no longer relying on ws-summary is useful enough to warrant possibly introducing a new issue for big books: we would either need to traverse all subpages, which could have performance implications, or need to create a cap, which may introduce new issues of its own.

This is pending, based on further analysis and discussion as a team.

A new issue pointed out that works with {{AuxTOC}} on English Wikisource were not exporting correctly. Adding ws-summary to this template seems to have fixed the issue.

There might be other ToC templates that could benefit from this, and adding ws-summary to them is probably easier than modifying WSExport to traverse all pages. (AuxTOC is on 10 Wikisources.)

T258961 is about the same issue, and suggests that there might be situations in which we are not able to determine the full list of pages to include, because they're not always all subpages. The fix for that is ws-summary, but that's not at all obvious to someone trying to figure out why only some of their ToC pages are being exported.

Perhaps, after we move to a job queue (T253283), it could show the user a warning message when there's a difference between the number of mainspace links in a page and the number of pages exported?
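A minimal sketch of such a check, in Python, might look like the following. The function name `export_warning` and its inputs are hypothetical, and it assumes the caller has already extracted the list of mainspace link targets from the page. It compares titles rather than only counts, so that the warning can name the pages that were left out.

```python
from typing import List, Optional


def export_warning(mainspace_links: List[str],
                   exported: List[str]) -> Optional[str]:
    """Return a warning if some linked mainspace pages were not exported."""
    exported_set = set(exported)
    # dict.fromkeys() drops duplicate links while keeping their on-page order.
    unique_links = list(dict.fromkeys(mainspace_links))
    missing = [title for title in unique_links if title not in exported_set]

    if not missing:
        return None
    return (
        f"{len(missing)} of {len(unique_links)} linked mainspace pages "
        f"were not exported: {', '.join(missing)}"
    )
```

A message like this would at least point someone towards the ToC links that were skipped, even if it does not explain that ws-summary is the fix.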

But yes, in general it looks like this task might be invalid.