Wikisource ebooks: Investigate using subpages from all pages, not just those with ws-summary
Open, Needs Triage · Public

Assigned To

None

Authored By

	Samwilson
	May 21 2020, 5:16 AM

Description

As a Wikisource user, I want the ability to use subpages from all pages (rather than just those with ws-summary), so that 1) ebook exports more fully and accurately reflect the material that I want to download, and 2) the Table of Contents fully represents what I see when I look at the book online.

Background: At the moment, WSExport traverses pages from the user-given starting page by looking for all links to subpages, and also all links to subpages on any subpage if the link is contained within a .ws-summary element. This means that unless subpage ToCs are specifically marked as such, their subpages don't get included in the exported ebook.
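To make the behaviour described above concrete, here is a minimal sketch in Python (not WSExport's actual code). It assumes a hypothetical in-memory `pages` mapping from each page title to a list of `(target_title, inside_ws_summary)` link pairs, and treats a "subpage" as any title that starts with the starting page's title followed by `/`. Both are assumptions made for illustration only.

```python
from collections import deque


def collect_pages_current(pages, root):
    """Sketch of the traversal described in the task (illustrative only).

    pages: hypothetical dict mapping a title to a list of
           (target_title, inside_ws_summary) tuples.
    root:  the user-given starting page.
    """
    prefix = root + "/"
    selected = [root]
    seen = {root}
    queue = deque()

    # On the starting page, every link to a subpage is followed.
    for target, _inside_summary in pages.get(root, []):
        if target.startswith(prefix) and target not in seen:
            seen.add(target)
            selected.append(target)
            queue.append(target)

    # On subpages, only links inside a .ws-summary element are followed.
    while queue:
        page = queue.popleft()
        for target, inside_summary in pages.get(page, []):
            if inside_summary and target.startswith(prefix) and target not in seen:
                seen.add(target)
                selected.append(target)
                queue.append(target)

    return selected
```

In this sketch, a chapter linked from a subpage ToC that is not wrapped in `.ws-summary` never makes it into `selected`, which is the gap this task describes.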

I think it makes sense (although I haven't fully looked into it) to just follow all subpage links, recursively and depth-first. This would make it easier for editors, who wouldn't have to do anything special, and for readers, who would be more likely to get the whole work. There might be problems with processing works with a great many subpages, such as encyclopedias (e.g. 1911 Encyclopædia Britannica has 36,305 subpages, and using WSExport on its top-level page at the moment does work but doesn't give a very useful ebook). I wonder if for those we'd be better off saying "this is too big" and making it obvious that, although they asked for it, we can't actually give the reader the entire work as an ebook.
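As a rough illustration of that proposal, the sketch below follows every subpage link recursively and depth-first, and gives up with a "this is too big" error once a page cap is reached. The `links` mapping, the `WorkTooLargeError` name, and the `max_pages` default are all hypothetical; nothing here reflects an actual WSExport API or an agreed limit.

```python
class WorkTooLargeError(Exception):
    """Hypothetical error for works with more subpages than the cap allows."""


def collect_all_subpages(links, root, max_pages=500):
    """Follow all subpage links depth-first, starting at root.

    links:     hypothetical dict mapping a title to the titles it links to.
    max_pages: illustrative cap; the real threshold would need discussion.
    """
    prefix = root + "/"
    ordered = []
    seen = set()

    def visit(title):
        if title in seen:
            return
        if len(ordered) >= max_pages:
            raise WorkTooLargeError(
                f"'{root}' has more than {max_pages} pages; too big to export"
            )
        seen.add(title)
        ordered.append(title)
        # Recurse into each linked subpage before moving on to the next link.
        for target in links.get(title, []):
            if target.startswith(prefix):
                visit(target)

    visit(root)
    return ordered
```

Raising an explicit error rather than silently truncating keeps it obvious to the reader that the full work was not produced.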

Acceptance Criteria:

Investigate the primary work that would need to be done in order to use subpages from all pages, not just those with ws-summary, in WSExport
Investigate the main challenges and risks associated with such work
Provide a general estimate/idea, if possible, of the potential impact it may have on ebook export reliability
Provide a rough sense of the level of difficulty and effort required in doing such work

Related Objects

Mentioned In: T357176: Wikisource: Can't build full work as Ebook
T275870: Ws export: failed to get subpages for work with colon in the name
T259235: Export to EPUB doesn't follow links from {Dotted TOC page} and {TOC row}
T244099: Spike: Investigate "Improve export of electronic books" [8 hours]
Mentioned Here: T253283: Wikisource Ebooks: Investigate job queue for more efficient ebook generation [16H]
T258961: Not All ToC items exported

Event Timeline

Samwilson created this task. May 21 2020, 5:16 AM

Restricted Application added a project: Community-Tech. May 21 2020, 5:16 AM

Restricted Application added a subscriber: Aklapper.

Samwilson mentioned this in T244099: Spike: Investigate "Improve export of electronic books" [8 hours]. May 21 2020, 5:16 AM

ifried moved this task from New & TBD Tickets to Needs Discussion on the Community-Tech board. Jun 11 2020, 4:29 PM

ifried renamed this task from "Use subpages from all pages, not just those with ws-summary" to "Wikisource ebooks: Investigate using subpages from all pages, not just those with ws-summary". Jun 11 2020, 10:57 PM

ifried added a project: All-and-every-Wikisource.

ifried updated the task description.

ifried updated the task description. Jun 25 2020, 8:58 PM

We talked about this in estimation today and concluded that it is not ready to estimate. We first need to determine if having ws-summary is useful enough to warrant possibly introducing a new issue for big books: we would need to traverse all subpages, which could have performance implications, or we would need to create a cap (which may introduce new issues).

This is pending, based on further analysis and discussion as a team.

A new issue pointed out that works with {{AuxTOC}} on English Wikisource were not exporting correctly. Adding ws-summary to this template seems to have fixed the issue.

There might be other ToC templates that could benefit from this, and adding ws-summary to them is probably easier than modifying WSExport to traverse all pages. (AuxTOC is on 10 Wikisources.)

T258961 is about the same issue, and suggests that there might be situations in which we are not able to determine the full list of pages to include, because they're not always all subpages. The fix for that is ws-summary, but that's not at all obvious to someone trying to figure out why only some of their ToC pages are being exported.

Perhaps after we move to a job queue (T253283) it can have a warning message output to the user when there's a difference between the number of mainspace links in a page and the number of pages exported?
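One way such a warning could be computed is sketched below. It compares the set of mainspace titles linked from the starting page with the set of titles that were exported, which is one interpretation of "a difference between the number of links and the number of pages exported". The function name and both inputs are hypothetical, and the job queue itself is not modelled.

```python
def missing_pages_warning(mainspace_links, exported_titles):
    """Return a warning string if some linked mainspace pages were not exported.

    mainspace_links: hypothetical list of mainspace titles linked from the page.
    exported_titles: hypothetical list of titles included in the ebook.
    Returns None when every linked page was exported.
    """
    linked = set(mainspace_links)
    missing = sorted(linked - set(exported_titles))
    if not missing:
        return None
    return (
        f"{len(missing)} of {len(linked)} linked mainspace pages "
        f"were not exported: {', '.join(missing)}"
    )
```

Listing the missing titles, rather than only the counts, would give an editor a starting point for working out which ToC needs ws-summary.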

But yes, in general it looks like this task might be invalid.

Samwilson mentioned this in T259235: Export to EPUB doesn't follow links from {Dotted TOC page} and {TOC row}. Jul 30 2020, 11:31 AM

ifried updated the task description. Sep 10 2020, 3:15 PM

dom_walden mentioned this in T275870: Ws export: failed to get subpages for work with colon in the name. Mar 3 2021, 3:10 PM

ifried moved this task from Needs Discussion to Older: Team Work on the Community-Tech board. May 3 2021, 9:24 PM

Samwilson moved this task from Backlog to Ready to work on on the WS Export board. May 4 2021, 12:31 AM

Droftnats mentioned this in T357176: Wikisource: Can't build full work as Ebook. Feb 9 2024, 8:46 PM