
Wikisource Exports: Show useful error if more than N subpages are found
Closed, Resolved, Public

Description

As a Wikisource user, I would like errors associated with large subpage counts to fail gracefully (rather than showing a white screen), so I clearly understand that an error has occurred and why it has occurred.

Background: At the moment, if you request an export of a work that has a large number of subpages (or pages linked from the ToC), then you'll likely get a white screen and no exported file. For example, https://it.wikisource.org/wiki/Pensieri_di_varia_filosofia_e_di_bella_letteratura has 4,586 subpages and so should not be exported via the web interface. (I think? I mean, what's someone going to do with that epub?)

It should be possible to keep count of how many subpages we're traversing, and fail nicely once we reach some arbitrarily large number. Maybe 50 or 100 or something? I'll have a look at the distribution of subpage counts on various Wikisources. Or could we count the accumulated text size as we add subpages, and bail after some point? Counting subpages alone seems wrong for something like a poetical work, where each subpage is tiny.
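The check described above could look roughly like the following minimal sketch, which enforces both a subpage count and an accumulated text-size budget and turns a breach into a readable message instead of a blank page. All names here (`collect_chapters`, `export_or_explain`, `ExportTooLargeError`, `fetch_text`) are hypothetical and not part of wsexport; the 500-subpage default borrows the figure proposed later in this discussion, and the 20 MB byte budget is an arbitrary assumption.

```python
from typing import Callable, Iterable


class ExportTooLargeError(Exception):
    """Raised when a work exceeds one of the (hypothetical) export limits."""


def collect_chapters(
    titles: Iterable[str],
    fetch_text: Callable[[str], str],
    max_subpages: int = 500,  # hypothetical limit
    max_bytes: int = 20_000_000,  # hypothetical limit (about 20 MB)
) -> list[tuple[str, str]]:
    """Fetch each subpage in order, stopping as soon as a limit is exceeded."""
    chapters: list[tuple[str, str]] = []
    total_bytes = 0
    for count, title in enumerate(titles, start=1):
        # Check the count before fetching, since every subpage costs a request.
        if count > max_subpages:
            raise ExportTooLargeError(
                f"This work has more than {max_subpages} subpages, "
                "which is too many to export via the web interface."
            )
        text = fetch_text(title)
        # The size budget also catches works with few but very long subpages.
        total_bytes += len(text.encode("utf-8"))
        if total_bytes > max_bytes:
            raise ExportTooLargeError(
                f"This work contains more than {max_bytes:,} bytes of text, "
                "which is too much to export via the web interface."
            )
        chapters.append((title, text))
    return chapters


def export_or_explain(titles: Iterable[str], fetch_text: Callable[[str], str]) -> str:
    """Return a success summary, or a readable error instead of a white screen."""
    try:
        chapters = collect_chapters(titles, fetch_text)
    except ExportTooLargeError as err:
        return f"Export failed: {err}"
    return f"Exported {len(chapters)} chapters."
```

Checking the count before each fetch means a 4,586-subpage work stops after 500 requests rather than after all of them.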

Acceptance Criteria:

  • Determine with the team the number of subpages traversed that would indicate a likely failure
  • Implement a failure message in such cases [TBD]

Event Timeline

Restricted Application added a subscriber: Aklapper.

The number of subpages is important because each page requires its own request. I'd say ideally we should restrict both the subpage count and the total text size.

@Samwilson So, this sounds like a plan for a more graceful failure, which is good.

But just to be clear: does this mean we would never be able to download a book with a lot of chapters? (And pages: 4,500 of them!) I.e., do we have any ideas for actually solving the problem?

That's a good question. The current workaround is for people to download wsexport and run it themselves locally. I've managed to produce an epub for the above work, for instance; it took quite a while, and the result is 9,300 pages.

The better fix would be for large items to be added to a job queue that's processed separately from the web tool. I'm not sure it's worth going straight to that architecture without first establishing how many works are 'too large' (although it may be hard to work out what the size threshold should be).
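The job-queue idea above can be illustrated with a minimal in-process sketch using Python's standard `queue` and `threading` modules: small works are exported immediately, while works above a size threshold are handed to a background worker and the user gets a "queued" message instead of a timeout. This is only an illustration under stated assumptions; a real deployment would more likely use a persistent queue processed by separate worker machines, and the names `request_export`, `build_ebook` and `LARGE_WORK_THRESHOLD` (and its value of 500) are hypothetical.

```python
import queue
import threading

LARGE_WORK_THRESHOLD = 500  # hypothetical cut-off, in subpages

jobs: "queue.Queue[tuple[str, int]]" = queue.Queue()
finished: dict[str, str] = {}  # title -> path of the built file (in memory only)


def build_ebook(title: str, subpage_count: int) -> str:
    """Hypothetical stand-in for the real, slow export step."""
    return f"{title}.epub ({subpage_count} subpages)"


def worker() -> None:
    """Process queued exports one at a time, outside the web request."""
    while True:
        title, count = jobs.get()
        try:
            finished[title] = build_ebook(title, count)
        finally:
            jobs.task_done()


def request_export(title: str, subpage_count: int) -> str:
    """Export small works directly; queue large ones and tell the user so."""
    if subpage_count <= LARGE_WORK_THRESHOLD:
        return build_ebook(title, subpage_count)
    jobs.put((title, subpage_count))
    return f"{title} is large; it has been queued and will be available later."


# A single daemon thread acts as the separate processor for the queue.
threading.Thread(target=worker, daemon=True).start()
```

For example, `request_export("Pensieri", 4586)` returns the "queued" message immediately, and the file appears in `finished` once the worker gets to it.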

I think the first thing is to get some data about numbers of subpages in works on various Wikisources. Then we can see how many works will be unable to be exported; it might not be that many.

I think 500 subpages might be a good point to start at. Here is a summary of how many works have each number of subpages, and this lists page titles and their subpage counts (for both queries, I've experimented with a few different language Wikisources).

A limit of 500 would currently exclude, for example, 47 of 443,541 works on English Wikisource, 23 of 485,826 on Russian, and 67 of 7,562 on Bengali.
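To put those figures in proportion, the snippet below simply converts the quoted counts into percentages. The numbers are the ones given above; the helper name `excluded_share` is made up for illustration. It works out to roughly 0.01% of works on English, about 0.005% on Russian, and close to 0.9% on Bengali, so a fixed limit weighs much more heavily on smaller Wikisources.

```python
# Works excluded by a 500-subpage limit, using the counts quoted above.
EXCLUDED_AT_500 = {
    "English": (47, 443_541),
    "Russian": (23, 485_826),
    "Bengali": (67, 7_562),
}


def excluded_share(excluded: int, total: int) -> float:
    """Return the excluded works as a percentage of all works."""
    return 100 * excluded / total


for language, (excluded, total) in EXCLUDED_AT_500.items():
    print(f"{language}: {excluded_share(excluded, total):.4f}% of works excluded")
```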

Niharika subscribed.

We decided (in the estimation meeting) to leave this task as it is for now. If we see users running into this frequently, we should do something about it.

ifried renamed this task from "Show useful error if more than N subpages are found" to "Wikisource Exports: Show useful error if more than N subpages are found". (Oct 29 2020, 10:40 PM)
ifried updated the task description.

When I tested the link that previously produced an error, I was sometimes able to download the file and sometimes received a more helpful error message (see the screenshot examples below). For this reason, I'm marking this task as resolved.

[Screenshot: Screen Shot 2021-05-03 at 6.31.18 PM.png]

[Screenshot: Screen Shot 2021-05-03 at 6.26.11 PM.png]

ifried claimed this task.