
Export a collection of pages as a single document (PDF, HTML, printable) *client-side*
Open, Needs Triage, Public

Description

(This is probably a dup or quasi-dup, but this version is focused on a particular use case.)

One drawback of splitting a source document into multiple pages on Wikisource (see T275319#9818815) is that it disrupts export: printing the split document, saving it as a PDF, or even (!) loading the entire thing as HTML into a single browser context so that Ctrl-F search or whatever works across the whole text.

The old Collection extension used to provide a means of doing this, but it was poorly maintained, tried to do too many things, and its server-side rendering model didn't fit well within the jobs framework of MediaWiki/the WMF server cluster.

Building on the T365806 mechanism for marking pages that should be chained together, it would be useful to have a way (gadget, special page, ...) to force-load all of the chained documents at once into the current browser window. Assuming the user's browser has enough memory etc., they could then print that page (aka "render to HTML"), save it, run Ctrl-F, and so on. This is like a "force load" option for the T365806 infinite scroll feature.
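A rough sketch of what the "force load" step might look like, assuming the ordered list of chained page titles is already known (how T365806 would expose that list is not decided here). It uses the MediaWiki Action API's `action=parse` with `prop=text`; the names `buildParseUrl`, `stitch`, `loadChain` and the injected `fetchJson` parameter are all hypothetical, not part of any existing gadget.

```typescript
// Hypothetical sketch: every name below is illustrative, not an existing API.

// Caller-supplied function that fetches a URL and returns the parsed JSON body.
type FetchJson = (url: string) => Promise<any>;

/** Build an Action API URL that returns the rendered HTML of one page. */
function buildParseUrl(apiBase: string, title: string): string {
  const params: [string, string][] = [
    ["action", "parse"],
    ["page", title],
    ["prop", "text"],
    ["format", "json"],
    ["formatversion", "2"], // with version 2, parse.text is a plain string
  ];
  return apiBase + "?" + params.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
}

/** Escape a string for use inside a double-quoted HTML attribute. */
function escapeAttr(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

/** Wrap each fetched fragment in a <section> and join them in chain order. */
function stitch(parts: { title: string; html: string }[]): string {
  return parts
    .map((p) => `<section data-title="${escapeAttr(p.title)}">${p.html}</section>`)
    .join("\n");
}

/** Fetch every page in the chain, one request at a time, and return the combined HTML. */
async function loadChain(
  apiBase: string,
  titles: string[],
  fetchJson: FetchJson
): Promise<string> {
  const parts: { title: string; html: string }[] = [];
  for (const title of titles) {
    const data = await fetchJson(buildParseUrl(apiBase, title));
    parts.push({ title, html: data.parse.text });
  }
  return stitch(parts);
}
```

In a browser, `fetchJson` could be something like `(url) => fetch(url).then((r) => r.json())`, with the returned string placed into a container element before printing or searching. Requests are made sequentially here so that a long chain does not hit the API with a burst of parallel calls; that is a design guess, not a requirement.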

As an extra bonus, perhaps this could be configured for a specific page range, so that truly massive documents could still be exported in 100-page chunks or whatever, which, although imperfect, is still 100x better than having to do it page by page.
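The page-range idea could be as simple as slicing the ordered title list before loading. A minimal sketch, assuming the chain is available as an array of titles; `selectPageRange` and `chunkPages` are hypothetical helper names.

```typescript
// Hypothetical helpers: names and signatures are illustrative only.

/** Return the pages from `first` to `last` inclusive, counting from 1. */
function selectPageRange(titles: string[], first: number, last: number): string[] {
  const start = Math.max(first, 1) - 1; // convert to a 0-based index
  return titles.slice(start, last);     // slice's end is exclusive, so `last` is included
}

/** Split an ordered list of titles into consecutive chunks of at most `size` pages. */
function chunkPages(titles: string[], size: number): string[][] {
  if (!(size >= 1) || Math.floor(size) !== size) {
    throw new RangeError("chunk size must be a positive integer");
  }
  const chunks: string[][] = [];
  for (let i = 0; i < titles.length; i += size) {
    chunks.push(titles.slice(i, i + size));
  }
  return chunks;
}
```

A 250-page document split with a chunk size of 100 would yield three exports of 100, 100, and 50 pages.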

The key change from earlier versions of this is that it is *client side*, which avoids the DoS and job scheduling issues. It also has some nice synergies with whatever markup mechanism (parser function? etc.) is used to drive the T365806 infinite scroll, and doesn't require a separate heavyweight mechanism to name all the pages in the "book" the way that the Collection extension did.

Event Timeline

Doesn't the Wikisource extension already account for this?

Stitching together HTML snippets could be done on either the server side or the client side; it's pretty harmless performance-wise. What's infeasible server-side (at least with a simplistic remote-controlled-browser approach) is rendering large PDFs.

@Tgr, as mentioned above, the Wikisource extension appears to already be doing this server-side on a separate WMCS cluster. Also, the Collection extension was removed from Wikisource a while ago (see T358437).