Offer PDF export of entire wikisource proofread books
Open, Needs Triage, Public

Description

For proofread Wikisource books, I think Index pages could also be exported with a dedicated tool that preserves the original pagination of the edition (i.e. converting each individual nsPage page into an individual PDF page).

The result could be more or less what the it.source "viewer" already shows:
Don Chisciotte

I presume that this conversion could be much simpler than converting ns0 pages, and that the result would be a much more faithful digitization of the source edition, whereas the ns0 transcription is something like a "new edition".

Event Timeline

Restricted Application added a subscriber: Aklapper. Nov 5 2017, 3:32 PM

@Alex_brollo: What is the current situation, after which steps were performed, and what is the expectation? Following https://mediawiki.org/wiki/How_to_report_a_bug is welcome, as is editing the task summary so that it describes either a bug or a requested feature ("PDF export of wikisource proofread books" is neither, as the "Scarica come PDF" ("Download as PDF") option exists on the page that you linked to). Thanks a lot! :)

What I suggest is to export all the nsPage pages linked from an nsIndex page, preserving the original pagination and using the Index page only as "an index", i.e. to build a PDF of the whole book.

Presently, the "Scarica come PDF" option downloads only a PDF version of the Index page itself.

Aklapper renamed this task from "PDF export of wikisource proofread books" to "Offer PDF export of entire wikisource proofread books". Nov 6 2017, 10:51 AM
Alex_brollo added a comment. Edited Nov 7 2017, 8:35 AM

In the meantime, I'll try a do-it-yourself approach: exploring, then using, wkhtmltopdf from Python, just to get a first "it can be done" result.

  • page by page:
    • download the HTML
    • edit it, removing or changing whatever needs to be removed or changed
    • convert it into a PDF page
    • append it to the global PDF
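
The steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a tested exporter: the base URL, the "Pagina" namespace name, the file name in the usage note, and all function names are hypothetical. It fetches each page with MediaWiki's `action=render` (content HTML only) and, instead of appending one PDF per page, passes all cleaned HTML files to a single wkhtmltopdf call, which accepts several input pages followed by one output file. The cleaning step is deliberately minimal; a real tool would have to strip more and supply the site's CSS.

```python
import re
import subprocess
import urllib.parse
import urllib.request
from pathlib import Path

# Assumptions: it.wikisource base URL and its Page-namespace name.
BASE_URL = "https://it.wikisource.org/wiki/"
PAGE_NS = "Pagina"


def page_url(index_file, page_no):
    """URL of one Page-namespace page; action=render returns only the content HTML."""
    title = f"{PAGE_NS}:{index_file}/{page_no}"
    return BASE_URL + urllib.parse.quote(title, safe="/:") + "?action=render"


def clean_html(fragment):
    """Drop <script> elements and wrap the fragment in a minimal UTF-8 document."""
    body = re.sub(r"<script\b.*?</script>", "", fragment, flags=re.S | re.I)
    return ('<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
            f"<body>{body}</body></html>")


def build_command(html_files, output_pdf):
    """wkhtmltopdf takes any number of input pages followed by the output file."""
    return ["wkhtmltopdf", "--quiet", *map(str, html_files), str(output_pdf)]


def export_book(index_file, first, last, workdir, output_pdf):
    """Download pages first..last, clean them, and convert them in one run."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    html_files = []
    for n in range(first, last + 1):
        request = urllib.request.Request(
            page_url(index_file, n),
            headers={"User-Agent": "pdf-export-sketch/0.1"},  # hypothetical agent string
        )
        with urllib.request.urlopen(request) as response:
            fragment = response.read().decode("utf-8")
        path = workdir / f"page_{n:04d}.html"
        path.write_text(clean_html(fragment), encoding="utf-8")
        html_files.append(path)
    # Requires the wkhtmltopdf executable to be on PATH.
    subprocess.run(build_command(html_files, output_pdf), check=True)
```

A call such as `export_book("Don Chisciotte.djvu", 1, 20, "tmp_pages", "book.pdf")` (the file name is made up) would then try to produce a single PDF. Whether each wiki page ends up on exactly one PDF page depends on page size and styling, so that would still need tuning.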

it.source nsIndex pages have a standardized Summary field that can be used to build a PDF index page.

Suggestions for other command-line HTML-to-PDF converters that run under Windows 10 are welcome!

Alex_brollo updated the task description. Nov 7 2017, 9:05 AM
Billinghurst added a subscriber: Billinghurst. Edited Nov 8 2017, 9:48 PM

It isn't clear to me what you are trying to achieve, or how it would differ from downloading a PDF from the main namespace.

Are you talking about proactively generating PDFs of all "proofread once" works and storing them?
or
Are you talking about the ability to create PDFs from the Index: namespace page of all works that are at least "proofread once"?
or
Are you talking about putting the corrected page text back alongside, or overlaying, the images in a PDF?

"This isn't clear to me what you are trying to achieve, and how it would be different from downloading a PDF from the main namespace"

I'm simply suggesting an alternative PDF export tool, similar to the existing one, but mirroring the original pagination of the source edition. I often feel that the page structure, page design, and balance of text and illustrations should be preserved as far as possible in the exported PDF, just as they are preserved in the nsPage HTML.