
ProofreadPage: Automatically link to other pages for multipage PDFs and DjVus
Closed, Resolved (Public)

Description

Patch to automatically link to other pages of multi-page documents

Currently, the ProofreadPage extension doesn't link to other pages of a multipage PDF or DjVu document.
Attached patch changes this, so that when no index for the file is found, links to the other pages are created.

I can commit this myself, but I'd like to get some feedback first on whether this approach is OK.

BTW: This patch is live at e.g. http://spiele.j-crew.de/wiki/Scan:9s_schdM.pdf


Version: unspecified
Severity: enhancement


Details

Reference
bz12238

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:57 PM
bzimport added a project: ProofreadPage.
bzimport set Reference to bz12238.
bzimport added a subscriber: Unknown Object (MLST).

thomasV1 wrote:

this is interesting... but I have a few questions:

  1. currently on Wikisource, users sometimes use a page ordering in the
     index that differs from the page ordering of the document. they do
     this so that the page numbers in a book match those in the associated
     DjVu. it seems to me that this possibility would remain, because your
     patch first checks for the existence of an index... but I want to be
     sure that it won't break anything.

  2. the index page has another function besides linking (not yet active
     on Wikisource): it displays the progress of each page using CSS
     attributes (quality).
     see for example http://www.xarax.eu/wiki/Index:Das_vollkommenste_Hautskelet?action=purge

now, if there is no manually created index page, I guess
this list of colored links should be generated automatically
(on a special page that can be included in pages, or by invoking
a parser command).

About question one:
Yeah, the patch only changes behaviour in case no index page can be found.
As soon as one is created, everything works as before.
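
To make the fallback concrete, here is a rough sketch of the behaviour described above. It is written in Python purely for illustration and is not the attached patch (ProofreadPage itself is a PHP extension); the function name `page_links`, its parameters, and the default `Page` namespace label are all assumptions.

```python
from typing import List, Optional


def page_links(file_name: str, page_count: int,
               index_links: Optional[List[str]] = None,
               namespace: str = "Page") -> List[str]:
    """Return the page links to show for a multipage document.

    Hypothetical helper: if an index page supplied links, they are returned
    unchanged, so any custom page ordering is preserved. Only when no index
    exists is one link per page generated, in document order.
    """
    if index_links is not None:
        # An index page exists: behave exactly as before.
        return list(index_links)
    # No index page: fall back to automatic links for pages 1..page_count.
    return [f"{namespace}:{file_name}/{n}" for n in range(1, page_count + 1)]
```

With no index, `page_links("Book.djvu", 3)` yields links for pages 1 to 3; passing an explicit `index_links` list returns that list as given.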

About question two:
Currently the code doesn't link to any index page, which is OK for me, but
probably not for wikisource. I think creating a special page that displays
the needed information and linking to this page if no index page can be
found would be the best solution.

thomasV1 wrote:

hmm, instead of creating a special page, maybe the Image: page could display
that information, and replace the index page.

Interesting idea, but I'm not quite sure where you'd hook the ImagePage class. The only hook available (AFAICS) is ImageOpenShowImageInlineBefore, which doesn't seem suitable, so you'd probably have to add a new one.

On second thought, you could insert some text into the image description, using the normal parser hooks.

thomasV1 wrote:

besides that, there is a bug in your patch (I am testing it now) :
if the page namespace is not created, it prepends a double 'Page:'
prefix in the links
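
One general way to guard against a doubled prefix is to add the namespace only when the title does not already carry it. The sketch below is illustrative Python, not the code from the patch, and it does not claim to reproduce the actual cause of the bug; the helper name `with_namespace` and the default `Page` label are assumptions.

```python
def with_namespace(title: str, namespace: str = "Page") -> str:
    """Prefix title with 'namespace:' unless it already starts with it.

    Hypothetical helper for illustration only.
    """
    prefix = namespace + ":"
    if title.startswith(prefix):
        # Already prefixed: return as-is to avoid 'Page:Page:...'.
        return title
    return prefix + title
```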

thomasV1 wrote:

patch

ok, here is a new patch, that creates the page list in a canonical index page.
I have not tested it thoroughly, so I will not commit it now. Comments welcome.


WorksForMe :) (i.e. I have just tested it, although not very thoroughly either)

The only thing I liked better about my patch is that the link to the first page didn't have /1 appended to it (so the link to the first page was Scan:document.pdf, not Scan:document.pdf/1).
I think this is more consistent with single-page documents, but others would surely argue otherwise, so that's your decision.

OTOH, if you leave it as it currently is, the main page (i.e. Scan:document.pdf) could display the overview, like the index, and Scan:document.pdf/1 could display the first page.

Just some ideas...

thomasV1 wrote:

ok, committed.
I do not know what to do about the /1, will decide later

thomasV1 wrote:

I guess it is better to keep /1 for the first page, because it
will make life easier for bot programmers, and because we already
have a lot of documents using this convention.
so, I'm marking this bug as fixed. thanks for the patch.
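
As an illustration of why a uniform /N suffix is convenient for bots, here is a small Python sketch that splits a page title into its document part and page number. The function name `split_page_title` is hypothetical, and the example titles simply reuse the Scan:document.pdf form mentioned earlier in this thread.

```python
from typing import Optional, Tuple


def split_page_title(title: str) -> Tuple[str, Optional[int]]:
    """Split 'Scan:document.pdf/1' into ('Scan:document.pdf', 1).

    Hypothetical helper: titles without a numeric '/N' suffix are returned
    unchanged, with None as the page number.
    """
    base, sep, suffix = title.rpartition("/")
    if sep and base and suffix.isascii() and suffix.isdigit():
        return base, int(suffix)
    return title, None
```

If every page, including the first, carries the suffix, a bot can treat all page titles the same way and needs no special case for page 1.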