
Investigation: Could we build a Tool Labs project to generate DjVu files for Wikisource
Closed, Resolved · Public · 3 Estimated Story Points

Description

Apparently, the Internet Archive is no longer generating DjVu files, so the folks on Wikisource no longer have a web interface for generating them. We should investigate what would be involved in setting up a simple interface on Tool Labs for generating DjVu files from PDFs or a set of JPGs.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 330481 had a related patch set uploaded (by Pppery):
Canonicalize title before creating new newsletter

https://gerrit.wikimedia.org/r/330481

Pppery added a subscriber: Pppery.
This comment was removed by Pppery.

The IA Upload tool can be modified to do this, by incorporating the process that's defined in @Alex_brollo's jp2tojdvu.py.

See https://github.com/Tpt/ia-upload/issues/14

It uses a few external commands from DjVuLibre, which is already available on Tool Labs.
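As a rough illustration of what "a few external commands" could look like, here is a minimal Python sketch that only builds the command lines. It assumes the pages are already JPEG files and that the DjVuLibre tools `c44` (single-image encoder) and `djvm` (document bundler) are on the PATH. The function name and file names are hypothetical, and this is not the process used by jp2tojdvu.py or IA Upload, which the thread does not show.

```python
from pathlib import Path


def build_djvu_commands(page_images, output_djvu):
    """Return the command lines needed to turn page images into one DjVu file.

    Nothing is executed here; each list can be passed to subprocess.run().
    """
    commands = []
    page_djvus = []
    for image in page_images:
        page_djvu = str(Path(image).with_suffix(".djvu"))
        # c44 encodes a single PNM or JPEG image as a one-page DjVu file.
        commands.append(["c44", str(image), page_djvu])
        page_djvus.append(page_djvu)
    # djvm -c bundles the single-page files into one multi-page document.
    commands.append(["djvm", "-c", str(output_djvu), *page_djvus])
    return commands


commands = build_djvu_commands(["p0001.jpg", "p0002.jpg"], "book.djvu")
```

Building the commands separately from running them keeps the sketch easy to inspect and test. A real tool would also need error handling and, for JP2 input, a conversion step first.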

kaldari edited projects, added Community-Tech-Sprint; removed Community-Tech.
kaldari moved this task from Ready to In Development on the Community-Tech-Sprint board.
kaldari set the point value for this task to 3.
Samwilson triaged this task as Medium priority. (Feb 7 2017, 12:21 AM)

An issue was identified when the JP2 files had different filenames than were expected (e.g. containing [^a-zA-Z0-9]) and this is now fixed. It seems that the page names in DjVu XML are somewhat constrained.
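The thread does not show how the filename issue was fixed. One possible approach is sketched below in Python, assuming page names should be restricted to `[a-zA-Z0-9]`. The function name and the collision-handling rule are hypothetical, and the actual constraints on DjVu XML page names may differ.

```python
import re


def safe_page_names(filenames):
    """Map each original filename to a unique name using only [a-zA-Z0-9]."""
    mapping = {}
    used = set()
    for name in filenames:
        # Drop the extension, then strip every character outside the safe set.
        stem = name.rsplit(".", 1)[0]
        cleaned = re.sub(r"[^a-zA-Z0-9]", "", stem) or "page"
        candidate = cleaned
        suffix = 1
        # Stripping characters can make two names collide, so disambiguate.
        while candidate in used:
            suffix += 1
            candidate = f"{cleaned}{suffix}"
        used.add(candidate)
        mapping[name] = candidate
    return mapping


names = safe_page_names(["book_0001.jp2", "book-0001.jp2", "scan 2.jp2"])
```

Keeping a mapping from original to sanitized names means the source files can still be located after the page names have been normalized.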

I think the patch is ready to merge, but I'll wait for some more testing first: https://tools.wmflabs.org/ia-upload/test/

@Samwilson: What did you find out about storing user OAuth credentials? Is this kosher or not? @bd808 might be a good person to ask about it.

I asked on labs-l; they suggested using a database instead, but said that a 0600 JSON file is still okay. The file is created like this:

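// Create the file under a restrictive umask so it is never world-readable.
$oldUmask = umask( 0177 );
touch( $jobFile );
umask( $oldUmask );
// Enforce owner-only read/write in case the file already existed.
chmod( $jobFile, 0600 );
file_put_contents( $jobFile, \GuzzleHttp\json_encode( $jobInfo ) );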
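For comparison, the same idea (create the file with owner-only permissions before any credentials are written to it) can be sketched in Python. This is an illustration under assumptions, not the IA Upload code: the function name and the example payload are hypothetical, and `os.fchmod` is only available on Unix-like systems.

```python
import json
import os
import stat
import tempfile


def write_private_json(path, data):
    """Write data as JSON to path, readable and writable by the owner only."""
    # Passing mode 0o600 to os.open applies the permissions at creation time,
    # so a newly created file never exists with wider permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Enforce 0600 even if the file already existed with a looser mode.
        os.fchmod(fd, 0o600)
    except BaseException:
        os.close(fd)
        raise
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle)


with tempfile.TemporaryDirectory() as tmp:
    job_file = os.path.join(tmp, "job.json")
    write_private_json(job_file, {"token": "example"})
    mode = stat.S_IMODE(os.stat(job_file).st_mode)
```

Setting the mode when the file is created avoids the brief window in which a file created with default permissions could be read by other users on a shared host.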

Fair enough :)

So I guess the answer to the investigation question is... Yes!

I'll create a follow-up task for actually finishing the merge/roll-out/documentation.

Ha, yes, good point! :)

Thanks.

@Samwilson: What did you find out about storing user OAuth credentials? Is this kosher or not? @bd808 might be a good person to ask about it.

For clarity ... https://lists.wikimedia.org/pipermail/labs-l/2017-February/004883.html