DjVu files include a text layer. Typically, a DjVu file starts out with a text layer consisting of OCR text, which Wikisource uses as the initial version of the transcription. Wikisource contributors then fix the OCR errors and save the corrections to the Wikisource project as wikitext, until the transcription is accurate and complete. A tool is needed to create a new DjVu file whose text layer contains the accurate and complete Wikisource transcription.
Existing tools, still under development, extract the accurate and complete Wikisource transcription, typically exporting it as EPUB. However, they likely discard much of the information needed to recreate a DjVu file, most importantly the (x, y) position of each piece of text. They may also discard the page numbers.
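To illustrate why the positions matter, here is a minimal sketch, assuming the OCR text layer is already available as a list of words with bounding boxes and the corrected transcription as a list of plain tokens. The names `Word`, `transfer_text`, `to_sexpr` and `quote` are hypothetical, and the alignment via Python's `difflib` is only one possible approach. The output imitates the s-expression form that `djvused` prints for a text layer, simplified to a single page of words; real wikitext would first need its markup stripped.

```python
from dataclasses import dataclass, replace
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Word:
    """One word of the text layer with its bounding box in page pixels."""
    x0: int
    y0: int
    x1: int
    y1: int
    text: str


def union_box(words, text):
    """Build one Word covering the combined area of several OCR words."""
    return Word(min(w.x0 for w in words), min(w.y0 for w in words),
                max(w.x1 for w in words), max(w.y1 for w in words), text)


def transfer_text(ocr_words, corrected_tokens):
    """Carry corrected tokens onto the OCR bounding boxes (hypothetical helper)."""
    corrected_tokens = list(corrected_tokens)
    matcher = SequenceMatcher(a=[w.text for w in ocr_words],
                              b=corrected_tokens, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(ocr_words[i1:i2])
        elif tag == "replace" and i2 - i1 == j2 - j1:
            # Same number of words: keep each box, swap in the corrected text.
            out.extend(replace(w, text=t)
                       for w, t in zip(ocr_words[i1:i2], corrected_tokens[j1:j2]))
        elif tag == "replace":
            # Words were split or merged: use one box spanning the whole run.
            out.append(union_box(ocr_words[i1:i2],
                                 " ".join(corrected_tokens[j1:j2])))
        elif tag == "insert" and out:
            # Text with no OCR box of its own: attach it to the previous word.
            extra = " ".join(corrected_tokens[j1:j2])
            out[-1] = replace(out[-1], text=out[-1].text + " " + extra)
        # "delete" runs and inserts before the first word are dropped here.
    return out


def quote(text):
    # Assumption: only backslash and double quote are escaped in this sketch;
    # non-ASCII handling is not covered.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sexpr(words, page_width, page_height):
    """Serialize words as a simplified djvused-style text-layer expression."""
    lines = [f"(page 0 0 {page_width} {page_height}"]
    lines += [f" (word {w.x0} {w.y0} {w.x1} {w.y1} {quote(w.text)})"
              for w in words]
    return "\n".join(lines) + ")"
```

A tool that only has the EPUB export has no such boxes to start from, which is why the export path alone is not enough to rebuild the text layer.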
Tools exist that work with the hOCR data, for instance hOCR.js by Alex brollo (the gadget author who has worked most with the DjVu layers) and djvutext.py.
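As a rough illustration of the kind of data hOCR carries, the sketch below reads word boxes from hOCR markup, where each word is a `span` of class `ocrx_word` whose `title` attribute holds `bbox x0 y0 x1 y1`. It is not how hOCR.js or djvutext.py work; `HocrWords`, `hocr_words` and `flip_y` are hypothetical names, and the parser assumes word spans do not contain nested spans. hOCR measures y from the top of the page while the DjVu text layer measures it from the bottom, so `flip_y` shows the conversion under that assumption.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (bbox, text) pairs from ocrx_word spans (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._buf = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        # Assumption: no nested spans inside a word span.
        if tag == "span" and self._bbox is not None:
            self.words.append((self._bbox, "".join(self._buf).strip()))
            self._bbox = None


def hocr_words(markup):
    """Return a list of ((x0, y0, x1, y1), text) from an hOCR string."""
    parser = HocrWords()
    parser.feed(markup)
    parser.close()
    return parser.words


def flip_y(bbox, page_height):
    """Convert a top-left-origin box to a bottom-left-origin box."""
    x0, y0, x1, y1 = bbox
    return (x0, page_height - y1, x1, page_height - y0)
```

For example, `hocr_words(page_markup)` followed by `flip_y(bbox, page_height)` on each result would give boxes in the orientation the DjVu text layer expects, provided the page height in pixels is known.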
Skills: Good knowledge of the DjVu and EPUB file formats is desirable.
Mentors: John Vandenberg, ?.