This card tracks a proposal that's currently part of the Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey
The Wishlist Survey voting phase lasts until Dec 14th. After the voting has concluded, the top proposals will form the backlog for the Community Tech team to investigate and address.
**Proposal**:
Like many other Wikisource users, I greatly appreciate the Internet Archive's digitization service, and I use it as fully as I can. DjVu files are only one of many uses of the rich file set that can be downloaded; the collection of high-resolution JP2 images and the ABBYY XML are especially interesting.
I'd like MediaWiki to implement a similar digitization environment, but with a wiki approach and a Wikisource-oriented philosophy. It would share the best available applications for pre-OCR processing of book page images (splitting, rotating, cropping, dewarping... in brief, "scantailoring" the images) and save excellent lossless images from that pre-OCR work. Then the best possible OCR would be done, with the ABBYY OCR engine or similar software if any exists, saving both the text and the full-detail OCR XML. The excellent images and best possible OCR text would then be used to produce excellent searchable PDF and DjVu files. Finally, and this step would be really "wiki", the embedded text would be fixed through the usual user revision work done in Wikisource.
This is a bold dream; a less bold idea is to get full access to the best, heavy IA files (jp2.zip and ABBYY XML) and to build tools that use them as thoroughly as possible.
--Alex brollo (talk) 07:08, 11 November 2015 (UTC)
https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Wikisource#To_implement_a_Internet_Archive-like_digitalization_service
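
As a rough illustration of the "less bold idea" in the proposal (tools that make use of the ABBYY XML), the sketch below pulls each OCR line and its bounding box out of an ABBYY-style XML string. It is not part of the proposal. The `SAMPLE` document is a hand-written, heavily simplified stand-in for a real Internet Archive ABBYY file, and the helper names `local_name` and `extract_lines` are hypothetical. The element and attribute names (`page`, `line`, `charParams`, `l`/`t`/`r`/`b`) are assumed from the ABBYY FineReader XML layout and should be checked against an actual file.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of ABBYY FineReader XML.
# Assumption: real files use the same page > ... > line > ... > charParams
# nesting, with one charParams element per recognized character.
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">
  <page width="1000" height="1500">
    <block blockType="Text">
      <text>
        <par>
          <line l="100" t="50" r="300" b="80">
            <formatting>
              <charParams l="100" t="50" r="120" b="80">H</charParams>
              <charParams l="120" t="50" r="140" b="80">i</charParams>
            </formatting>
          </line>
          <line l="100" t="90" r="300" b="120">
            <formatting>
              <charParams l="100" t="90" r="120" b="120">o</charParams>
              <charParams l="120" t="90" r="140" b="120">k</charParams>
            </formatting>
          </line>
        </par>
      </text>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_text):
    """Return a list of (page_number, (l, t, r, b), text) for each OCR line."""
    root = ET.fromstring(xml_text)
    results = []
    page_number = 0
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        page_number += 1
        for line in page.iter():
            if local_name(line.tag) != "line":
                continue
            # Join the per-character elements back into the line's text.
            chars = [
                c.text or ""
                for c in line.iter()
                if local_name(c.tag) == "charParams"
            ]
            box = tuple(int(line.get(key, 0)) for key in ("l", "t", "r", "b"))
            results.append((page_number, box, "".join(chars)))
    return results


if __name__ == "__main__":
    for page_number, box, text in extract_lines(SAMPLE):
        print(page_number, box, text)
```

Keeping the bounding box next to each line's text is what would let a tool map corrections made in Wikisource back to positions on the page image, for example when rebuilding the text layer of a DjVu or PDF file.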