Project title
Bulk OCR on Wikisource
Description of project
Wikisource is an online, wiki-based digital library of free-content textual sources operated by the Wikimedia Foundation. The Bulk OCR feature aims to give volunteers an easy way to OCR multiple pages, or even an entire book, on Wikisource. However, the ability to run bulk OCR on a work should be restricted to certain user groups. This project therefore adds features to the Wikisource extension that let authorized users OCR multiple pages at once and insert the resulting text into the text layer of the corresponding pages of the book on Wikisource.
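The workflow described above (check that the user is authorized, OCR each page of a work, write the text back to the matching page) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the extension's design: the `bulk-ocr` group name, the function names, and the injected `ocr` and `save` callbacks are all hypothetical, and a real implementation would call the Wikimedia OCR tool and the MediaWiki API asynchronously. The `Page:<file>/<number>` title pattern follows the usual Wikisource convention, but the namespace name is localized on some wikis.

```typescript
// Hypothetical allow-list: which groups may run bulk OCR would be decided
// by configuration and community policy, not hard-coded like this.
const ALLOWED_GROUPS: ReadonlySet<string> = new Set(["sysop", "bulk-ocr"]);

// True if the user belongs to at least one allowed group.
function canBulkOcr(
  userGroups: readonly string[],
  allowed: ReadonlySet<string> = ALLOWED_GROUPS
): boolean {
  return userGroups.some((group) => allowed.has(group));
}

// Build the page titles for a range of scans of one work,
// e.g. "Page:Book.pdf/1" ... "Page:Book.pdf/3".
function pageTitles(fileName: string, first: number, last: number): string[] {
  if (!Number.isInteger(first) || !Number.isInteger(last) || first < 1 || last < first) {
    throw new RangeError("Invalid page range");
  }
  const titles: string[] = [];
  for (let n = first; n <= last; n++) {
    titles.push(`Page:${fileName}/${n}`);
  }
  return titles;
}

// Stand-ins for the real OCR request and the real page edit (assumptions).
type OcrFn = (pageTitle: string) => string;
type SaveFn = (pageTitle: string, text: string) => void;

interface BulkOcrResult {
  saved: string[];
  failed: string[];
}

// OCR every page and save the text; one failing page does not stop the batch.
function bulkOcr(
  userGroups: readonly string[],
  titles: readonly string[],
  ocr: OcrFn,
  save: SaveFn
): BulkOcrResult {
  if (!canBulkOcr(userGroups)) {
    throw new Error("User is not authorized to run bulk OCR");
  }
  const result: BulkOcrResult = { saved: [], failed: [] };
  for (const title of titles) {
    try {
      save(title, ocr(title));
      result.saved.push(title);
    } catch (_err) {
      result.failed.push(title);
    }
  }
  return result;
}
```

Passing `ocr` and `save` in as parameters keeps the permission check and the batch loop testable without network access; recording failed pages separately lets a user retry only those pages.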
Expected outcomes
By the end of the project, contributors will have written well-documented code that provides a functional workflow for authorized users to perform bulk OCR on the pages of a particular work on Wikisource. Contributors are also expected to document the new workflow on the relevant existing documentation pages.
Preferred skills
JavaScript, HTML, CSS, and familiarity with object-oriented programming; experience with PHP and MediaWiki is a bonus
Mentor(s)
Parthiv Menon (@theprotonade), Satdeep Gill (@SGill)
Size
350 hours
Difficulty
Medium
Microtasks
Additional information
- T359703
- Wikimedia OCR tool
- Wikimedia OCR source code
- Wikimedia OCR documentation
- Wikisource extension
- Existing user script for bulk OCR