As a product manager, I want to know the options available for adding bulk OCR to Wikimedia OCR, so that we can potentially implement a massive upgrade to Wikimedia OCR that can benefit many Wikisource users.
- Investigate the general work required to add bulk OCR capabilities to Wikimedia OCR, so that a user of the OCR tool can click a single OCR button (potentially on the Index page) to OCR the whole book
- Provide a general proposal for how this can be accomplished from a technical perspective
- Provide a description of the risks and challenges of this work
- Provide a proposal for how these situations can be handled:
  - If part of the book has already been OCR-ed page by page, what options could the product manager choose from as the recommended behavior (such as overwriting all of the previous page-by-page OCR text, or keeping it and mixing it with newly OCR-ed pages)?
  - If pages have been OCR-ed or manually typed, and have also been proofread or validated, what options would be available for the behavior (such as keeping the text or overwriting it)?
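
To make the options in the last two points easier to compare, here is a minimal sketch of how a bulk run might decide which pages to OCR. It is only an illustration: `Policy`, `Page`, `pages_to_ocr`, and the three policy names are hypothetical and are not part of Wikimedia OCR or any existing extension.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    # Hypothetical policy names, used for illustration only.
    OVERWRITE_ALL = "overwrite_all"        # replace the text of every page
    KEEP_EXISTING = "keep_existing"        # only OCR pages that have no text yet
    PROTECT_REVIEWED = "protect_reviewed"  # overwrite unless proofread or validated


@dataclass
class Page:
    number: int
    has_text: bool   # page already has OCR-ed or manually typed text
    reviewed: bool   # page has been proofread or validated


def pages_to_ocr(pages, policy):
    """Return the page numbers that a bulk run would OCR under the given policy."""
    selected = []
    for page in pages:
        if policy is Policy.OVERWRITE_ALL:
            selected.append(page.number)
        elif policy is Policy.KEEP_EXISTING:
            if not page.has_text:
                selected.append(page.number)
        elif policy is Policy.PROTECT_REVIEWED:
            if not page.reviewed:
                selected.append(page.number)
    return selected


# A three-page book: one empty page, one OCR-ed page, one validated page.
book = [
    Page(1, has_text=False, reviewed=False),
    Page(2, has_text=True, reviewed=False),
    Page(3, has_text=True, reviewed=True),
]

for policy in Policy:
    print(policy.value, pages_to_ocr(book, policy))
```

With this sample book, "overwrite all" selects every page, "keep existing" selects only the empty page, and "protect reviewed" selects every page except the validated one. The investigation may well surface other options, such as asking the user per book.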