As a product manager, I want the team to investigate adding Tesseract to Wikimedia OCR, so that we could potentially have a single, more robust tool to improve and maintain (rather than the two available via Preferences).
Resources:
Acceptance Criteria:
- Investigate how we can add Tesseract to Wikimedia OCR (from a technical perspective)
- Investigate the main technical challenges or risks with adding Tesseract to Wikimedia OCR
- Investigate whether we could add a configuration that lets each Wikisource community decide which OCR engine (Tesseract or Cloud Vision API) is its default. For example, on Polish Wikisource, the community would determine which of the two engines is used by default when I click "OCR" (for Wikimedia OCR)
- Share any additional ideas, concerns, or critical points to raise if we would proceed with such work
- Look into phetools to check how it interacts with Tesseract
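
To make the per-community default engine criterion more concrete, here is a minimal sketch of how such a lookup might work. It is only an illustration, not a description of how Wikimedia OCR is or should be implemented: the engine identifiers, the `COMMUNITY_DEFAULTS` mapping, the global fallback, and the `default_engine` function are all hypothetical names introduced for this example.

```python
from typing import Dict, Optional

# Hypothetical engine identifiers; the real tool may name them differently.
SUPPORTED_ENGINES = ("tesseract", "google")

# Assumed fallback for communities that have not chosen a default.
GLOBAL_DEFAULT_ENGINE = "google"

# Hypothetical per-community choices, keyed by Wikisource language code.
COMMUNITY_DEFAULTS: Dict[str, str] = {
    "pl": "tesseract",
}


def default_engine(wiki_lang: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the engine to use when a user clicks "OCR" on the given wiki."""
    if overrides is None:
        overrides = COMMUNITY_DEFAULTS
    engine = overrides.get(wiki_lang, GLOBAL_DEFAULT_ENGINE)
    # Ignore unknown values so a misconfigured wiki still gets a working engine.
    if engine not in SUPPORTED_ENGINES:
        return GLOBAL_DEFAULT_ENGINE
    return engine
```

With this sketch, `default_engine("pl")` would pick Tesseract, while a wiki with no entry would fall back to the assumed global default. Where the mapping would actually live (on-wiki configuration, tool settings, or elsewhere) is one of the open questions for the investigation.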