
Wikisource OCR: Investigate adding Tesseract to Wikimedia OCR [16H]
Closed, Declined (Public)

Description

As a product manager, I want the team to investigate the possibility of adding Tesseract to Wikimedia OCR, so that we could potentially have one more robust tool to improve and maintain, rather than two separate tools available via Preferences.

Resources:

Acceptance Criteria:

  • Investigate how we can add Tesseract to Wikimedia OCR (from a technical perspective)
  • Investigate the main technical challenges or risks with adding Tesseract to Wikimedia OCR
  • Investigate whether we could create a configuration that lets each Wikisource community decide which OCR engine (Tesseract or Cloud Vision API) is its default. For example, the Polish Wikisource community would decide which of the two engines runs by default when a user clicks "OCR" in Wikimedia OCR
  • Share any additional ideas, concerns, or critical points to raise if we would proceed with such work
  • Look into phetools to see how it interacts with Tesseract

Event Timeline

ARamirez_WMF renamed this task from Wikisource OCR: Investigate adding Tesseract to Wikimedia OCR to Wikisource OCR: Investigate adding Tesseract to Wikimedia OCR [16H]. Apr 1 2021, 11:31 PM

We have decided to proceed with the work in T279118, so I'm closing this ticket as Declined/No longer necessary.