Wikisource OCR: Investigate adding Tesseract to Wikimedia OCR [16H]
Closed, Declined · Public

Assigned To

None

Authored By

	ifried
	Mar 31 2021, 6:45 PM

Description

As a product manager, I want the team to investigate the possibility of adding Tesseract to Wikimedia OCR, so that we could potentially have one more robust tool to improve & maintain (rather than 2 available via Preferences).

Resources:

Tesseract on GitHub

Acceptance Criteria:

Investigate how we can add Tesseract to Wikimedia OCR (from a technical perspective)
Investigate the main technical challenges or risks with adding Tesseract to Wikimedia OCR
Investigate whether we could create a configuration that lets each Wikisource community decide which OCR engine (Tesseract or Cloud Vision API) is its default. For example, if I were on Polish Wikisource, the community would determine which of the two engines is used by default when I click "OCR" (for Wikimedia OCR)
Share any additional ideas, concerns, or critical points to raise if we would proceed with such work
Look into phetools to check how they interact with Tesseract
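
To make the per-community default idea above more concrete, here is a minimal sketch of how such a lookup might work. Everything in it is an assumption for illustration: the names `SUPPORTED_ENGINES`, `GLOBAL_DEFAULT_ENGINE`, `COMMUNITY_DEFAULTS`, and `default_engine` are hypothetical, the engine identifiers are placeholders, and the Polish entry is only an example value, not a decision any community has made. It is not how Wikimedia OCR is implemented.

```python
# Hypothetical sketch of a per-wiki default OCR engine lookup.
# None of these names come from Wikimedia OCR; they are illustrative only.

# Placeholder identifiers for the two engines discussed in this task.
SUPPORTED_ENGINES = ("tesseract", "google")

# Assumed fallback when a community has not chosen a default.
GLOBAL_DEFAULT_ENGINE = "google"

# Example community choices keyed by Wikisource language code (made-up data).
COMMUNITY_DEFAULTS = {
    "pl": "tesseract",
}


def default_engine(wiki_lang, requested=None):
    """Pick an engine: explicit request first, then community default, then fallback."""
    if requested is not None:
        if requested not in SUPPORTED_ENGINES:
            raise ValueError(f"Unsupported OCR engine: {requested}")
        return requested
    return COMMUNITY_DEFAULTS.get(wiki_lang, GLOBAL_DEFAULT_ENGINE)
```

With this shape, clicking "OCR" without choosing an engine would call `default_engine("pl")`, while a user who explicitly picks an engine would still override the community default.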

Related Objects

Mentioned Here: T279118: Wikisource OCR: add support for tesseract on wikimedia ocr

Event Timeline

ifried created this task. Mar 31 2021, 6:45 PM

Restricted Application added a subscriber: Aklapper. Mar 31 2021, 6:45 PM

ifried updated the task description. Mar 31 2021, 6:46 PM

ifried updated the task description. Apr 1 2021, 8:46 PM

ifried updated the task description. Apr 1 2021, 8:50 PM

dmaza updated the task description. Apr 1 2021, 9:07 PM

dmaza updated the task description. Apr 1 2021, 9:13 PM

ARamirez_WMF renamed this task from "Wikisource OCR: Investigate adding Tesseract to Wikimedia OCR" to "Wikisource OCR: Investigate adding Tesseract to Wikimedia OCR [16H]". Apr 1 2021, 11:31 PM

ldelench_wmf moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board. Apr 1 2021, 11:33 PM

ldelench_wmf moved this task from Up Next (June 3-21) to Needs Discussion on the Community-Tech board.

We have decided to proceed with the work in T279118, so I'm closing this ticket as Declined/No longer necessary.
