Add on-wiki UI for selecting languages
Closed, Resolved (Public)

Description

Both Google Cloud Vision API and Tesseract allow for specifying multiple languages when processing an image's text, to help make the OCR more accurate.

Only Tesseract provides a dynamic means of retrieving which languages are supported. For Google, it's just a static list on its documentation page.

Currently, we just use a Wikisource's content language as the OCR language, but this is not optimal for pages with multiple languages, nor for Multilingual Wikisource.

The language codes for the two engines differ, so we'll have to map them to some sort of common system.
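As a rough illustration of the mapping idea, here is a minimal sketch. It assumes a shared BCP 47-style code as the common key; the table `COMMON_TO_ENGINE` and both function names are hypothetical, and the entries are an illustrative subset rather than either engine's real list. The only engine behaviour relied on is that Tesseract's `-l` option joins multiple languages with `+`.

```python
# Hypothetical mapping from a shared (BCP 47-style) code to each engine's own code.
# Illustrative subset only; a real table would be built from the engines' language lists.
COMMON_TO_ENGINE = {
    "en": {"tesseract": "eng", "google": "en"},
    "fr": {"tesseract": "fra", "google": "fr"},
    "de": {"tesseract": "deu", "google": "de"},
    # Middle English: assumed here, for illustration, to have no Google equivalent.
    "enm": {"tesseract": "enm"},
}


def to_engine_codes(common_codes, engine):
    """Split common codes into (engine-specific codes, codes the engine lacks)."""
    mapped, unsupported = [], []
    for code in common_codes:
        engine_code = COMMON_TO_ENGINE.get(code, {}).get(engine)
        if engine_code is None:
            unsupported.append(code)
        else:
            mapped.append(engine_code)
    return mapped, unsupported


def tesseract_lang_argument(common_codes):
    """Build the value for Tesseract's -l option, which joins languages with '+'."""
    mapped, _ = to_engine_codes(common_codes, "tesseract")
    return "+".join(mapped)
```

Returning the unsupported codes separately lets a caller warn the user instead of silently dropping a selected language when switching engines.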

Event Timeline

Although this work is done, and multiple language selection is available in the advanced options of the tool, I wonder if this should stay open for implementing an on-wiki UI for multiple language selection (I can't find another ticket for that).

We did have some discussion about how to do that UI, and I think it was something along the lines of a TagMultiselectWidget with each selection being made via a UniversalLanguageSelector popup (which is how we do multiple language selection in the SVG Translate tool, for example). The main question is: where should the widget go? It might look weird in the 'Transcribe text' dropdown menu.

Samwilson renamed this task from Add support for multiple languages to Add on-wiki UI for selecting multiple languages. Nov 17 2021, 6:05 AM

Remember that the languages are not necessarily the same list as "real" languages: they're just different models provided by the relevant tool, and there can be multiple models for one language (e.g. a special model for old-style printing).

Importantly, the UI needs the list of available languages.

Good points! We can get the lists of languages from https://ocr.wmcloud.org/api/available_langs?engine=tesseract so maybe rather than ULS we just have more-or-less the same language chooser UI that is currently on the tool's form, i.e. just a MenuTagMultiselectWidget, with custom data.
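A minimal sketch of how that list could be turned into options for such a widget. The response shape is an assumption (a JSON object with an `available_langs` list); the function names are hypothetical. The `data`/`label` pairs mirror the option format that OOUI menu widgets take.

```python
import json
from urllib.request import urlopen

AVAILABLE_LANGS_URL = "https://ocr.wmcloud.org/api/available_langs?engine=tesseract"


def parse_available_langs(payload):
    """Extract a sorted list of language (model) codes from a decoded response.

    Assumption: the response is a JSON object with an "available_langs" list.
    """
    langs = payload.get("available_langs", [])
    return sorted(str(code) for code in langs)


def to_widget_options(codes, labels=None):
    """Build {data, label} pairs; fall back to the raw code when no label is known."""
    labels = labels or {}
    return [{"data": code, "label": labels.get(code, code)} for code in codes]


def fetch_available_langs(url=AVAILABLE_LANGS_URL, timeout=10):
    """Fetch and parse the list over HTTP (not called at import time)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_available_langs(json.load(response))
```

Falling back to the raw code as the label keeps models that do not correspond to a "real" language selectable, which matches the point above about engine-specific models.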

We could turn the dropdown into a gear icon, and open a dialog with all the options. It'd have to contain the engine choice, and languages, and tesseract PSM, in order to make the 'advanced options' link redundant (once ODS is doing the cropping, that is).

I think a dialog makes much more sense, because there are a lot more options you might consider to add, for example OCR blacklist chars (T287080) as well as a pre-processing threshold step (T287125) and probably a small handful of other usefully-twiddlable controls. Stuffing it all into a menu is going to get messy.
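To make the dialog idea concrete, here is a sketch of the state such a dialog might collect and validate before submitting. The class, field names, and engine identifiers are hypothetical; the one external fact relied on is that Tesseract's page segmentation modes are numbered 0 to 13.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Engine identifiers are assumptions for this sketch, based on the engines named in this task.
KNOWN_ENGINES = ("tesseract", "google", "transkribus")
TESSERACT_PSM_RANGE = range(0, 14)  # Tesseract page segmentation modes 0 to 13


@dataclass
class OcrOptions:
    """Hypothetical container for the values an options dialog would gather."""
    engine: str = "tesseract"
    langs: List[str] = field(default_factory=list)
    psm: Optional[int] = None  # only meaningful for Tesseract

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means the options are usable."""
        errors = []
        if self.engine not in KNOWN_ENGINES:
            errors.append(f"unknown engine: {self.engine}")
        if self.psm is not None:
            if self.engine != "tesseract":
                errors.append("psm only applies to the tesseract engine")
            elif self.psm not in TESSERACT_PSM_RANGE:
                errors.append(f"psm out of range: {self.psm}")
        return errors
```

Keeping the options in one structure means further controls (such as those from T287080 or T287125) could be added as fields with their own checks, rather than as extra menu items.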

Samwilson renamed this task from Add on-wiki UI for selecting multiple languages to Add on-wiki UI for selecting languages. Jan 21 2022, 12:26 AM
Samwilson added a subscriber: KLawal-WMF.

How are you going with this @Soda? I think there might be some overlap with T331961, so you and @KLawal-WMF might need to compare notes.

Yeah, sure. @KLawal-WMF, how far have you gotten?

Transkribus has been added as an option in the OCR menu. I'm currently resolving comments by @Samwilson.

Hi @Soda, how far have you gotten? I am working on adding more options for the engines.

Change 971235 had a related patch set uploaded (by Kolakachi; author: Kolakachi):

[mediawiki/extensions/Wikisource@master] Add more options for ocr engines

https://gerrit.wikimedia.org/r/971235

Change 989530 had a related patch set uploaded (by Kolakachi; author: L10n-bot):

[mediawiki/extensions/Wikisource@master] Add on-wiki UI for selecting languages

https://gerrit.wikimedia.org/r/989530

Change 989530 abandoned by Kolakachi:

[mediawiki/extensions/Wikisource@master] Add on-wiki UI for selecting languages

Reason:

https://gerrit.wikimedia.org/r/989530

Change 971235 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] Add on-wiki UI for selecting languages

https://gerrit.wikimedia.org/r/971235

KLawal-WMF claimed this task.