Wikisource OCR: Selection of engine for OCR button experience
Open, High · Public · 5 Estimated Story Points

Description

Acceptance Criteria:

  • The default OCR is Tesseract for all users.
  • If the user wants to change their default selection, the following behavior will occur:
    • The user can click "extract" and the OCR rendering will occur with their default OCR engine
    • The user can change the OCR engine by clicking on the down arrow icon
  • In the dropdown, display a final option entitled "Advanced options," which:
    • Links to the wmcloud OCR form (which would be useful for advanced users)
    • Ideally, pre-populate the link to the Commons image in the form page
  • Question from @SGill: Can wikis have a default engine that they have specified or have an engine that is recommended (which is best for the language, the wiki's needs)?

Idea: Maybe we give option to communities to select default, but all users can choose to switch to another one in the proofread page if they want (or in bulk, if/when that is available).
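
The precedence suggested above (a user's own choice first, then a community default, then Tesseract) could be sketched as follows. This is a minimal illustration, not the extension's actual code; the `OcrEngine` type, the `resolveEngine` function, and its parameter names are all hypothetical.

```typescript
// Hypothetical names throughout; this only illustrates the fallback order.
type OcrEngine = "tesseract" | "google";

// The task states that Tesseract is the default for all users.
const FALLBACK_ENGINE: OcrEngine = "tesseract";

function resolveEngine(
    userChoice?: OcrEngine | null,
    wikiDefault?: OcrEngine | null
): OcrEngine {
    // A user's explicit choice wins over the community default,
    // which in turn wins over the global fallback.
    return userChoice ?? wikiDefault ?? FALLBACK_ENGINE;
}
```

With this ordering, a community default never overrides an individual user's selection, which matches the idea that all users can still switch engines.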

Visual Reference

Documentation of technical considerations:

  • Hidden preference is stored for multiple sessions, so this is the preferred way to store these preferences for a user.
  • For unauthenticated users, default to Tesseract.
  • Are we providing any UI component to let users know which engine is the default? Yes, the hyphen default!
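
A minimal sketch of the storage behavior described in these considerations, assuming a simple key-value preference store. The `PreferenceStore` interface, the `wikisource-ocr-engine` key, and the function names are hypothetical; in MediaWiki the store would presumably be backed by the user options mechanism, which this sketch does not attempt to reproduce.

```typescript
type Engine = "tesseract" | "google";

// Hypothetical abstraction over wherever the hidden preference lives.
interface PreferenceStore {
    get(key: string): string | null;
    set(key: string, value: string): void;
}

const ENGINE_PREF_KEY = "wikisource-ocr-engine"; // hypothetical key name
const DEFAULT_ENGINE: Engine = "tesseract";

function isEngine(value: string | null): value is Engine {
    return value === "tesseract" || value === "google";
}

function loadEngine(store: PreferenceStore, isLoggedIn: boolean): Engine {
    // Unauthenticated users always get the default engine.
    if (!isLoggedIn) {
        return DEFAULT_ENGINE;
    }
    const stored = store.get(ENGINE_PREF_KEY);
    // Unknown or missing values also fall back to the default.
    return isEngine(stored) ? stored : DEFAULT_ENGINE;
}

function saveEngine(store: PreferenceStore, isLoggedIn: boolean, engine: Engine): boolean {
    // Nothing is persisted for unauthenticated users.
    if (!isLoggedIn) {
        return false;
    }
    store.set(ENGINE_PREF_KEY, engine);
    return true;
}

// In-memory store, used here only to exercise the functions above.
class MemoryStore implements PreferenceStore {
    private data: { [key: string]: string } = {};
    get(key: string): string | null {
        return key in this.data ? this.data[key] : null;
    }
    set(key: string, value: string): void {
        this.data[key] = value;
    }
}
```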

Event Timeline

Figma explorations here

Open questions for engineers:

  • How much more expensive would it be to keep track of whether or not someone has set up their default?
  • How much more expensive would it be to add helpful UX copy explaining why you would choose one OCR engine over the other?

PLACEHOLDER COPY FOR ILLUSTRATIVE PURPOSES


Note that in this view, Advanced Options leads to the wmcloud OCR form.
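
A sketch of how the Advanced Options link might be pre-populated with the page image, as the acceptance criteria suggest. The base URL and the `image`, `engine`, and `langs[]` parameter names are assumptions for illustration and should be checked against the actual OCR tool; `buildAdvancedOptionsUrl` is a hypothetical helper.

```typescript
// Assumed base URL of the wmcloud OCR form; verify before relying on it.
const OCR_TOOL_BASE = "https://ocr.wmcloud.org/";

function buildAdvancedOptionsUrl(
    imageUrl: string,
    engineName: string,
    langCode?: string
): string {
    // Parameter names here are assumptions, not confirmed by the task.
    const pairs: [string, string][] = [
        ["image", imageUrl],
        ["engine", engineName],
    ];
    if (langCode) {
        pairs.push(["langs[]", langCode]);
    }
    // Percent-encode keys and values so the image URL survives intact.
    const query = pairs
        .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
        .join("&");
    return `${OCR_TOOL_BASE}?${query}`;
}
```

Passing the currently selected engine along with the image would let the form open in the same state the user left the button in.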

ifried added a subscriber: nayoub.
ifried renamed this task from Wikisource OCR: Selection of engine for OCR button experience [placeholder] to Wikisource OCR: Selection of engine for OCR button experience. May 5 2021, 8:34 PM
ifried updated the task description.
ifried added a subscriber: SGill.

I think @SGill's idea is great! If the community can inform us on which engine is more suited for their wikis (considering language/script), it'd be ideal to have that one be the default for each of them. (The user could always opt-out by selecting another engine by clicking the down arrow part of the button)

ldelench_wmf set the point value for this task to 5. May 6 2021, 5:34 PM
ldelench_wmf moved this task from To Be Estimated/Discussed to Estimated on the Community-Tech board.

Change 690212 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Wikisource@master] Add OCR settings dialog for choosing engine

https://gerrit.wikimedia.org/r/690212

Samwilson added a subscriber: Samwilson.

I've had a first crack at this, but with a dialog window rather than a customized dropdown. This can of course be changed, but it was the simplest thing to do to get started. Also, it sounds like we will at some point want to add other settings, such as language etc.

The easiest thing to do for the dropdown menu is to use a ButtonMenuSelectWidget, which can contain MenuOptionWidgets. These option widgets are simple and can't easily contain radio buttons or similar controls, so we'd have to do something more custom.

Here's what the above patch looks like:

The Figma examples differ with the presence of the radio buttons and layout of their labels (with descriptions or not). Which is the correct design?

This is what it currently looks like:

@nayoub Is the text for 'fastest option' and 'better for multi-column text' final and correct?

@Samwilson sorry I missed this. Here's the updated version of the dropdown:

  • description label: "Select your default transcriber tool"
  • radio buttons with 'Tesseract OCR' and 'Google OCR', including "Recommended by your community" below the one that is the most appropriate for each Wikisource.
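
The dropdown contents described above could be modeled as plain data before being handed to whatever widget renders them. This is a hypothetical sketch: the `EngineOption` shape and the `buildEngineOptions` function are not from the patch, and only the label strings are taken from the design.

```typescript
type EngineId = "tesseract" | "google";

// Hypothetical view model for one radio option in the dropdown.
interface EngineOption {
    id: EngineId;
    label: string;
    note: string | null; // helper text shown below the label, if any
    selected: boolean;
}

const MENU_HEADING = "Select your default transcriber tool";
const RECOMMENDED_NOTE = "Recommended by your community";

function buildEngineOptions(
    selected: EngineId,
    recommended: EngineId | null
): EngineOption[] {
    const engines: { id: EngineId; label: string }[] = [
        { id: "tesseract", label: "Tesseract OCR" },
        { id: "google", label: "Google OCR" },
    ];
    return engines.map(({ id, label }) => ({
        id,
        label,
        // Only the community-recommended engine carries the note.
        note: id === recommended ? RECOMMENDED_NOTE : null,
        selected: id === selected,
    }));
}
```

Passing `null` for `recommended` covers wikis that have not recorded a preferred engine.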

Visual Reference:

Thanks @nayoub that's great.

We don't yet have any way for a community to record which engine they prefer, so I think for the first patch it'll have to just be hard-coded (i.e. the status quo, Tesseract) and then we can add the additional functionality in a 2nd patch.

Change 690212 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] Add OCR settings menu for choosing engine

https://gerrit.wikimedia.org/r/690212

Samwilson added a subscriber: MusikAnimal.

The bulk of this is done and ready for QA. @MusikAnimal raised two points about layout being wrong, one for RTL (which I'll make a follow-up for) and one for font-size in the button (which I'm not able to replicate – could it have been a cache issue?).

Testing Performed:

  • Feature shows based on visual reference; Tesseract is selected and can be changed.
  • Feature is browser compatible - Firefox, Chrome, Safari, IE
  • OCR button shows properly (right-to-left display) when the user interface language is Arabic/Hebrew
  • User-selected text extractor tool preference persists as the default until changed by the user
  • Text is extracted from image when user clicks the Extract button

Test link: https://en.wikisource.beta.wmflabs.org/wiki/Index:Wind_in_the_Willows_(1913).djvu

Wikisource Version: – (401fd61) 12:49, 22 June 2021 GPL-2.0-or-later

Observations:
Text is extracted from the image in English only, even when the user's preferred language is different (e.g. Dutch, Arabic). Is that expected?

Yep, it uses the wiki's content language, so for Beta it is always English. Switching languages (or a page with multiple languages) is handled via the Advanced Options.