
Wikisource OCR: Selection of engine for OCR button experience
Closed, Resolved | Public | 5 Estimated Story Points

Description

Acceptance Criteria:

  • The default OCR is Tesseract for all users.
  • If the user wants to change their default selection, the following behavior will occur:
    • The user can click "extract" and the OCR rendering will occur with their default OCR engine
    • The user can change the OCR engine by clicking on the down arrow icon
  • In the dropdown, display a final option entitled "Advanced options," which:
    • Links to the wmcloud OCR form (which would be useful for advanced users)
    • Ideally, pre-populate the link to the Commons image in the form page
  • Question from @SGill: Can wikis have a default engine that they have specified or have an engine that is recommended (which is best for the language, the wiki's needs)?

Idea: Maybe we give option to communities to select default, but all users can choose to switch to another one in the proofread page if they want (or in bulk, if/when that is available).

Visual Reference

Final.png (960×1 px, 63 KB)

Documentation of technical considerations:

  • Hidden preference is stored for multiple sessions, so this is the preferred way to store these preferences for a user.
  • For unauthenticated users, default to Tesseract.
  • Are we providing any UI component to let users know which engine is the default? Yes, the hyphen default!
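
The fallback described in the points above can be sketched roughly as follows. This is only an illustration of the logic, not the extension's code: the preference key `wikisource-ocr-engine`, the engine identifiers `tesseract` and `google`, and the `resolve_engine` helper are all hypothetical names, and a plain mapping stands in for the stored user options.

```python
from typing import Mapping, Optional

DEFAULT_ENGINE = "tesseract"
# Hypothetical engine identifiers; the real ones may differ.
KNOWN_ENGINES = ("tesseract", "google")
# Hypothetical preference key; the real hidden preference name may differ.
PREF_KEY = "wikisource-ocr-engine"


def resolve_engine(user_options: Optional[Mapping[str, str]]) -> str:
    """Return the OCR engine to use for a request.

    user_options is None for unauthenticated users; for logged-in users it
    holds their stored (hidden) preferences.
    """
    if user_options is None:
        # Unauthenticated users always get the default engine.
        return DEFAULT_ENGINE
    stored = user_options.get(PREF_KEY)
    # Fall back to the default for missing or unrecognised values.
    return stored if stored in KNOWN_ENGINES else DEFAULT_ENGINE
```

Falling back on unrecognised values, rather than raising, is a design choice in this sketch: it keeps the Extract button usable even if a stored value refers to an engine that is no longer offered.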

Event Timeline


Figma explorations here

Open questions for engineers:

  • How much more expensive would it be to keep track of whether or not someone has set up their default?
  • How much more expensive would it be to add helpful ux copy around why you would choose one OCR over the other?

PLACEHOLDER COPY FOR ILLUSTRATIVE PURPOSES

image.png (337×522 px, 19 KB)

image.png (286×270 px, 17 KB)

Note that in this view, Advanced Options leads to the wmcloud OCR form.
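
For the pre-populated Advanced Options link mentioned in the acceptance criteria, a minimal sketch could look like the one below. The base URL and the `image` and `engine` query parameter names are assumptions made for illustration; they would need to be checked against the actual OCR form before use.

```python
from urllib.parse import urlencode

# Assumed base URL of the OCR form, for illustration only.
OCR_FORM_URL = "https://ocr.wmcloud.org/"


def advanced_options_url(image_url: str, engine: str = "tesseract") -> str:
    """Build a link to the OCR form with the image and engine pre-filled.

    The parameter names "image" and "engine" are hypothetical.
    """
    # urlencode percent-escapes the image URL so it survives as one parameter.
    query = urlencode({"image": image_url, "engine": engine})
    return f"{OCR_FORM_URL}?{query}"
```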

ifried added a subscriber: nayoub.
ifried renamed this task from Wikisource OCR: Selection of engine for OCR button experience [placeholder] to Wikisource OCR: Selection of engine for OCR button experience. May 5 2021, 8:34 PM
ifried updated the task description.
ifried updated the task description.
ifried added a subscriber: SGill.

I think @SGill's idea is great! If the community can tell us which engine is better suited to their wiki (considering language/script), it would be ideal to make that one the default for each of them. (The user could always opt out by clicking the down arrow part of the button and selecting another engine.)

Change 690212 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Wikisource@master] Add OCR settings dialog for choosing engine

https://gerrit.wikimedia.org/r/690212

Samwilson subscribed.

I've had a first crack at this, but with a dialog window rather than a customized dropdown. This can of course be changed, but it was the simplest thing to do to get started. Also, it sounds like we will at some point want to add other settings, such as language etc.

The easiest thing to do for the dropdown menu is to use a ButtonMenuSelectWidget, which can contain MenuOptionWidgets. These option widgets are not complicated things, and can't easily contain radio buttons or anything similar, so we'd have to do something more custom.

Here's what the above patch looks like:

Creating-Page-Testdoc-pdf-2-Dev-wiki1.png (239×782 px, 36 KB)

The Figma examples differ in the presence of the radio buttons and the layout of their labels (with descriptions or not). Which is the correct design?

This is what it currently looks like:

ocr-popup-menu.png (222×365 px, 33 KB)

@nayoub Is the text for 'fastest option' and 'better for multi-column text' final and correct?

@Samwilson sorry I missed this. Here's the updated version of the dropdown:

  • description label: "Select your default transcriber tool"
  • radio buttons with 'Tesseract OCR' and 'Google OCR', including "Recommended by your community" below the one that is the most appropriate for each Wikisource.
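
As a rough sketch of the data behind that dropdown (not the actual widget code), the options could be modelled like this. The `EngineOption` class, the `build_engine_options` helper, and the engine keys are hypothetical; only the labels and the "Recommended by your community" text come from the design above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EngineOption:
    key: str                    # hypothetical internal identifier
    label: str                  # text shown next to the radio button
    selected: bool              # whether this is the user's current default
    note: Optional[str] = None  # extra line shown below the label, if any


def build_engine_options(selected: str, recommended: Optional[str]) -> List[EngineOption]:
    """Return one option per engine, marking the selected and recommended ones."""
    engines = [("tesseract", "Tesseract OCR"), ("google", "Google OCR")]
    return [
        EngineOption(
            key=key,
            label=label,
            selected=(key == selected),
            note="Recommended by your community" if key == recommended else None,
        )
        for key, label in engines
    ]
```

Passing `recommended=None` covers wikis whose community has not recorded a preferred engine, in which case no option carries the note.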

Visual Reference:

Final.png (960×1 px, 63 KB)

Thanks @nayoub that's great.

We don't yet have any way for a community to record which engine they prefer, so I think for the first patch it'll have to just be hard coded (i.e. the status quo, Tesseract) and then we can add the additional functionality in a 2nd patch.

Change 690212 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] Add OCR settings menu for choosing engine

https://gerrit.wikimedia.org/r/690212

Samwilson added a subscriber: MusikAnimal.

The bulk of this is done and ready for QA. @MusikAnimal raised two points about layout being wrong, one for RTL (which I'll make a follow-up for) and one for font-size in the button (which I'm not able to replicate – could it have been a cache issue?).

Testing Performed:

  • Feature shows based on visual reference; Tesseract is selected and can be changed.
  • Feature is browser compatible - Firefox, Chrome, Safari, IE
  • OCR button shows properly (right-to-left display) when the user interface language is Arabic/Hebrew
  • User-selected text extractor tool preference persists as default until changed by the user
  • Text is extracted from image when user clicks the Extract button

Test link: https://en.wikisource.beta.wmflabs.org/wiki/Index:Wind_in_the_Willows_(1913).djvu

Wikisource Version: – (401fd61) 12:49, 22 June 2021 GPL-2.0-or-later

Observations:
Text is extracted from the image in English only, even when the user's preferred language is different (e.g. Dutch, Arabic). Is that expected?

Yep, it uses the wiki's content language, so for Beta it is always English. Switching languages (or a page with multiple languages) is handled via the Advanced Options.

@Samwilson, I saw the following issues while testing. I will most likely need to open another ticket for them, but I would like the team's advice first. We can meet to review them if needed. Thank you.

Screen Shot 2021-06-22 at 4.39.56 PM.png (1×2 px, 1 MB)

Screen Shot 2021-06-22 at 3.51.26 PM.png (1×2 px, 1 MB)

Screen Shot 2021-06-23 at 9.19.46 PM.png (1×2 px, 1 MB)

Thanks @imaigwilo, that's terrific (I mean, in finding the bugs, not the bugs themselves!)

I think that of these three, only the popup breaking on scroll is a bug: text alignment is correct because lang="en" is set on the text box (it's the page content language); and we only support Grade C for IE10 (which means no JavaScript features).

It looks like it's an issue with the hideWhenOutOfView config for the popup. When that is set to false, it looks like it works correctly. Perhaps it's okay if we don't close it when it's out of view? That would just be a workaround of course; there seems to be a bug, because it works fine in LTR languages.

Natalia, which skins do you want Wikimedia OCR to support, so we can test them accordingly? Thank you. @NRodriguez @dom_walden

Thanks Sam for taking a look at this promptly. Per your response, you'll fix the issue with the popup breaking on scroll. I'm assuming that will be fixed on this ticket.
For the IE10 missing the extract button, I mislabeled it. I was actually testing on the IE11 browser. Does that make a difference? Does it need fixing? Thank you!

Screen Shot 2021-06-23  IE11.png (1×2 px, 1 MB)

Another thing I noticed is that when the page is right-aligned (user language changed to Arabic), the Transcribe text button overlaps the image text on all browsers. I didn't notice this when the page was left-aligned. Let me know if that will be fixed on another ticket. I can create one. Thank you.

Screen Shot 2021-06-25 at  FireFox browser.png (581×1 px, 513 KB)

For the IE10 missing the extract button, I mislabeled it. I was actually testing on the IE11 browser. Does that make a difference?

Grade A support for IE11 was dropped in March 2021, cf. this notice, for JS-based features specifically. It is also on a rapid trek towards fully Grade C based on usage numbers, and when ES6+Vue3 land it'll necessarily get completely deprecated.

I wouldn't recommend expending even a minimum of effort on fixing it for IE11, is what I'm saying. :)

Thank you. We will not bother with these IE11 or IE10 issues.

Skins:

  • Vector is the default skin, and the only one to test for now

This is not a hard rule: we expect people to report it if the feature breaks in another skin, and we will fix it there if it does.

@Samwilson and @imaigwilo, for now let's go with Sam's suggestion:

It looks like it's an issue with the hideWhenOutOfView config for the popup. When that is set to false, it looks like it works correctly. Perhaps it's okay if we don't close it when it's out of view? That would just be a workaround of course; there seems to be a bug, because it works fine in LTR languages.

This solution is sub-optimal, but it works, and there is already a lot of work from scope creep in the release aftermath.

Change 702479 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Wikisource@master] Don't hide the onboarding popup when out of view

https://gerrit.wikimedia.org/r/702479

Change 702479 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] Don't hide the OCR config popup when out of view

https://gerrit.wikimedia.org/r/702479

FTR, I've created T285912 for the OOUI bug.

@Daimona thank you!

Change 862353 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/Wikisource@master] Revert "Don't hide the OCR config popup when out of view"

https://gerrit.wikimedia.org/r/862353

Change 862353 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] Revert "Don't hide the OCR config popup when out of view"

https://gerrit.wikimedia.org/r/862353