
Enable new OCR UI on Beta Wikisource
Closed, Resolved · Public · 1 Estimated Story Points

Description

The new button-on-image UI can now be enabled on Beta Wikisource, for easier testing.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 685643 had a related patch set uploaded (by Samwilson; author: Samwilson):

[operations/mediawiki-config@master] Enable Wikimedia OCR on Beta Wikisource

https://gerrit.wikimedia.org/r/685643

I've scheduled this for the European mid-day backport window, ~3 hours from now.

Change 685643 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable Wikimedia OCR on Beta Wikisource

https://gerrit.wikimedia.org/r/685643

It's deployed (demo), but there's a Content Security Policy (CSP) error:

Content Security Policy: The page’s settings blocked the loading of a resource at https://ocr-test.wmcloud.org/api.php?engine=tesseract&langs[]=&image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Fthumb%2Fb%2Fbd%2FWar_and_Peace.djvu%2Fpage13-1024px-War_and_Peace.djvu.jpg&uselang=en (“default-src”).

I'm guessing we have to add $out->getCSP()->addDefaultSrc( $this->toolUrl ); to \MediaWiki\Extension\Wikisource\HookHandler\EditPageShowEditFormInitialHandler.
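To illustrate why the request is blocked and what adding the tool URL to default-src changes, here is a minimal TypeScript sketch. It is not the extension's code: `isAllowedByDefaultSrc` is a hypothetical helper, the `before` policy is an assumed placeholder (the real Beta Wikisource policy is not shown in this task), and the matching is simplified to exact origin comparison, whereas real CSP matching also handles wildcards, schemes and keywords.

```typescript
// Simplified model of CSP source matching: exact origin comparison only.
// Real CSP matching also handles wildcards, schemes, paths, and keywords.
function isAllowedByDefaultSrc(resourceUrl: string, defaultSrc: string[]): boolean {
  const origin = new URL(resourceUrl).origin;
  return defaultSrc.some((source) => source === origin);
}

// The request that was blocked on Beta Wikisource (copied from the error above).
const blockedUrl: string =
  "https://ocr-test.wmcloud.org/api.php?engine=tesseract&langs[]=" +
  "&image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Fthumb%2Fb%2Fbd%2FWar_and_Peace.djvu%2Fpage13-1024px-War_and_Peace.djvu.jpg" +
  "&uselang=en";

// Hypothetical policy before the fix (an assumption for illustration).
const before: string[] = ["https://upload.wikimedia.org"];

// Same policy with the OCR tool's origin added, which is what the
// proposed addDefaultSrc() call is meant to achieve.
const after: string[] = [...before, new URL(blockedUrl).origin];

console.log(isAllowedByDefaultSrc(blockedUrl, before)); // false
console.log(isAllowedByDefaultSrc(blockedUrl, after)); // true
```

The point of the sketch is only that the blocked resource's origin, https://ocr-test.wmcloud.org, has to appear among the allowed sources before the browser will load it.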

Change 688590 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Wikisource@master] Add OCR tool URL as a CSP default-src

https://gerrit.wikimedia.org/r/688590

Change 688590 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] Add OCR tool URL as a CSP default-src

https://gerrit.wikimedia.org/r/688590

dom_walden subscribed.

I have experimented with a number of books on beta in different languages. For example:

(It does not seem to matter whether or not you are logged in).

[Screenshot: wikimedia-ocr_ui.png]

Note that on smaller screens you might find the "Extract text" button obscures some of the text on an image. I don't think this matters because the images are "slippy" and you can move them around freely (and zoom in and out).

Clicking the OCR button overwrites any text in the edit box, so if you have made edits but not published them, you could lose them without warning. I don't know whether we can check for unsaved changes and warn people when they click the OCR button.
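One possible shape for such a check, as a rough TypeScript sketch rather than anything from the Wikisource extension: remember the text the edit box held when the form opened, compare it with the current text, and only ask for confirmation when they differ. `EditorState`, `hasUnsavedChanges`, `applyOcrText` and the `confirmOverwrite` callback are all hypothetical names introduced for illustration.

```typescript
// Hypothetical editor state; these names are illustrative only.
interface EditorState {
  loadedText: string; // text in the edit box when the form was opened
  currentText: string; // text in the edit box right now
}

// True when the user has typed something that has not been published.
function hasUnsavedChanges(state: EditorState): boolean {
  return state.currentText !== state.loadedText;
}

// Replace the edit-box text with the OCR result, asking first if edits
// would be lost. `confirmOverwrite` stands in for whatever dialog the
// UI would actually show.
function applyOcrText(
  state: EditorState,
  ocrText: string,
  confirmOverwrite: () => boolean
): EditorState {
  if (hasUnsavedChanges(state) && !confirmOverwrite()) {
    return state; // user declined, so keep their edits
  }
  return { ...state, currentText: ocrText };
}
```

With this shape, a page with no local edits is overwritten silently, as it is now, and the confirmation only appears when there is something to lose.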

I have not been able to test error handling. I will try to think about how I can do this, and come back to it.

I mostly tested on Firefox 78, but also briefly on IE11, Safari 14 and Safari 12.

Test Environment:

One way to trigger an error might be to send an overly large image to Google, e.g. https://en.wikisource.beta.wmflabs.org/w/index.php?title=Page:Nippon_Times_1945-10-08_p1.jpg&action=edit&redlink=1 (I've changed the 'width' parameter in the Index page to 3000px). That said, I'm not sure what Google's limit is these days; it used to be quite small.
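For testing this outside the edit form, a request URL could be assembled by hand, as in this TypeScript sketch. The parameter names (engine, langs[], image, uselang) are copied from the blocked request in the CSP error earlier in this task; `buildOcrUrl` is a hypothetical helper, "google" as the engine value is an assumption, and the image URL is a placeholder because the actual thumbnail URL for the page above is not given here.

```typescript
// Build an OCR API request URL. Parameter names come from the request
// URL quoted in the CSP error; everything else is an assumption.
function buildOcrUrl(
  apiBase: string,
  engine: string,
  imageUrl: string,
  langs: string[],
  uselang: string
): string {
  const url = new URL(apiBase);
  url.searchParams.set("engine", engine);
  for (const lang of langs) {
    url.searchParams.append("langs[]", lang); // one entry per language
  }
  url.searchParams.set("image", imageUrl); // percent-encoded automatically
  url.searchParams.set("uselang", uselang);
  return url.toString();
}

// Placeholder oversized image, not a real file.
const testUrl: string = buildOcrUrl(
  "https://ocr-test.wmcloud.org/api.php",
  "google", // assumed engine identifier for Google's OCR
  "https://example.org/large-scan-3000px.jpg",
  [],
  "en"
);
console.log(testUrl);
```

Requesting a URL like this with a deliberately large image would show what the tool returns when the engine rejects the input, which is the case the UI's error handling needs to cover.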