
OCR language selection does not work
Closed, Resolved · Public · Bug Report

Description

The new on-wiki UI for selecting OCR languages (added in T279405) is not displaying the languages for any engine. For example, for Transkribus the dropdown is empty:

Screenshot_20240218_122820.png (382×415 px, 61 KB)

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 1004318 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Wikisource@master] [WIP] Fetch OCR languages via the job queue

https://gerrit.wikimedia.org/r/1004318

There are errors in the logs such as:

[16-Feb-2024 07:12:13] WARNING: [pool www] child 9, script '/srv/mediawiki/docroot/wikisource.org/w/load.php' (request: "GET /w/load.php?debug=1&lang=en-gb&modules=ext.wikisource.OCR") executing too slow (6.638146 sec), logging

[17-Feb-2024 16:45:25] WARNING: [pool www] child 1096850, script '/srv/mediawiki/docroot/wikisource.org/w/load.php' (request: "GET /w/load.php?lang=en&modules=ext.wikisource.OCR&skin=vector&sourcemap=1&version=4d6cm") executing too slow (6.333476 sec), logging

I'm guessing the remote API fetch to e.g. https://ocr.wmcloud.org/api/available_langs?engine=google is what's causing the slowdown.

The code in the Wikisource extension doesn't work at all for me in production with eval.php. $wgHTTPProxy is not set (see T298264) and wmcloud IPs are not routable from production servers, so every extension needs its own proxy configuration and has to pass a proxy option to HttpRequestFactory::create().
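
A minimal sketch of the per-extension proxy idea, not the code from the actual patch. The helper name `buildOcrRequestOptions()`, the five-second timeout, and the example proxy URL are all hypothetical. `HttpRequestFactory::create()` takes an options array that can carry `proxy` and `timeout` keys; the MediaWiki call itself appears only as a comment so that the snippet runs as plain PHP:

```php
<?php
/**
 * Build the options array for an outbound HTTP request.
 * Hypothetical helper, not code from the Wikisource extension.
 *
 * @param string|null $proxy Proxy URL from extension config, or null/'' if unset
 * @param int $timeout Request timeout in seconds (assumed value)
 * @return array Options suitable for HttpRequestFactory::create()
 */
function buildOcrRequestOptions( ?string $proxy, int $timeout = 5 ): array {
    $options = [ 'timeout' => $timeout ];
    // Only add the proxy key when one is configured, so setups without
    // a proxy (e.g. local development) keep making direct requests.
    if ( $proxy !== null && $proxy !== '' ) {
        $options['proxy'] = $proxy;
    }
    return $options;
}

// Inside MediaWiki the result would be used roughly like this:
//   $request = MediaWikiServices::getInstance()->getHttpRequestFactory()
//       ->create( $url, buildOcrRequestOptions( $proxy ), __METHOD__ );
//   $status = $request->execute();
```

Leaving the `proxy` key out entirely when nothing is configured means the same code path should work both in production (where a proxy is required) and on a development wiki (where it is not).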

Thank you! I'd totally forgotten about that. We had the same issue with Phonos, and ended up adding $wgPhonosApiProxy = $wgCopyUploadProxy; to reuse the existing config. Sounds like the same thing will work for this.
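
For illustration, the equivalent setting for Wikisource might look like the fragment below. This assumes the Wikisource setting follows the Phonos pattern exactly; the variable name comes from the later config change on this task, but the assigned value is a guess and not copied from that patch:

```php
// Hypothetical production config fragment (assumption, not the merged patch):
// reuse the existing upload proxy for Wikisource's outbound OCR API requests.
$wgWikisourceHttpProxy = $wgCopyUploadProxy;
```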

Change 1005398 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Wikisource@master] OCR: Add HTTP proxy config

https://gerrit.wikimedia.org/r/1005398

Change 1005399 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Wikisource@master] OCR: Use MainObjectStash instead of MainWANObjectCache for languages

https://gerrit.wikimedia.org/r/1005399

Change 1004318 abandoned by Samwilson:

[mediawiki/extensions/Wikisource@master] Fetch OCR languages via the job queue

Reason:

No need for a job after all.

https://gerrit.wikimedia.org/r/1004318

Change 1005434 had a related patch set uploaded (by Samwilson; author: Samwilson):

[operations/mediawiki-config@master] CommonSettings: Set $wgWikisourceHttpProxy

https://gerrit.wikimedia.org/r/1005434

Change 1005435 had a related patch set uploaded (by Samwilson; author: Samwilson):

[operations/mediawiki-config@master] InitializeSettings: Add Wikisource logging channel to prod and labs

https://gerrit.wikimedia.org/r/1005435

The service time for this API from eqiad via url-downloader is around 30-35 ms.

Change 1005399 abandoned by Samwilson:

[mediawiki/extensions/Wikisource@master] OCR: Use MainObjectStash instead of MainWANObjectCache for languages

Reason:

Not needed.

https://gerrit.wikimedia.org/r/1005399

Change 1005398 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] OCR: Add HTTP proxy config

https://gerrit.wikimedia.org/r/1005398

Change 1005700 had a related patch set uploaded (by Tim Starling; author: Samwilson):

[mediawiki/extensions/Wikisource@wmf/1.42.0-wmf.19] OCR: Add HTTP proxy config

https://gerrit.wikimedia.org/r/1005700

Change 1005700 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@wmf/1.42.0-wmf.19] OCR: Add HTTP proxy config

https://gerrit.wikimedia.org/r/1005700

Change 1005434 merged by Tim Starling:

[operations/mediawiki-config@master] CommonSettings: Set $wgWikisourceHttpProxy

https://gerrit.wikimedia.org/r/1005434

Change 1005435 merged by jenkins-bot:

[operations/mediawiki-config@master] InitializeSettings: Add Wikisource logging channel to prod and labs

https://gerrit.wikimedia.org/r/1005435

tstarling claimed this task.