ws_ocr_daemon is not running on German Wikisource
Open, Needs Triage · Public · BUG REPORT

Description

The Wikisource internal OCR functionality has not been working for more than a week.

List of steps to reproduce (step by step, including full links if applicable):

What happens?:
After trying to get OCR for the image, no text is returned and the browser shows an error:

'ws_ocr_daemon is not running. Please try again later'

What should have happened instead?:
The OCR daemon should return the full text of the displayed image.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:
Tried it in Firefox, Chrome, and Edge.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Those two buttons use the phetools OCR system, which is no longer maintained.

An alternative is the Wikimedia OCR system, which is the larger OCR button at the right side of the toolbar:

[Screenshot: Creating-Seite-Kurze-Lebens-Notizen-zu-der-Portrait-Gallerie-merkwürdiger-Luzerner-auf-der-Bürgerbibliothek-in-Luzern-pdf-28-Wikisource.png]

Does the OCR output of that work okay?

Thanks for your response, @Samwilson. The result of the Wikimedia OCR looks fine.

Is there a way or is it even planned to remove the old non-working OCR buttons from the toolbar?

https://de.wikisource.org/wiki/Spezial:Einstellungen#mw-prefsection-gadgets lists an item "OCR Buttons deaktivieren". Per https://de.wikisource.org/wiki/MediaWiki:Gadgets-definition its code is at https://de.wikisource.org/wiki/MediaWiki:Gadget-PR-ocr.js . That one line of code implies that it comes from ProofreadPage maybe? I guess that something (ProofreadPage?) loads https://wikisource.org/w/index.php?title=MediaWiki:OCR.js which loads phetools.
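For anyone who wants to follow that loading chain themselves, here is a minimal sketch that builds `action=raw` URLs for the three pages mentioned above, so their source can be read directly. It assumes de.wikisource.org uses the same `/w/index.php` script path as the wikisource.org link above; the function and list names are made up for illustration.

```python
from urllib.parse import urlencode


def raw_url(host: str, title: str) -> str:
    """Build a URL that returns the unrendered source of a wiki page."""
    # action=raw is the standard MediaWiki way to get a page's raw text.
    query = urlencode({"title": title, "action": "raw"})
    # Assumption: the wiki serves index.php under /w/ (true for the
    # wikisource.org link quoted in this thread).
    return f"https://{host}/w/index.php?{query}"


# The pages named in this comment, in the order they are suspected to load.
PAGES = [
    ("de.wikisource.org", "MediaWiki:Gadgets-definition"),
    ("de.wikisource.org", "MediaWiki:Gadget-PR-ocr.js"),
    ("wikisource.org", "MediaWiki:OCR.js"),
]


def raw_urls() -> list[str]:
    """Return the raw-source URL for every page in PAGES."""
    return [raw_url(host, title) for host, title in PAGES]


if __name__ == "__main__":
    for url in raw_urls():
        print(url)
```

Opening each printed URL in a browser (or fetching it with any HTTP client) shows the script text, which is enough to check whether one page really loads the next.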

Those two buttons use the phetools OCR system, which is no longer maintained.

While it's correct to say the phetools are no longer maintained as such (Phe is no longer active), I have access to the tool and try to give it a little care now and then. In particular, the OCR service had stopped due to infrastructure work affecting Toolforge and I have now restarted it so it should be working again. Feel free to ping me if there's an issue with any of them (I just can't guarantee any particular response time).

That being said, @Mfchris84: while some people have a preference for the phetools-based OCR (the OCR and Fractur OCR buttons in the editor toolbar), for most people the new OCR tool (the "Transcribe Text" button over on the right) will give at least acceptable (and often superior) results. You may want to poll your local community about whether you want to keep the old tools available, or remove them entirely and just use the new OCR tool. The old tools are only supported "ad hoc" and "best effort", while the new tool is actually supported by Community Tech (WMF), and having both available will be confusing for a lot of users.

And to be clear, the buttons for the old tools are provided by local Gadgets that are under the control of the community (admins can edit them), so having them or removing them is a community decision.
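As a rough aid for admins reviewing which Gadgets are switched on for everyone, the sketch below parses entries in the `MediaWiki:Gadgets-definition` format (`* name[option|option]|page|page`) and reports whether an entry carries the `default` option. The sample line is hypothetical and is not copied from German Wikisource; the function name is also made up for illustration.

```python
import re

# One gadget entry: "* name[opt1|opt2]|page1|page2" (the [options] part is optional).
ENTRY = re.compile(r"^\*\s*([^\[\|\]]+?)\s*(?:\[([^\]]*)\])?\s*\|(.*)$")


def parse_gadget_line(line: str):
    """Return a dict for a gadget entry, or None for headings and other text."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    name, options, pages = match.groups()
    return {
        "name": name,
        "options": [o.strip() for o in options.split("|")] if options else [],
        "pages": [p.strip() for p in pages.split("|") if p.strip()],
    }


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    sample = "* PR-ocr[ResourceLoader|default]|PR-ocr.js"
    entry = parse_gadget_line(sample)
    print(entry["name"], "enabled by default:", "default" in entry["options"])
```

Page names in a definition line are written without the `Gadget-` prefix, so `PR-ocr.js` in an entry refers to `MediaWiki:Gadget-PR-ocr.js`.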

For future reference: there are multiple old OCR tools that are backed by a Toolforge service and show up on-wiki as buttons in the editor toolbar (2010 Wikitext editor, not VE). These are provided by local Gadgets on many language Wikisources (some of which cross-load code from Multilingual Wikisource). These are "OCR for Proofread Page", not "OCR provided by Proofread Page", and are under community control. The backend on Toolforge (phetools) is mostly unmaintained, but I have access and kick it when needed (feel free to ping me for related issues, probably best on-wiki, and I can't guarantee response time).

In addition, Community Tech created a new OCR tool, with a backend hosted in WMF production, implemented as part of the Wikisource extension (common code for all the Wikisourcen). This shows up as a custom button and other UI to the right of the normal editor toolbar, and is not under community control (it can't be disabled or removed like a Gadget can). Most users should be using this tool, and most Wikisourcen should disable the old Gadgets, but on enWS we still have them available because some users prefer them (strongly) and in some cases they provide better results.