
OCR is not working for Devanagari script in wikisource.org
Open, Needs Triage, Public

Description

When I try to transcribe a Devanagari-script book on Multilingual Wikisource (https://wikisource.org/), the OCR does not work.

To reproduce, open the link below and click "Transcribe text":
https://wikisource.org/w/index.php?title=Page:एका_विचाराची_जिवीत-कथा.pdf/81&action=edit&redlink=1

Event Timeline

Unfortunately, Multilingual Wikisource doesn't make it easy for the OCR script to send the right language to the OCR engine.

Leaving out the language (via the 'Advanced Options' form) gives better results, e.g. see here.

I think the fix here is to a) not send 'en' as the language for any pages on Multilingual Wikisource; and b) work on T279405, so there's a way to choose which language is used.
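Part (a) of that proposal could look roughly like the sketch below. It is illustrative only and is not the OCR gadget's actual code: the function name `ocr_languages_for`, the constant `MULTILINGUAL_HOST`, and the rule that a language edition's subdomain is its OCR language code are all assumptions made for the example. It is also written in Python for brevity, which may not match the language the gadget is written in.

```python
from typing import List
from urllib.parse import urlsplit

# Assumption: Multilingual Wikisource is served from the bare domain,
# while language editions use a subdomain such as mr.wikisource.org.
MULTILINGUAL_HOST = "wikisource.org"


def ocr_languages_for(page_url: str) -> List[str]:
    """Return the language codes to send to the OCR engine (hypothetical helper).

    An empty list means "send no language", so the engine uses its defaults.
    """
    host = (urlsplit(page_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == MULTILINGUAL_HOST:
        # Proposed fix (a): never fall back to 'en' on Multilingual Wikisource.
        return []

    subdomain, _, rest = host.partition(".")
    if rest == MULTILINGUAL_HOST:
        # Assumption: the subdomain of a language edition is its language code.
        return [subdomain]

    # Unknown host: do not guess a language.
    return []
```

With this sketch, a page on wikisource.org yields no language at all, which matches the behaviour that the 'Advanced Options' workaround produces by hand, while a page on a language edition still gets its own code.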

In the meantime, a workaround is to use the 'Advanced Options'.

sweil renamed this task from "OCR is not working for Devnagari script in wikisource.org" to "OCR is not working for Devanagari script in wikisource.org". Aug 16 2023, 5:21 AM