Testing of new OCR widget on Bengali Wikisource
Closed, Resolved · Public

Description

This ticket documents the test results for the new OCR widget enabled on Bengali Wikisource as part of T283898. The issues will be tabulated eventually, with improvements to follow over time.

  1. No drop-down menu to select the OCR engine
  2. Garbled output in Latin script - Test page - https://bn.wikisource.org/s/6bee
    Screenshot from 2021-06-09 10-45-24.png
  3. Can't recognize or extract RTL scripts when they are mixed with Bengali script - https://bn.wikisource.org/s/gu1q
    Screenshot from 2021-06-10 09-40-09.png
  4. Can't OCR correctly when a page has more than one column - https://bn.wikisource.org/s/1jle
    Screenshot from 2021-06-10 09-46-32.png

Event Timeline

Restricted Application added a subscriber: Aklapper.
  1. No drop-down menu to select the OCR engine
  2. Garbled output in Latin script - Test page - https://bn.wikisource.org/s/6bee

I think these will be fixed with T281769, once it goes to production.

For point 2 above, OCR of mixed scripts seems to work pretty well if a list of languages is provided, so that should probably be okay once we've got the 'advanced form' link in place (or, even better, once we've added a language selector).
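To illustrate what "providing a list of languages" can look like, here is a minimal sketch that assumes the Tesseract engine, whose command-line tool accepts several languages joined with `+` in its `-l` option. The thread does not say which engine or languages were used, so the helper name `tesseract_command`, the file names, and the codes `ben` (Bengali) and `ara` (Arabic, standing in for whichever RTL script is on the page) are all assumptions for illustration.

```python
def tesseract_command(image_path, output_base, langs):
    """Build a Tesseract CLI invocation for a page with mixed scripts.

    Hypothetical helper: `langs` is a list of Tesseract language codes,
    e.g. ["ben", "ara"]. Tesseract takes them joined with "+" via -l.
    """
    if not langs:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(langs)]


# Assumed example: a Bengali page that also contains Arabic-script text.
command = tesseract_command("page.png", "page", ["ben", "ara"])
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where Tesseract and the corresponding language data are installed. A language selector in the widget would play the same role as the `langs` argument here.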

@Bodhisattwa do you know if there is more work to be done to resolve this?

I guess it's totally OK to close this ticket.

TheresNoTime claimed this task.
TheresNoTime subscribed.

Thanks for clarifying @Bodhisattwa — closing 🙂