
OCR scripts need updating at tools labs by updating the "tesseract-ben" package
Closed, Resolved (Public)

Description

A new version of the Bengali trained data was released just 4 months ago. Could anyone update it for Bengali Wikisource?

https://github.com/tesseract-ocr/tessdata/blob/master/ben.traineddata

Last time, User:Phe updated and improved the Bengali OCR (see https://bn.wikisource.org/wiki/user_talk:Jayantanth#OCR), but now the OCR button no longer appears in edit mode.

Event Timeline

jayantanth raised the priority of this task from to Medium.
jayantanth updated the task description.
jayantanth subscribed.
jayantanth set Security to None.
jayantanth updated the task description.

This is not a bug in the Proofread Page extension, but a bug involving a gadget called OCR.js and the OCR tool at Labs, which is used on most Wikisource wikis.

Regarding the missing OCR button: the cause is this edit, which commented out the link to the OCR.js gadget, so the gadget is no longer loaded.
Jayantanth should remove the two slashes at the start of line 27 in MediaWiki:Common.js, and the OCR button should show up again.
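
As a minimal sketch of what the suggested fix does, the snippet below strips a leading `//` from one line of a script's text. The function name `uncomment_line` and the sample loader line are hypothetical illustrations, not the actual contents of MediaWiki:Common.js.

```python
def uncomment_line(source: str, line_no: int) -> str:
    """Strip a leading '//' from the 1-indexed line `line_no` of `source`."""
    lines = source.split("\n")
    line = lines[line_no - 1]
    stripped = line.lstrip()
    if stripped.startswith("//"):
        indent = line[: len(line) - len(stripped)]
        # Drop the two slashes and a single space after them, if present.
        rest = stripped[2:]
        if rest.startswith(" "):
            rest = rest[1:]
        lines[line_no - 1] = indent + rest
    return "\n".join(lines)


# Hypothetical stand-in for the commented-out loader line (not the real page).
sample = "var a = 1;\n// mw.loader.load('/path/to/OCR.js');"
fixed = uncomment_line(sample, 2)
```

A line that is not commented out is returned unchanged, so applying the function to the wrong line number does no harm.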

I cannot comment on the other issue mentioned in this thread, which is the lack of updates to the Bengali OCR in the Labs OCR tool. That issue can only be solved by Phe or Tpt, as they are the maintainers.

Aklapper renamed this task from Bengali OCR for Proofread Page to Common.js on bn.wikisource.org breaks OCR for Proofread Page. Nov 5 2015, 10:38 AM
Aklapper raised the priority of this task from Medium to Needs Triage.

The JavaScript file has been fixed at bnWS. The remaining work is updating the OCR library at Tool Labs, and I am not sure whether that is something for @Phe himself or something that needs updating more generally.

Billinghurst renamed this task from Common.js on bn.wikisource.org breaks OCR for Proofread Page to OCR scripts need updating at tools labs. Nov 5 2015, 10:47 AM

Tool Labs has these files installed via apt-get (http://packages.ubuntu.com/trusty/tesseract-ocr-ben), but the last release of that package was in 2012 (!). More recent Debian releases do have newer versions (e.g. https://packages.debian.org/source/sid/tesseract-ben).
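
One way to confirm that the installed file differs from the current upstream one is to compare checksums of the installed ben.traineddata and a freshly downloaded copy. The following is a minimal sketch, assuming both files are available locally; the function names and the paths in the usage comment are assumptions, not taken from the Tool Labs setup. A mismatch only shows that the two files are not identical, not which one is newer.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(installed: Path, upstream: Path) -> bool:
    """True if the two files are byte-for-byte identical."""
    return sha256_of(installed) == sha256_of(upstream)


# Example usage (both paths are assumptions; adjust to the actual system):
# same_contents(Path("/usr/share/tesseract-ocr/tessdata/ben.traineddata"),
#               Path("ben.traineddata"))
```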

Aklapper renamed this task from OCR scripts need updating at tools labs to OCR scripts need updating at tools labs by updating the "tesseract-ben" package. Nov 26 2015, 12:55 PM

Tpt updated this during the Wikimania hackathon.

Bodhisattwa claimed this task.