
WS OCR: should act on the OSD content, not the image element content
Open, Needs Triage · Public · Feature

Description

Depending on your perspective, this could be a bug report or feature request.

Currently, OCR is run on the original image set by the server (as stored in the <img> element).

If the OSD (OpenSeadragon) viewer is currently showing a different image, that image is not OCRed, even though it is what the user is looking at.

Things to consider:

  • OSD can have multiple images: which one gets OCRed? Maybe the first?
  • If the new image doesn't come from the Wikimedia image domain, it will be rejected by the backend
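The two considerations above could be handled roughly as follows. This is a minimal sketch, not the Wikisource extension's code. `ViewerImage`, `selectOcrImageUrl` and `ALLOWED_HOST` are hypothetical names. The sketch assumes three things: the viewer's images are available as an ordered list of URLs, the first one is the one to OCR, and the backend only accepts images served from `upload.wikimedia.org`. The page URL is used as a base so that protocol-relative URLs (`//upload.wikimedia.org/...`) resolve correctly.

```typescript
// Hypothetical shape of one image currently shown in the viewer.
interface ViewerImage {
  url: string;
}

// Assumption: the OCR backend only accepts images from this host.
const ALLOWED_HOST = "upload.wikimedia.org";

// Return the absolute URL to OCR, or null if there is nothing acceptable to send.
function selectOcrImageUrl(images: ViewerImage[], pageUrl: string): string | null {
  // Assumption: when the viewer holds several images, OCR the first one.
  const first = images[0];
  if (first === undefined) {
    return null;
  }
  let parsed: URL;
  try {
    // Resolve relative and protocol-relative URLs against the current page.
    parsed = new URL(first.url, pageUrl);
  } catch {
    // Unparseable URL: the backend would reject it anyway.
    return null;
  }
  // Refuse foreign hosts up front instead of sending a request that will fail.
  return parsed.hostname === ALLOWED_HOST ? parsed.href : null;
}
```

Checking the host on the client is only a convenience for giving the user an early message; the backend still has to enforce its own restriction.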

Event Timeline

Restricted Application added a subscriber: Aklapper.

There's a small mistake in the issue's description. The problem is that the tool reads the content of the <img> element when it is built, and does not read the <img> tag again when it is clicked. One of the it.wikisource tools dynamically changes the <img> content (take a look at the "eis" (edit in sequence) gadget to understand why this happens); it successfully loads the new image into the canvas, but it cannot change the image URL used by the OCR tool.
I think fixing the issue should be very easy: all that is needed is to read the current content of the <img> again when the tool is clicked.
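The difference described in this comment is between capturing the image URL once, when the tool is built, and reading it again each time the tool is clicked. A minimal sketch with hypothetical names: `ImageLike` stands in for the `<img>` element so the example runs outside a browser, and `makeStaleHandler`, `makeFreshHandler` and the `runOcr` callback are illustrative, not the gadget's or the extension's actual code.

```typescript
// Stand-in for an <img> element: only its current src matters here.
interface ImageLike {
  src: string;
}

type OcrRunner = (imageUrl: string) => void;

// Buggy pattern: src is read once, when the tool is built.
function makeStaleHandler(img: ImageLike, runOcr: OcrRunner): () => void {
  const urlAtBuildTime = img.src;
  return () => runOcr(urlAtBuildTime);
}

// Fixed pattern: src is read again every time the tool is clicked.
function makeFreshHandler(img: ImageLike, runOcr: OcrRunner): () => void {
  return () => runOcr(img.src);
}

// Usage: a gadget such as eis swaps the image after the tool was built.
const img: ImageLike = { src: "https://upload.wikimedia.org/page1.jpg" };
const requested: string[] = [];
const stale = makeStaleHandler(img, (url) => requested.push(url));
const fresh = makeFreshHandler(img, (url) => requested.push(url));
img.src = "https://upload.wikimedia.org/page2.jpg";
stale(); // still requests page1.jpg
fresh(); // requests page2.jpg
```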

We definitely want to retrieve this info from OSD. Especially with features of eis being integrated into ProofreadPage (per T308098), I'm not sure we want to use the <img> as the single source of truth for the current image.
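One way to make OSD the source of truth, while still degrading gracefully when no viewer is available, is to ask the viewer first and fall back to the `<img>` element only when the viewer reports nothing. A minimal sketch under stated assumptions: `CurrentImageSource`, `resolveCurrentImageUrl` and the two source objects are hypothetical, and how the URL is actually obtained from OpenSeadragon is deliberately left abstract rather than guessed at.

```typescript
// Hypothetical abstraction over anything that can report the image being displayed.
interface CurrentImageSource {
  currentImageUrl(): string | undefined;
}

// Ask each source in priority order; the first non-empty answer wins.
function resolveCurrentImageUrl(sources: CurrentImageSource[]): string | undefined {
  for (const source of sources) {
    const url = source.currentImageUrl();
    if (url) {
      return url;
    }
  }
  return undefined;
}

// Usage: the viewer (preferred) and the server-rendered <img> (fallback).
const osdSource: CurrentImageSource = {
  // Assumption: a real implementation returns undefined when no image is loaded.
  currentImageUrl: () => "https://upload.wikimedia.org/page2.jpg",
};
const imgSource: CurrentImageSource = {
  currentImageUrl: () => "https://upload.wikimedia.org/page1.jpg",
};
const chosen = resolveCurrentImageUrl([osdSource, imgSource]);
```

Keeping the `<img>` as a fallback rather than the primary source reflects the concern above: once eis-style navigation lives in ProofreadPage, the element the server rendered may no longer be the page being viewed.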

Will work on this as part of the recently proposed changes to the OSD viewer (859160).

Change 860526 had a related patch set uploaded (by Sohom Datta; author: Sohom Datta):

[mediawiki/extensions/Wikisource@master] [Wikimedia OCR] Use OSD instead of Page image

https://gerrit.wikimedia.org/r/860526

Change 860526 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] [Wikimedia OCR] Use OSD instead of Page image

https://gerrit.wikimedia.org/r/860526