WS OCR: should act on the OSD content, not the image element content
Open, Needs Triage · Public · Feature

Assigned To

Authored By

	Inductiveload
	Jan 5 2022, 10:49 PM

Description

Depending on your perspective, this could be a bug report or feature request.

Currently, the OCR is generated from the original image set by the server (as stored in the <img> element).

If the current content of the OSD viewer is different, this will not be OCRed, even though it is what the user is looking at.

Things to consider:

OSD can have multiple images: which one gets OCRed? Maybe the first?
If the new image doesn't come from the Wikimedia image domain, it will be rejected by the backend
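
The two considerations above can be sketched as a small selection step. This is only an illustration, not the extension's code: the function names are hypothetical, the host `upload.wikimedia.org` is an assumed stand-in for "the Wikimedia image domain", and how the URLs are read out of the OSD viewer is deliberately left out.

```typescript
// Assumed allow-list; the task only says "the Wikimedia image domain".
const ALLOWED_IMAGE_HOSTS: readonly string[] = ["upload.wikimedia.org"];

/** True if the URL is absolute, uses https, and its host is on the allow-list. */
function isAllowedImageUrl(url: string): boolean {
  try {
    const parsed = new URL(url);
    return (
      parsed.protocol === "https:" &&
      ALLOWED_IMAGE_HOSTS.some((host) => host === parsed.hostname)
    );
  } catch {
    // Relative or malformed URLs cannot be checked, so treat them as rejected.
    return false;
  }
}

/**
 * Pick the image to OCR from the URLs currently shown in the viewer.
 * Takes the first image (one possible answer to "which one gets OCRed?")
 * and returns null if there is none or the backend would likely reject it.
 */
function pickOcrImageUrl(viewerImageUrls: readonly string[]): string | null {
  const first = viewerImageUrls[0];
  if (first === undefined || !isAllowedImageUrl(first)) {
    return null;
  }
  return first;
}
```

Returning `null` instead of sending a request lets the caller show a message up front rather than wait for the backend to refuse the image.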

Details

	Subject	Repo	Branch	Lines +/-
	[Wikimedia OCR] Use OSD instead of Page image	mediawiki/extensions/Wikisource	master	+63 -14


Related Objects

Mentioned In: T324740: Wikimedia OCR fails with 400 status
T308098: Integrate edit-in-sequence inside ProofreadPage
T288141: ProofreadPage: use OpenSeadragon for the Page NS image viewer
Mentioned Here: T308098: Integrate edit-in-sequence inside ProofreadPage

Event Timeline

Inductiveload created this task. Jan 5 2022, 10:49 PM

Restricted Application added a project: Community-Tech. Jan 5 2022, 10:49 PM

Restricted Application added a subscriber: Aklapper.

Inductiveload mentioned this in T288141: ProofreadPage: use OpenSeadragon for the Page NS image viewer. Jan 5 2022, 10:51 PM

Ruthven subscribed. Jan 12 2022, 3:08 PM

ldelench_wmf removed a project: Community-Tech. Jan 31 2022, 3:24 PM

There's a small mistake in the issue's description. The problem is that the tool reads the content of the <img> element when it is built, and does not read it again when it is triggered. One of the it.wikisource tools changes the <img> content dynamically (take a look at the "eis", edit in sequence, gadget to understand why); it successfully loads the new image into the canvas, but it can't change the image's URL in the OCR tool.
I think that fixing the issue should be very easy: all that is needed is to read the current <img> content again when the tool is clicked.
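
The difference between reading the image URL when the tool is built and reading it when the tool is clicked can be shown with a minimal sketch. Everything here is hypothetical: `ImageLike` stands in for the page's <img> element, and the two builder functions are illustrative names, not the OCR tool's real code.

```typescript
// Minimal stand-in for the page's <img> element; only `src` matters here.
interface ImageLike {
  src: string;
}

// Reads `src` once, when the tool is built: later changes to the image are missed.
function buildStaleOcrTool(img: ImageLike): () => string {
  const urlAtBuildTime = img.src;
  return () => urlAtBuildTime;
}

// Reads `src` on every click, so it follows dynamic changes (for example by eis).
function buildFreshOcrTool(img: ImageLike): () => string {
  return () => img.src;
}
```

If both tools are built while `src` is `page-1.jpg` and a gadget then sets it to `page-2.jpg`, the first handler still returns `page-1.jpg` while the second returns `page-2.jpg`.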

Ruthven added a project: All-and-every-Wikisource. May 4 2022, 3:57 PM

In T298663#7665215, @Alex_brollo wrote:

There's a small mistake in the issue's description. The problem is that the tool reads the content of the <img> element when it is built, and does not read it again when it is triggered. One of the it.wikisource tools changes the <img> content dynamically (take a look at the "eis", edit in sequence, gadget to understand why); it successfully loads the new image into the canvas, but it can't change the image's URL in the OCR tool.
I think that fixing the issue should be very easy: all that is needed is to read the current <img> content again when the tool is clicked.

We definitely want to retrieve this info from OSD. Especially with features of eis being integrated into ProofreadPage (per T308098), I'm not sure we want to use the <img> as the single source of truth for the current image.

Soda mentioned this in T308098: Integrate edit-in-sequence inside ProofreadPage. Nov 22 2022, 4:42 AM

Will work on this as part of the recently proposed changes to the OSD viewer (859160).

Change 860526 had a related patch set uploaded (by Sohom Datta; author: Sohom Datta):

[mediawiki/extensions/Wikisource@master] [Wikimedia OCR] Use OSD instead of Page image

https://gerrit.wikimedia.org/r/860526

gerritbot added a project: Patch-For-Review. Nov 24 2022, 10:32 AM

Change 860526 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] [Wikimedia OCR] Use OSD instead of Page image

https://gerrit.wikimedia.org/r/860526

ReleaseTaggerBot added a project: MW-1.40-notes (1.40.0-wmf.13; 2022-12-05). Dec 2 2022, 5:00 AM

Maintenance_bot removed a project: Patch-For-Review. Dec 2 2022, 5:30 AM

Samwilson mentioned this in T324740: Wikimedia OCR fails with 400 status. Dec 8 2022, 7:36 AM
