
Clean up OCR gadgets
Closed, Invalid · Public · 3 Estimated Story Points

Description

Acceptance criteria:

  • Once the improvements are deployed to production, there will be no need for the gadgets. The scope of this ticket will be to remove the old gadget once we deploy so we may clean up tech debt.

Event Timeline

Restricted Application added a subscriber: Aklapper.

This task is to remove the ocr and GoogleOCR gadgets from Wikisources' MediaWiki:Gadgets-definition pages.

Some also have other OCR gadgets, probably most notably IndicOCR (e.g., which I think is a fork of the Google one). I guess we'll leave this alone?

Or should we only be removing the GoogleOCR one, because we created it, and leave the others for communities to disable?

Also, is there a ticket for deploying to production?

I am not sure if the gadgets should be removed. The gadgets provide OCR option from the edit toolbar and editors are accustomed to it while editing. Removing them might hamper the flow.

I am not sure if the gadgets should be removed. The gadgets provide OCR option from the edit toolbar and editors are accustomed to it while editing. Removing them might hamper the flow.

If it wasn't clear, we are offering a new "Extract text" button that appears over the image you're transcribing. This does the same thing as the gadgets. Do you think users of the gadget will get accustomed to using the new button?

I don't see an issue with keeping the gadgets around if communities really want to, but we should at least update the Tesseract gadgets to point to the new Wikimedia OCR tool, which is 10x faster than the old one. Meanwhile, https://ws-google-ocr.toolforge.org/ can be redirected to the new tool (with engine=google appended to the URL), so no changes will need to be made to that gadget. I also think these gadgets should not be on by default, if they are configured that way in the gadget definition. While experienced users may want to keep the buttons in the toolbar, new users should have no problem finding the new OCR button and the redundant buttons in the editing toolbar might be confusing.
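
For illustration only, the redirect described above could be sketched roughly like this in Python. The new tool's base URL and the `image`/`lang` parameter names in the usage note are placeholders, not taken from this thread; only `engine=google` comes from the comment above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical base URL for the new tool; the thread does not state the real one.
NEW_TOOL_BASE = "https://new-ocr.example.org/"


def redirect_target(old_url: str, new_base: str = NEW_TOOL_BASE) -> str:
    """Map a request for the old Google OCR tool onto the new tool with engine=google."""
    old = urlsplit(old_url)
    new = urlsplit(new_base)
    # Keep the query parameters the gadget already sends, minus any existing engine.
    params = [
        (key, value)
        for key, value in parse_qsl(old.query, keep_blank_values=True)
        if key != "engine"
    ]
    # Force the Google engine so the old gadget keeps its behaviour.
    params.append(("engine", "google"))
    return urlunsplit((new.scheme, new.netloc, new.path or "/", urlencode(params), ""))
```

Calling `redirect_target("https://ws-google-ocr.toolforge.org/?lang=en")` would then yield the placeholder base URL with `lang=en&engine=google` as its query string, so the gadget itself would not need to change.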

Some also have other OCR gadgets, probably most notably IndicOCR (e.g., which I think is a fork of the Google one). I guess we'll leave this alone?

We don't offer Indic OCR, so removing that one should be left to each community to decide, I think. The example you gave uses https://indic-ocr.toolforge.org

You're right, I was forgetting: Indic OCR uses the Google Drive API, not Cloud Vision.

I am not sure if the gadgets should be removed.

And I am sure the gadgets shouldn't be removed. Let users choose for themselves which tool they want to use. So far I haven't seen many people delighted with the new tool.

If it wasn't clear, we are offering a new "Extract text" button that appears over the image you're transcribing.

It was clear. Did you check how many users have deactivated it already?

FYI, both gadgets are already not default on enWS. This actually surprised me as I thought the normal one was default on.

Is it possible for a toolbar button to call into the extension JS so the flow is the same except for the actual element that is clicked?

And, as I suggested at the enWS Scriptorium, putting a toggle button next to the header/footer button in the PRP toolbar seems like a logical place for a quick flip switch. Certainly in works with already good OCR, the user might reasonably not want to see it temporarily.

I suggest defaulting to on, so users know it is there, and telling them how to toggle it in the initial help bubble.

If it wasn't clear, we are offering a new "Extract text" button that appears over the image you're transcribing. This does the same thing as the gadgets.

It was clear as the Extract text button has already been implemented in wikis.

I also think these gadgets should not be on by default, if they are configured that way in the gadget definition.

IMHO, let the communities decide if they want to keep or remove their OCR gadgets or if they want some of their gadgets as default. There are various reasons why some OCR buttons are preferred over others on a certain language Wikisource.

While experienced users may want to keep the buttons in the toolbar, new users should have no problem finding the new OCR button and the redundant buttons in the editing toolbar might be confusing.

Maybe, but the position of the buttons in the toolbar is such that it helps users find them easily, even newcomers.

If it wasn't clear, we are offering a new "Extract text" button that appears over the image you're transcribing.

It was clear. Did you check how many users have deactivated it already?

I was replying to T283897#7168450 which I believe was from before we deployed to all Wikisources.

Is it possible for a toolbar button to call into the extension JS so the flow is the same except for the actual element that is clicked?

That sounds fine to me. However, you'd lack the ability to go to our "Advanced Tools" page which can help some people depending on their needs. But yes, at the very least the gadgets should use our new, much faster, and actively maintained backends.

And, as I suggested at the enWS Scriptorium, putting a toggle button next to the header/footer button in the PRP toolbar seems like a logical place for a quick flip switch. Certainly in works with already good OCR, the user might reasonably not want to see it temporarily.

I suggest defaulting to on, so users know it is there, and telling them how to toggle it in the initial help bubble.

I think this is basically what @Daimona was talking about and I agree a simple way to toggle on/off the OCR tools in the UI (without having to go to Special:Preferences) seems worth exploring.

a simple way to toggle on/off the OCR tools in the UI (without having to go to Special:Preferences)

This could be similar to ProofreadPage's header/footer toggle: only if WikiEditor is not installed is the preference shown in Special:Preferences; otherwise, the button's state is saved as a preference. The button is also not in the top level of the toolbar, which saves a bit of space.

The "ocr" gadget (the one backed by phetools on Toolforge) is a community tool that's outside CommTech's remit to fiddle with (unless asked).

The GoogleOCR gadget is, aiui, a CommTech project (at least the backend is, but I think the frontend too) so that one is within the scope of usual calculations.

And in this particular case I 1) don't see that the old GoogleOCR has any advantages over the new, and 2) it would be unreasonable to try to maintain two implementations (local gadget + extension) of literally the same function without a very strong reason. (but retaining two UI elements for the same function is a different calculation)

IOW, mitts off the phetools gadget (and other community tools), but do remove the old GoogleOCR one (don't just redirect it). The announcement of the new tool should probably encourage the local community to reassess what tools they want to keep and what to deprecate.

Is it possible for a toolbar button to call into the extension JS so the flow is the same except for the actual element that is clicked?

That sounds fine to me. However, you'd lack the ability to go to our "Advanced Tools" page which can help some people depending on their needs.

Is there any reason that can't be a separate toolbar button? The "OCR" text with a little gear symbol superimposed, or whatever?

If the UI for the advanced options was changed to an in-page popup/dialog instead of linking out to a separate web page, that button and the hovering-popup-dropdown (the new button) would even have identical behaviour for the end user (navigating to a new webpage from a toolbar button would be extremely jarring otherwise).

The toolbar is the established UI for this kind of stuff, and where other similar functions will be found going forward. The floater on the image and the throbbing onboarding thingy are great for drawing notice to the function for new users, but they are inconsistent with other UI and get in the way of the scan image for everyone else. Being able to at least turn the floater off but still having access to the functionality seems like a reasonable tradeoff between the concerns of the various kinds of users.

And trust me on this, training new users to find the OCR function is not the hardest part of onboarding new users to Wikisource.

Thanks for all the feedback, @Xover!

The "ocr" gadget (the one backed by phetools on Toolforge) is a community tool that's outside CommTech's remit to fiddle with (unless asked).

I guess I was under the false impression that the plan to consolidate the gadgets into a single experience was something the community wanted. It seems in line with the wish, which was about creating a WMF-maintained tool that doesn't suffer from the problems seen in existing solutions.

We can certainly leave the Tesseract gadgets alone. But I do hope the performance benefits of using the new backend are understood. The new backend is an order of magnitude faster (~4 seconds versus ~40 in some of my testing), and we're about to experiment with upgrading to Tesseract 5, which should be even faster and have better Indic support.

It sounds like this task should be put on hold pending further community input. I see no major issue with keeping both the toolbar buttons and the new "Transcribe button" (that design may be subject to change), but from a user standpoint it would seem subpar for these two systems to be fundamentally different, be it in performance or quality of transcriptions. I think everything should at least point to the same backend.

I guess I was under the false impression that the plan to consolidate the gadgets into a single experience was something the community wanted.

"The community" is about as unified as you would imagine. In terms of OCR tools everyone wants a well supported and technically perfect one. But their definition of "perfect" differs. In particular, we know the phetools-backed OCR gadget has custom post-processing of OCR output and may use customised language (traineddata) files. The same strong individual preference that led some to exclusively use the phetools gadget over the old GoogleOcr gadget may still lead them to do the same with the new gadget, and I'd rather not pull the rug out from under those users without a nice gentle transition period.

Making the new tool enabled by default (which none of the existing ones are at enWS, and shouldn't be on any other project due to the privacy policy and using non-WMF-production backends) and some gentle nudges to switch to it for existing users should do most of the job. Each project can then deprecate and eventually disable their old gadgets as and when they feel comfortable. On enWS I imagine we'll keep one or two such gadgets around indefinitely, as non default, and hidden way down at the bottom of #prefsection-gadgets for special cases. Other projects may handle it differently.

The new backend is an order of magnitude faster (~4 seconds versus ~40 in some of my testing)

Yeah, see, the problem is that phetools pregenerates OCR for the whole work and returns every page after the first requested from cache (it's a stat() and network latency; ~zero processing, so disk I/O on Toolforge's NFS filesystem is probably the biggest time component). The vast majority of requests to it will be effectively instantaneous, and the lack of a loading spinner reduces the subjectively perceived time even further. It's entirely possible you will get complaints that the new tool is too slow (in their subjective experience), even though its purely technical improvement in performance is literally tenfold.
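
To make the caching pattern described above concrete, here is a minimal Python sketch. It is not phetools' actual code: the class name, the `ocr_page` callback, and the one-text-file-per-page layout are all assumptions for illustration. The first request for a work runs OCR on every page; every later request is only a file-existence check and a read.

```python
from pathlib import Path
from typing import Callable


class WorkOcrCache:
    """Illustrative whole-work OCR cache (hypothetical, not the phetools implementation)."""

    def __init__(self, cache_dir: Path, ocr_page: Callable[[str, int], str]) -> None:
        self.cache_dir = cache_dir
        self.ocr_page = ocr_page  # assumed slow OCR function: (work, page number) -> text
        self.ocr_calls = 0  # counts how often the slow path actually ran

    def _path(self, work: str, page: int) -> Path:
        return self.cache_dir / work / f"{page}.txt"

    def get(self, work: str, page: int, page_count: int) -> str:
        path = self._path(work, page)
        if not path.exists():
            # Cache miss: pregenerate OCR for the whole work, not just this page.
            for number in range(1, page_count + 1):
                target = self._path(work, number)
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_text(self.ocr_page(work, number), encoding="utf-8")
                self.ocr_calls += 1
        # Cache hit (or freshly generated): a file check plus a read, no OCR.
        return path.read_text(encoding="utf-8")
```

With this shape, only the first page a user opens pays the OCR cost; the rest of the work comes back from disk, which is why such a tool can feel instantaneous even when its per-page OCR is slower.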

Speaking as an enWS interface admin (caveat: without community consultation, personal opinions follow), I would rather that enWS be left to clean up the gadgets on our own, since we can handle it ourselves easily enough. If we can't, or we need help to connect the old gadget to the new backend, we know who and where to ask. :-)

I'm specifically not grumpy about the new OCR tool in any way: I think, in general, it is a great step forward, helps all Wikisourcen, reduces barriers to entry and reduces reliance on barely-maintained infrastructure.

However, as one of the Wikisourcen that have actually had a (usually) functional OCR tool for a long time and a lot of users with a lot of clicks on it, we are actually the one that least needs the new tool, until there's a demonstrable benefit. And I am fully aware that there will be benefits to us, for example, page segmentation control, language selection, etc. But until those features land, I suspect the new tool will be a hard sell to people invested in the existing gadgets through long familiarity. And we are more able to navigate the migration process locally.

I will let other Wikisourcen speak for themselves, but I imagine that the "big boys" will have somewhat similar ideas, and the "little guys" will be more receptive to built-in OCR as it currently stands, with language selection and all, without any local configuration or maintenance needed.


Thank you CommTech for your hard work up to this point: I'm excited to see it bearing fruit, and I hope to be able to make it available at enWS in a way that won't cause unnecessary drama.

@Inductiveload @TommyJantarek @Xover and @Bodhisattwa thank you for all of your input! We appreciate the level of attention, care, and detail you put into the release. We also appreciate your knowledge sharing when you come to us with your perspectives.

Thank you @Samwilson and @MusikAnimal for the Comm Tech responses

Looking over the different perspectives on the thread, I have some questions:

  • If we were to change the position of the transcribe button and instead place it in the toolbar, with a dropdown, as noted in this ticket, would you still be opposed to the gadget cleanup?

The communities should decide on the cleanup. (I apologize for the learning curve on my end; I thought cleaning up tech debt was the right thing to do given that we are introducing the functionality in the Transcribe Text button, but now I see the light.) Communities can organize around the cleanup. That being said, our recommendation is to remove the gadgets, given that multiple parts of the codebase with the same functionality can lead to maintenance costs. (This recommendation excludes IndicOCR, since our changes did not account for that functionality.)

@NRodriguez Moving the tool to the toolbar makes more sense to me since 1) that's where people are used to it being and 2) IMO it's somewhat logical for tools to go there, and users are also used to that. Plus it's much less "in the way" of the image (though I still think that if you do keep the UI where it is, allowing users to toggle it on and off, e.g. T285999, will satisfy pretty much everyone too). Also it would be easier to position it between H/V modes (T285764) as a menu item.

RE the gadgets, that'll still be up to the communities, but for enWS specifically I imagine what we'll see is some people will continue to prefer the old "buttons" format, especially as long as the new tool doesn't add anything new. So, we might keep them for the time being. As it is, the gadget is opt-in, so new users will probably go right to the new tool anyway. As I said before, we (enWS) will probably port the backends over eventually, especially if the new backends are faster/more accurate/more useful. So the buttons gadget will just be a thin shell that won't incur substantial debt.

  • If we were to change the position of the transcribe button and instead place it in the toolbar, with a dropdown, as noted in this ticket, would you still be opposed to the gadget cleanup?

Yes, these are orthogonal issues. The communities have developed a lot of tools using the defined extension points that MediaWiki provides. These include templates, Lua modules, site-specific CSS (MediaWiki:Common.css) and JavaScript (Common.js), Gadgets, and user styles and scripts. These get added and removed all the time depending on the community's needs, are explicitly the community's remit, and the WMF does not provide support for them. The old phetools-backed OCR Gadget is one such tool.

The community usually appreciates help and support with such tools, but they generally don't appreciate it when someone external to that community steps in and makes decisions for them.

The existing GoogleOcr Gadget is, I believe, a CommTech product and can be removed by CommTech if you believe this is the best course of action. Or changed to talk to the new backend. Or replaced with the new tool with the alternate UI. The point is that as a CommTech rather than community tool this is entirely within your remit, the same way core MediaWiki features etc. are (I am deliberately ignoring the complexities related to volunteer developers contributing to MediaWiki and its extensions, and what the WMF develops vs. supports in its deployment, etc.).

I'll try to post some thoughts on the actual UI over in T285712.

  • If we were to change the position of the transcribe button and instead place it in the toolbar, with a dropdown, as noted in this ticket, would you still be opposed to the gadget cleanup?

At the beginning I would still be opposed, until the communities have tested the new tool and compared it with the old ones. I think (I am sure) that later the communities will decide to clean up the gadgets themselves.

Hey all, thanks for the inputs. As discussed, we will now not be cleaning up the OCR gadgets. That is outside of our purview as the community tech team. We will let communities decide individually and will be cutting a new ticket if any new work is needed to help this. Thanks for helping me understand with more clarity. Will keep this in mind going forward.