
Wikisource OCR: determine staging tool for wikimedia ocr [4H]
Closed, Resolved, Public


As a product manager, I want a functional staging tool for Wikimedia OCR, so it can be tested by various stakeholders over the course of the OCR Improvements project.

Acceptance Criteria:

  • Determine what would be the best staging tool for Wikimedia OCR
  • "Best" means the strongest combination of technical efficiency/manageability AND ease of testing for stakeholders
  • Provide a proposal/next steps & share with team

Event Timeline

ifried updated the task description.
ARamirez_WMF renamed this task from Wikisource OCR: determine staging tool for wikimedia ocr to Wikisource OCR: determine staging tool for wikimedia ocr [4H]. Mar 25 2021, 5:36 PM

I suggest we start with a new tool for staging (which I've actually already created, because I saw the other task before this one).

Let's set up the tool at and get it running with Wikisource and the new codebase, and automatic deployment etc.

Then, later, if we decide it needs more resources, or when we come to add Tesseract support, we can move it to a VPS.

Wherever the tool lives, its latest code needs to be testable from the Wikisource side, where we'll probably be adding functionality to ProofreadPage.

This could be done by adding the functionality as a beta feature that, when enabled, uses the test instance of the tool. One problem is that this becomes confusing once we want to release the actual code: with the beta feature enabled it would use the beta tool, and with the beta feature disabled the feature would still exist but use the prod tool.

So a URL parameter might be good, in conjunction with a beta feature:

  • If the beta feature is disabled (the default), no OCR UI appears;
  • If the beta feature is enabled, the OCR UI appears and uses the production tool;
  • If the beta feature is enabled and the prp-ocr-test URL parameter (or whatever we call it) is set, the OCR UI uses the test tool.
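
To make the three cases concrete, here is a minimal sketch in Python (not the actual ProofreadPage code, which would live in the extension). The function name `select_ocr_tool` and the return values are made up for illustration, `prp-ocr-test` is the placeholder parameter name from the list above, and "set" is assumed to mean the parameter is present with any value.

```python
from typing import Optional
from urllib.parse import parse_qs

# Placeholder parameter name from the proposal; the final name is undecided.
TEST_PARAM = "prp-ocr-test"


def select_ocr_tool(beta_enabled: bool, query_string: str) -> Optional[str]:
    """Return which OCR tool the UI should call, or None if the UI is hidden.

    Hypothetical helper: the name and the "production"/"test" labels are
    illustrative only.
    """
    if not beta_enabled:
        # Beta feature disabled (the default): no OCR UI at all.
        return None
    # keep_blank_values=True so that "?prp-ocr-test" with no value counts as set.
    params = parse_qs(query_string, keep_blank_values=True)
    if TEST_PARAM in params:
        return "test"
    return "production"
```

Note that the URL parameter has no effect on its own: without the beta feature it is ignored, so the test tool is never reachable by a URL alone.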

We're probably going to be making changes to the UI that depend on changes to the tool, so we'll want to make sure we handle deployments (for both staging and prod) carefully.

Oh, scratch that: Toolforge is still on PHP 7.2, and our new code requires 7.3, so I guess we do need a VPS.

We discussed this in the engineering meeting today, and suggest the following:

  • No Beta feature.
  • Include the wiki UI in the ProofreadPage extension (not the Wikisource extension).
  • Feature flag $wgProofreadPageOcrEnabled, overridable with a URL parameter such as ?prp_ocr=1. The feature will be off by default.
  • A config var $wgProofreadPageOcrEndpoint will point to our OCR API, with a different value for Beta Wikisource.
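
As a rough illustration of how the flag, the URL override, and the endpoint setting could interact, here is a sketch in Python that models the two config variables as dictionary keys. It is not extension code: the function name, the endpoint URLs, and the rule that only `prp_ocr=1` switches the feature on are all assumptions.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs

# Stand-ins for the proposed config variables. Both URLs are hypothetical.
PROD_CONFIG = {
    "wgProofreadPageOcrEnabled": False,  # off by default, as proposed
    "wgProofreadPageOcrEndpoint": "https://ocr.example.org/api",
}
BETA_CONFIG = {
    "wgProofreadPageOcrEnabled": True,
    "wgProofreadPageOcrEndpoint": "https://ocr-test.example.org/api",
}


def ocr_endpoint_for_request(config: Mapping[str, object],
                             query_string: str) -> Optional[str]:
    """Return the OCR endpoint to use for this request, or None if OCR is off.

    Assumption: the URL parameter can only turn the feature on (prp_ocr=1);
    the proposal does not say whether it can also turn it off.
    """
    params = parse_qs(query_string)
    # If the parameter is repeated, the last value wins.
    override_on = params.get("prp_ocr", [""])[-1] == "1"
    if config["wgProofreadPageOcrEnabled"] or override_on:
        return str(config["wgProofreadPageOcrEndpoint"])
    return None
```

With these stand-in values, a production request without the parameter gets no OCR UI, the same request with `?prp_ocr=1` uses the production endpoint, and any Beta Wikisource request uses the test endpoint.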

Does this sound okay?

@dmaza you might have an idea about using an HTTP header to make testing easier; can you elaborate here?