Wikimedia OCR: Add Symfony
Closed, Resolved | Public | 5 Estimated Story Points

Description

As a Wikisource user, I want Symfony added to Wikimedia OCR so that the OCR tool can be maintained and improved in a more sustainable and up-to-date manner (with the ultimate goal of better reliability & performance).

Background: In our ebook export improvement project for Wikisource, we were able to see an improvement in reliability and performance by adding Symfony. Similarly, we also hope to see such improvements in our OCR project, so we are now doing the same work of adding Symfony.

Acceptance Criteria:

  • Add Symfony to Wikimedia OCR (formerly known as Google OCR)

Event Timeline

ifried renamed this task from Wikisource OCR: Add Symfony [placeholder] to Wikimedia OCR: Add Symfony [placeholder]. Feb 25 2021, 4:07 PM
ifried renamed this task from Wikimedia OCR: Add Symfony [placeholder] to Wikimedia OCR: Add Symfony.
ifried updated the task description.
ARamirez_WMF set the point value for this task to 5. Feb 25 2021, 6:41 PM
ARamirez_WMF moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.
dom_walden subscribed.

I have done a bit of testing of the new staging site https://ocr-test.toolforge.org/ (see also T278461#6976391).

I notice that the way we handle uncaught(?) exceptions is a bit different. Previously, we would see the exception message (truncated if it was long). Now, we return a generic 'The server returned a "500 Internal Server Error".' with no extra details. Are we going to set up emailed exceptions like we did with WS Export? @Samwilson @MusikAnimal

For example, compare https://ws-google-ocr.toolforge.org/?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Fthumb%2Fa%2Fa5%2FTest_Book.pdf%2Fpage3-10&lang= with https://ocr-test.toolforge.org/?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Fthumb%2Fa%2Fa5%2FTest_Book.pdf%2Fpage3-10&lang=

We'll probably want to adjust how we handle exceptions, similar to what we did for WS Export (capturing common ones). I can make a follow-up PR to capture this error as well as set up the emails.

I've created T279610 and T279609 to keep track of this.