Wikimedia OCR: Validate the OCR engine
Closed, Resolved · Public · 3 Estimated Story Points · BUG REPORT
Assigned To

Authored By

	dom_walden
	May 6 2021, 1:44 PM

Description

What is the problem?

If you enter an invalid OCR engine name, it returns a 500 error.

We should probably return a nice validation message, saying which engines we do support.

URLs to reproduce problem

https://ocr-test.wmcloud.org/?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F2%2F20%2FENC_9-0379.jpg&engine=goog

Environment

Wikimedia OCR: Version 0.2.0

Related Objects

Mentioned In: T281964: Wikimedia OCR: Validation errors throw 500 exception
Mentioned Here: T284728: Uncaught PHP Exception Exception: "Engine not found: google|foo]]"

Event Timeline

dom_walden created this task. May 6 2021, 1:44 PM

Restricted Application added a subscriber: Aklapper. May 6 2021, 1:44 PM

dom_walden mentioned this in T281964: Wikimedia OCR: Validation errors throw 500 exception. May 6 2021, 1:45 PM

MusikAnimal claimed this task. May 7 2021, 6:35 PM

MusikAnimal edited projects, added Community-Tech (CommTech-Sprint-1); removed Community-Tech.

MusikAnimal set the point value for this task to 2.

MusikAnimal moved this task from Ready 🎬 to In Development 💻 on the Community-Tech (CommTech-Sprint-1) board.

MusikAnimal changed the point value for this task from 2 to 3. May 7 2021, 6:55 PM

PR: https://github.com/wikimedia/wikimedia-ocr/pull/30

MusikAnimal moved this task from Review/Feedback 💬 to In Development 💻 on the Community-Tech (CommTech-Sprint-1) board. May 25 2021, 4:01 AM

ldelench_wmf set Final Story Points to 5. Jun 3 2021, 5:13 PM

ldelench_wmf moved this task from CommTech-Sprint-1 to CommTech-Sprint-2 on the Community-Tech board. Jun 7 2021, 4:08 PM

ldelench_wmf edited projects, added Community-Tech (CommTech-Sprint-2); removed Community-Tech (CommTech-Sprint-1).

Restricted Application edited projects, added Community-Tech; removed Community-Tech (CommTech-Sprint-2). Jun 7 2021, 4:08 PM

MusikAnimal edited projects, added Community-Tech (CommTech-Sprint-2); removed Community-Tech. Jun 7 2021, 4:38 PM

MusikAnimal moved this task from Ready 🎬 to In Development 💻 on the Community-Tech (CommTech-Sprint-2) board.

ldelench_wmf moved this task from Backlog to 🌟Top Priority on the Wikimedia OCR board. Jun 7 2021, 9:14 PM

I guess T284728 can be merged into this task, right?

In T282135#7149619, @Daimona wrote:

I guess T284728 can be merged into this task, right?

Yes! You found that by manually entering in an invalid engine, right? In other words, you didn't discover this through a normal user workflow?

MusikAnimal merged a task: T284728: Uncaught PHP Exception Exception: "Engine not found: google|foo]]". Jun 10 2021, 6:34 PM

In T282135#7149768, @MusikAnimal wrote:

In T282135#7149619, @Daimona wrote:

I guess T284728 can be merged into this task, right?

Yes! You found that by manually entering in an invalid engine, right? In other words, you didn't discover this through a normal user workflow?

I actually got an email alert with this error, so I don't know how the user got there.

I'm going to assume that was us, because there's no user-facing way to give an invalid engine without manipulating the URL.

I went in circles trying to get messaging to show about an invalid engine, but the way the system works now makes this very difficult because the view depends on there being a valid engine, and we can't show flash messages in the context where we're setting the engine. Additionally, if we throw an OcrException, it halts execution and other parameters such as the image URLs don't get set (which also depend on an engine being set).

Basically, while we definitely can show a message to the user, it requires a lot of hacky code that I felt just wasn't worth it. So for now, we silently fall back to the default engine (which I have set as google).
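
As a rough illustration of the fallback described above, here is a minimal sketch in Python rather than the project's PHP. The engine list and the names `AVAILABLE_ENGINES`, `DEFAULT_ENGINE` and `resolve_engine` are assumptions made for illustration; none of them are taken from the pull request.

```python
from typing import Optional, Tuple

# Hypothetical engine list: the thread only confirms 'google' as the default.
AVAILABLE_ENGINES = ("google", "tesseract")
DEFAULT_ENGINE = "google"


def resolve_engine(requested: Optional[str]) -> Tuple[str, bool]:
    """Return (engine, fell_back) instead of raising on an unknown engine.

    Raising here would halt the request before other parameters (such as
    the image URL) are processed, which is the problem described above.
    """
    if requested is None:
        # No engine given at all: use the default, nothing to report.
        return DEFAULT_ENGINE, False
    if requested in AVAILABLE_ENGINES:
        return requested, False
    # Unknown engine: fall back and let the caller decide whether to warn.
    return DEFAULT_ENGINE, True


if __name__ == "__main__":
    print(resolve_engine("goog"))
```

Returning a flag alongside the engine keeps the decision about whether and how to warn the user out of the validation step.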

Pull request: https://github.com/wikimedia/wikimedia-ocr/pull/30

Merged.

ldelench_wmf edited projects, added Community-Tech (CommTech-Sprint-3); removed Community-Tech (CommTech-Sprint-2). Jun 21 2021, 1:50 PM

ldelench_wmf moved this task from Ready 🎬 to QA 🐛 on the Community-Tech (CommTech-Sprint-3) board.

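@MusikAnimal Weird bug I cannot explain:

Go to https://ocr-test.wmcloud.org/api.php?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F2%2F20%2FENC_9-0379.jpg&engine=foo

Then go to https://ocr-test.wmcloud.org/?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F2%2F20%2FENC_9-0379.jpg&engine=google

The second request shows the error: The requested engine 'foo' was not found. Using the default engine 'google' instead. (but the OCR is successful).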

In fact, if you submit the first link multiple times you get multiple instances of the error.

It also happens if you first go to https://ocr-test.wmcloud.org/api/available_langs?engine=foo.

It only happens if you use the same browser for both requests, so I guess something is cached on the browser-side.

In T282135#7174690, @dom_walden wrote:

@MusikAnimal Weird bug I cannot explain:

Go to https://ocr-test.wmcloud.org/api.php?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F2%2F20%2FENC_9-0379.jpg&engine=foo

Then go to https://ocr-test.wmcloud.org/?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F2%2F20%2FENC_9-0379.jpg&engine=google

The second request shows the error: The requested engine 'foo' was not found. Using the default engine 'google' instead. (but the OCR is successful).

Nice find! That's the intended behaviour of flash messages in Symfony. Flash messages are stored as part of your session, and aren't removed until they are shown. So the solution here is to surface them in the API response (something we should be doing anyway). I'll submit a new PR for this.
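
To make the mechanism concrete, here is a small Python model of session-backed flash messages. It is a sketch only, not Symfony's actual API: the `FlashBag` class, the `expose_flashes` switch and the `warnings` and `banners` keys are all hypothetical names chosen for illustration.

```python
from typing import Dict, List

DEFAULT_ENGINE = "google"
KNOWN_ENGINES = {"google"}  # hypothetical; only 'google' is named in the thread


class FlashBag:
    """Stand-in for per-session flash storage: messages persist until read."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def add(self, message: str) -> None:
        self._messages.append(message)

    def drain(self) -> List[str]:
        # Reading the messages is what removes them from the session.
        messages, self._messages = self._messages, []
        return messages


def pick_engine(session: FlashBag, requested: str) -> str:
    if requested in KNOWN_ENGINES:
        return requested
    session.add(
        f"The requested engine '{requested}' was not found. "
        f"Using the default engine '{DEFAULT_ENGINE}' instead."
    )
    return DEFAULT_ENGINE


def api_request(session: FlashBag, requested: str, expose_flashes: bool) -> Dict[str, object]:
    response: Dict[str, object] = {"engine": pick_engine(session, requested)}
    if expose_flashes:
        # The fix: surface (and thereby clear) the messages in the API response.
        response["warnings"] = session.drain()
    return response


def page_request(session: FlashBag, requested: str) -> Dict[str, object]:
    # The web UI always renders whatever flash messages are pending.
    return {"engine": pick_engine(session, requested), "banners": session.drain()}


if __name__ == "__main__":
    session = FlashBag()
    api_request(session, "foo", expose_flashes=False)
    print(page_request(session, "google")["banners"])
```

With `expose_flashes=False` the warning queued by the API call is still pending when the same session next loads the page, which matches the stale error reported above; repeating the API call queues one more message each time.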

PR for exposing flash messages in API responses: https://github.com/wikimedia/wikimedia-ocr/pull/51

MusikAnimal moved this task from In Development 💻 to Review/Feedback 💬 on the Community-Tech (CommTech-Sprint-3) board. Jun 25 2021, 7:52 PM

PR #51 merged.

We now show a nice error message (The requested engine 'foo' was not found. Using the default engine 'google' instead.) when passing an invalid engine to:

Test environment: https://ocr-test.wmcloud.org Version 0.6.0-5-g81a7edc.

Great work everyone!

ldelench_wmf moved this task from Product sign-off 🤘 to Done 🏁 on the Community-Tech (CommTech-Sprint-3) board. Jul 2 2021, 4:46 PM

	F34525086: multiple_errors.png
	Jun 24 2021, 9:51 AM
