Wikimedia OCR: Validate the OCR engine
Closed, Resolved | Public | 3 Estimated Story Points | BUG REPORT

Description

What is the problem?

If you enter an invalid OCR engine name, it returns a 500 error.

We should probably return a nice validation message, saying which engines we do support.
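
A minimal sketch of that kind of validation, written in Python purely for illustration (the tool itself is a Symfony application). The function and exception names are hypothetical, and `tesseract` is an assumed second engine; only `google` is named in this task.

```python
# Hypothetical list of engines; only "google" is named in this task.
SUPPORTED_ENGINES = ("google", "tesseract")


class InvalidEngineError(ValueError):
    """Raised when the requested OCR engine is not supported."""


def validate_engine(name: str) -> str:
    """Return the normalized engine name, or raise with a helpful message."""
    normalized = name.strip().lower()
    if normalized not in SUPPORTED_ENGINES:
        supported = ", ".join(SUPPORTED_ENGINES)
        # Name the supported engines so the caller can correct the request.
        raise InvalidEngineError(
            f"Unknown OCR engine '{name}'. Supported engines: {supported}."
        )
    return normalized
```

A request handler could catch `InvalidEngineError` and answer with a client error carrying this message, rather than letting the failure surface as a 500.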

URLs to reproduce problem
Environment

Wikimedia OCR: Version 0.2.0

Event Timeline

MusikAnimal changed the point value for this task from 2 to 3. May 7 2021, 6:55 PM

I guess T284728 can be merged into this task, right?

Yes! You found that by manually entering an invalid engine, right? In other words, you didn't discover this through a normal user workflow?

I actually got an email alert with this error, so I don't know how the user got there.

I'm going to assume that was us, because there's no user-facing way to give an invalid engine without manipulating the URL.

I went in circles trying to show a message about an invalid engine, but the way the system works now makes this very difficult: the view depends on there being a valid engine, and we can't show flash messages in the context where we're setting the engine. Additionally, if we throw an OcrException, it halts execution, and other parameters such as the image URLs (which also depend on an engine being set) don't get set.

Basically, while we definitely can show a message to the user, it requires a lot of hacky code that I felt just wasn't worth it. So for now, we silently fall back to the default engine (which I have set as google).
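
The fallback can be sketched roughly as follows. This is Python for illustration only, not the code in the pull request; `resolve_engine`, `build_view_params`, and the `tesseract` entry are assumptions. The point is that an unknown engine never raises, so values that depend on the engine still get set.

```python
from typing import Optional

DEFAULT_ENGINE = "google"
# Hypothetical list; only "google" is named in this task.
SUPPORTED_ENGINES = ("google", "tesseract")


def resolve_engine(requested: Optional[str]) -> str:
    """Return the requested engine if it is supported, else the default."""
    if requested in SUPPORTED_ENGINES:
        return requested
    # Silent fallback: no exception, so the caller keeps running.
    return DEFAULT_ENGINE


def build_view_params(query: dict) -> dict:
    """Collect the values the view needs; all of them rely on a valid engine."""
    engine = resolve_engine(query.get("engine"))
    return {"engine": engine, "image": query.get("image", "")}
```

With this shape, a query such as `{"engine": "foo", "image": "ENC_9-0379.jpg"}` still yields a usable set of view parameters, with `google` as the engine.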

Pull request: https://github.com/wikimedia/wikimedia-ocr/pull/30

@MusikAnimal Weird bug I cannot explain:

  1. Go to https://ocr-test.wmcloud.org/api.php?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F2%2F20%2FENC_9-0379.jpg&engine=foo
  2. Then go to https://ocr-test.wmcloud.org/?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F2%2F20%2FENC_9-0379.jpg&engine=google

The second request shows the error "The requested engine 'foo' was not found. Using the default engine 'google' instead." (but the OCR is successful).

In fact, if you submit the first link multiple times you get multiple instances of the error.

multiple_errors.png (331×1 px, 27 KB)

It also happens if you first go to https://ocr-test.wmcloud.org/api/available_langs?engine=foo.

It only happens if you use the same browser for both requests, so I guess something is cached on the browser-side.

Nice find! That's the intended behaviour of flash messages in Symfony. Flash messages are stored as part of your session, and aren't removed until they are shown. So the solution here is to surface them in the API response (something we should be doing anyway). I'll submit a new PR for this.
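
The behaviour described above can be reproduced with a small model. This is an illustrative Python sketch, not Symfony's API: `FlashBag` here is a simplified stand-in, a plain dict plays the session, and the handler names are hypothetical. Messages queued by an API request stay in the session until some response reads them, which is why they pile up and then appear on the next page view.

```python
DEFAULT_ENGINE = "google"
SUPPORTED_ENGINES = ("google",)  # the only engine named in this task


class FlashBag:
    """Simplified stand-in for a session-backed flash message store."""

    def __init__(self, session: dict) -> None:
        # Messages live in the session, so they survive between requests.
        self._messages = session.setdefault("_flashes", [])

    def add(self, message: str) -> None:
        self._messages.append(message)

    def pop_all(self) -> list:
        """Return every pending message and clear the store."""
        pending = list(self._messages)
        self._messages.clear()
        return pending


def resolve_engine(session: dict, requested: str) -> str:
    """Fall back to the default engine and queue a flash message if needed."""
    if requested in SUPPORTED_ENGINES:
        return requested
    FlashBag(session).add(
        f"The requested engine '{requested}' was not found. "
        f"Using the default engine '{DEFAULT_ENGINE}' instead."
    )
    return DEFAULT_ENGINE


def handle_api_request(session: dict, engine: str, surface_flashes: bool) -> dict:
    """API endpoint; optionally drains the flash messages into the response."""
    response = {"engine": resolve_engine(session, engine)}
    if surface_flashes:
        response["warnings"] = FlashBag(session).pop_all()
    return response


def handle_page_request(session: dict, engine: str) -> dict:
    """HTML page; always renders (and so clears) pending flash messages."""
    resolved = resolve_engine(session, engine)
    return {"engine": resolved, "warnings": FlashBag(session).pop_all()}
```

Calling `handle_api_request` twice with `engine="foo"` and `surface_flashes=False`, then `handle_page_request` with the same session, returns two warnings on the page. With `surface_flashes=True`, each API response carries its own warning and the later page view shows none.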