
Wikimedia OCR: "Call to a member function getText() on null" when image has no text
Closed, Resolved | Public | 1 Estimated Story Points | BUG REPORT


What is the problem?

On a number of images I am seeing the exception:

Call to a member function getText() on null

(pointing to src/Engine/GoogleCloudVisionEngine.php line 49)

I think it is because the images do not have text.
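The suspected failure mode can be sketched as follows. This is an illustrative Python sketch, not the project's PHP and not the code at GoogleCloudVisionEngine.php line 49. It assumes the OCR client returns no annotation object (None here, null in PHP) when an image has no recognizable text; `Annotation`, `FakeResponse`, `extract_text_unguarded`, and `extract_text` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical stand-in for an OCR annotation object."""
    text: str

    def get_text(self) -> str:
        return self.text


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an OCR response.

    Assumption: annotation is None when no text was found in the image.
    """
    annotation: Optional[Annotation]


def extract_text_unguarded(response: FakeResponse) -> str:
    # Mirrors the suspected bug: calls a method on a missing annotation.
    return response.annotation.get_text()


def extract_text(response: FakeResponse) -> str:
    # Guarded version: a missing annotation means "no text", not an error.
    if response.annotation is None:
        return ""
    return response.annotation.get_text()
```

With a guard like this, an image without text would produce an empty string instead of an exception.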

Example images to reproduce problem

Wikimedia OCR: Local docker commit 83bf34ce24d30464fad5cd2f88b5173f2d0a23a5

Event Timeline

Restricted Application added a subscriber: Aklapper. Mon, Apr 26, 1:37 PM

Thanks for reporting this issue, @dom_walden! I also experience errors when I try to OCR those image files (see screenshot example below):

Ideally, if someone provides an image with no text, they should see a helpful error message rather than a 500 error page. However, I don't know how common it is for people to try to OCR images that have no text; it may be a rare edge case. If so, it may not be a high priority for us to fix now. Anyway, I'll leave this up to @NRodriguez to prioritize, but just wanted to share some passing thoughts!

Getting an empty string back for an image that contains no recognizable text is not an error; that is just the correct output. There are any number of reasons people might request OCR of an image that returns no text: there is text but the OCR engine fails to recognize it; they are working through page images in a sequence and hitting the OCR button by habit (or their gadget does more than just load the OCR); they have a gadget that automatically requests the OCR on page load; etc. And in a bulk OCR scenario it will be entirely normal for the sequence of images being processed to contain anything from a few to several tens of blank pages.
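To illustrate the bulk scenario, here is a minimal Python sketch in which blank pages yield empty strings and the rest of the sequence is still processed. It is not taken from the Wikimedia OCR code base; `ocr_pages` and `fake_recognize` are hypothetical, and the recognizer is assumed to return None when it finds no text.

```python
from typing import Callable, Iterable, List, Optional


def ocr_pages(
    pages: Iterable[bytes],
    recognize: Callable[[bytes], Optional[str]],
) -> List[str]:
    """Run a recognizer over each page, mapping 'no text' (None) to ''."""
    results = []
    for page in pages:
        text = recognize(page)
        # A blank page is a normal result, so record "" and keep going.
        results.append(text if text is not None else "")
    return results


def fake_recognize(page: bytes) -> Optional[str]:
    # Stand-in recognizer: treats empty bytes as a blank page with no text.
    return page.decode("utf-8") if page else None
```

Called on a sequence with a blank page in the middle, `ocr_pages` returns one entry per page, with an empty string in the blank page's position.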

I cannot reproduce the bug in the description anymore, via the web page or API. Now, we just return an empty text box.

I also checked that OCR still works for images with text.

Test environment: Version 0.1.0-5-gf2af8be