Maniphest T212076

proofreadpage_tests.TestPageOCR.test_ocr_googleocr sometimes fails with ValueError
Closed, Resolved · Public
Assigned To

Authored By

	Xqt
	Dec 16 2018, 10:19 AM

Description

======================================================================
ERROR: test_ocr_googleocr (tests.proofreadpage_tests.TestPageOCR)
Test page.ocr(ocr_tool='googleOCR').
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\projects\pywikibot-g4xqx\tests\proofreadpage_tests.py", line 393, in test_ocr_googleocr
    text = self.page.ocr(ocr_tool='googleOCR')
  File "c:\projects\pywikibot-g4xqx\pywikibot\proofreadpage.py", line 725, in ocr
    raise ValueError('%s: not possible to perform OCR.' % self)
ValueError: [[wikisource:en:Page:Popular Science Monthly Volume 1.djvu/10]]: not possible to perform OCR.

======================================================================
FAIL: test_do_ocr_googleocr (tests.proofreadpage_tests.TestPageOCR)
Test page._do_ocr(ocr_tool='googleOCR').
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\projects\pywikibot-g4xqx\tests\proofreadpage_tests.py", line 388, in test_do_ocr_googleocr
    self.assertEqual(error, ref_error)
AssertionError: True != False

Details

	Subject	Repo	Branch	Lines +/-
	proofreadpage.py: handle http response code in OCR methods	pywikibot/core	master	+16 -24
	proofreadpage_tests.py: add error text to Exception	pywikibot/core	master	+2 -1

Event Timeline

Xqt created this task. Dec 16 2018, 10:19 AM

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Dec 16 2018, 10:19 AM

Xqt triaged this task as High priority. Dec 16 2018, 10:19 AM

Framawiki subscribed. Dec 16 2018, 1:54 PM

I think it was a temporary unavailability of the googleOCR service.

In T212076#4826607, @Mpaa wrote:

I think it was a temporary unavailability of the googleOCR service.

Can we check this and give an appropriate message?

In T212076#4826944, @Xqt wrote:

In T212076#4826607, @Mpaa wrote:

I think it was a temporary unavailability of the googleOCR service.

Can we check this and give an appropriate message?

I agree. BTW this would be great for every external tool we use (e.g. from tools.wmflabs.org, there was a problem recently with some tool from that source).
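
To illustrate the kind of "appropriate message" asked for above, here is a minimal sketch using only the standard library. It is not the code that was merged into pywikibot; the names ExternalToolUnavailable and fetch_from_tool, the opener parameter, and the 30 second timeout are assumptions made for this example.

```python
import urllib.error
import urllib.request


class ExternalToolUnavailable(Exception):
    """Hypothetical error raised when an external tool cannot be used."""


def fetch_from_tool(url, opener=urllib.request.urlopen, timeout=30):
    """Return the response body, or raise an error that names the cause.

    ``opener`` is injectable so the function can be exercised without
    network access (an assumption of this sketch).
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it must be caught first.
        raise ExternalToolUnavailable(
            '{}: service answered with HTTP status {}'.format(url, err.code)
        ) from err
    except urllib.error.URLError as err:
        raise ExternalToolUnavailable(
            '{}: service not reachable ({})'.format(url, err.reason)
        ) from err
```

A caller (or a test) could catch ExternalToolUnavailable and report the service outage explicitly instead of surfacing a generic "not possible to perform OCR" ValueError.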

In this sample googleOCR was available, but the result differs from the reference text.

I copy it here for convenience.
Interestingly, it looks like the googleOCR answer is not deterministic, or some bytes are lost somewhere.

FAIL: test_ocr_googleocr (tests.proofreadpage_tests.TestPageOCR)
Test page.ocr(ocr_tool='googleOCR').
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/wikimedia/pywikibot/tests/proofreadpage_tests.py", line 395, in test_ocr_googleocr
    self.assertEqual(text, ref_text)
AssertionError: u'ENTERED, according to Act of Congress, in the year 1572,\nB D. APPLETON &CO\nI [truncated]... != u'ENTERED, according to Act of Congress, in the year 1572,\nBY D. APPLETON & CO. [truncated]...
  ENTERED, according to Act of Congress, in the year 1572,
- B D. APPLETON &CO
+ BY D. APPLETON & CO.
?  +              +  +
  In the Office of the Librarian of Congress, at Washington.
  4 334
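
If the OCR answer really varies between runs, one possible way to make such a test less brittle is to compare similarity instead of exact equality. The sketch below is only an illustration of that idea, not what the pywikibot tests do; the helper name similar_enough and the 0.9 threshold are assumptions.

```python
import difflib


def similar_enough(text, ref_text, threshold=0.9):
    """Return True if the two strings are at least ``threshold`` similar.

    SequenceMatcher.ratio() returns a float between 0.0 and 1.0,
    where 1.0 means the sequences are identical.
    """
    ratio = difflib.SequenceMatcher(None, text, ref_text).ratio()
    return ratio >= threshold


# The two differing lines taken from the failure above.
got = 'B D. APPLETON &CO'
ref = 'BY D. APPLETON & CO.'
print(similar_enough(got, ref))
```

In a unittest this could be used as self.assertTrue(similar_enough(text, ref_text)). The trade-off is that a real regression which changes only a few characters would no longer be detected.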

Change 480810 had a related patch set uploaded (by Mpaa; owner: Mpaa):
[pywikibot/core@master] proofreadpage_tests.py: add error text to Exception

https://gerrit.wikimedia.org/r/480810

gerritbot added a project: Patch-For-Review. Dec 19 2018, 6:40 PM

Change 480810 merged by jenkins-bot:
[pywikibot/core@master] proofreadpage_tests.py: add error text to Exception

https://gerrit.wikimedia.org/r/480810

xSavitar moved this task from Backlog to Needs Review on the Pywikibot board. Dec 22 2018, 6:16 PM

Xqt removed a project: Patch-For-Review. Feb 3 2019, 11:45 AM

Xqt moved this task from Needs Review to Backlog on the Pywikibot board.

Xqt lowered the priority of this task from High to Medium. Feb 7 2019, 9:39 AM

Now we have a json.decoder.JSONDecodeError:

https://ci.appveyor.com/project/ladsgroup/pywikibot-g4xqx/build/job/dhuv540dipdiw9si

Change 491008 had a related patch set uploaded (by Mpaa; owner: Mpaa):
[pywikibot/core@master] proofreadpage.py: handle http response code in OCR methods

https://gerrit.wikimedia.org/r/491008

gerritbot added a project: Patch-For-Review. Feb 16 2019, 6:57 PM

In T212076#4959217, @Xqt wrote:

Now we have a json.decoder.JSONDecodeError:

https://ci.appveyor.com/project/ladsgroup/pywikibot-g4xqx/build/job/dhuv540dipdiw9si

This time it was:
[00:06:55] Test page._do_ocr(ocr_tool='googleOCR'). ... WARNING: Http response status 404

Change 491008 merged by jenkins-bot:
[pywikibot/core@master] proofreadpage.py: handle http response code in OCR methods

https://gerrit.wikimedia.org/r/491008
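
The JSONDecodeError reported earlier is consistent with trying to decode the body of a 404 response as JSON. As a rough sketch of the general idea behind "handle http response code" (the merged change itself is not reproduced here), the status can be checked before decoding. The function name parse_ocr_response and the 'text' and 'error' field names are assumptions for illustration; the (error, text) return shape only mirrors what test_do_ocr_googleocr appears to compare.

```python
import json


def parse_ocr_response(status, body):
    """Return an (error, text) tuple for an OCR service response.

    ``error`` is True when no usable OCR text could be extracted;
    in that case ``text`` holds a short explanation instead.
    """
    # Do not try to decode the body unless the request succeeded.
    if status != 200:
        return True, 'Http response status {}'.format(status)

    try:
        data = json.loads(body)
    except json.JSONDecodeError as err:
        return True, 'invalid JSON in response: {}'.format(err)

    # Hypothetical payload layout: {"text": "..."} or {"error": "..."}.
    if not isinstance(data, dict):
        return True, 'unexpected response payload'
    if 'error' in data:
        return True, str(data['error'])
    return False, data.get('text', '')
```

With this shape, a 404 yields (True, 'Http response status 404') rather than an uncaught decoding exception, so the caller can decide whether to warn, retry, or raise.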

I am closing this as resolved because the failure no longer occurs. It can be reopened if it happens again.
