Some OCR-tests are failing at Appveyor
Closed, Resolved (Public)

Description

https://ci.appveyor.com/project/ladsgroup/pywikibot-g4xqx/build/job/034uyfvu5ftdng04

======================================================================
FAIL: test_do_ocr_googleocr (tests.proofreadpage_tests.TestPageOCR)
Test page._do_ocr(ocr_tool='googleOCR').
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\projects\pywikibot-g4xqx\tests\proofreadpage_tests.py", line 384, in test_do_ocr_googleocr
    self.assertEqual(text, ref_text)
AssertionError: u'ENTERED, according to Act of Congress, in the year 1872,\nBr D. APPLETON & CO. [truncated]... != u'ENTERED, according to Act of Congress, in the year 1572,\nBY D. APPLETON & CO. [truncated]...
- ENTERED, according to Act of Congress, in the year 1872,
?                                                     ^
+ ENTERED, according to Act of Congress, in the year 1572,
?                                                     ^
- Br D. APPLETON & CO.,
?  ^                  -
+ BY D. APPLETON & CO.
?  ^
  In the Office of the Librarian of Congress, at Washington.
- A 354
+ 4 334
======================================================================
FAIL: test_do_ocr_phetools (tests.proofreadpage_tests.TestPageOCR)
Test page._do_ocr(ocr_tool='phetools').
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\projects\pywikibot-g4xqx\tests\proofreadpage_tests.py", line 375, in test_do_ocr_phetools
    self.assertEqual(text, ref_text)
AssertionError: u'EsTEnen, according to Act of Congress, in the year 1872,\nBy D. APPLETON & CO. [truncated]... != u'lam-mam, according to Act of Congress, in the year 157-2,\nBY D. APPLEION Av C [truncated]...
- EsTEnen, according to Act of Congress, in the year 1872,
? ^^^^^^^                                             ^
+ lam-mam, according to Act of Congress, in the year 157-2,
? ^^^^^^^                                             ^ +
- By D. APPLETON & CO.,
?  ^         ^   ^
+ BY D. APPLEION Av CO.,
?  ^         ^   ^^
- In the Office of the Librarian of Congress, at Washington.
?          ^^    ^
+ In the Of\ufb01ce or the Librarian of Congress, at Washington.
?          ^    ^
- + 
======================================================================
FAIL: test_ocr_googleocr (tests.proofreadpage_tests.TestPageOCR)
Test page.ocr(ocr_tool='googleOCR').
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\projects\pywikibot-g4xqx\tests\proofreadpage_tests.py", line 394, in test_ocr_googleocr
    self.assertEqual(text, ref_text)
AssertionError: u'ENTERED, according to Act of Congress, in the year 1872,\nBr D. APPLETON & CO. [truncated]... != u'ENTERED, according to Act of Congress, in the year 1572,\nBY D. APPLETON & CO. [truncated]...
- ENTERED, according to Act of Congress, in the year 1872,
?                                                     ^
+ ENTERED, according to Act of Congress, in the year 1572,
?                                                     ^
- Br D. APPLETON & CO.,
?  ^                  -
+ BY D. APPLETON & CO.
?  ^
  In the Office of the Librarian of Congress, at Washington.
- A 354
+ 4 334

Event Timeline

Xqt created this task. (Jun 12 2019, 8:36 AM)
Restricted Application added a project: Pywikibot. (Jun 12 2019, 8:36 AM)
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.
Xqt triaged this task as High priority. (Jun 12 2019, 8:37 AM)
Dvorapa renamed this task from "Some ORC-tests are failing at Appveyor" to "Some OCR-tests are failing at Appveyor". (Jun 12 2019, 1:28 PM)
Dvorapa added a subscriber: Dvorapa.

These tests also fail on Travis sometimes.

Mpaa added a subscriber: Mpaa. (Edited Jun 18 2019, 7:48 PM)

It looks like the googleOCR response is not deterministic.
One option would be to check that at least x% of the characters match, instead of requiring full equality.
The purpose of the test is to check that the query to googleOCR succeeds, not to test Google's algorithm.
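
A minimal sketch of what such a tolerance check could look like, assuming Python's standard `difflib.SequenceMatcher` as the similarity measure. The helper names `similarity` and `assert_similar` and the 0.9 threshold are hypothetical, chosen only for illustration; this is not necessarily what the eventual patch does.

```python
import difflib


def similarity(text, ref_text):
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return difflib.SequenceMatcher(None, text, ref_text).ratio()


def assert_similar(text, ref_text, threshold=0.9):
    """Raise AssertionError if the strings are less similar than threshold.

    Hypothetical helper: the threshold plays the role of the "x%" above.
    """
    ratio = similarity(text, ref_text)
    if ratio < threshold:
        raise AssertionError(
            'OCR text differs too much from reference: '
            'ratio {:.3f} < {:.3f}'.format(ratio, threshold))
```

In a test, `self.assertEqual(text, ref_text)` could then be replaced by something like `assert_similar(text, ref_text)`, so that small recognition differences such as `1872` vs. `1572` no longer fail the test, while an empty or unrelated response still does.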

+1

Change 517756 had a related patch set uploaded (by Mpaa; owner: Mpaa):
[pywikibot/core@master] proofreadpage_tests.py: fix failing OCR-tests

https://gerrit.wikimedia.org/r/517756

Change 517756 merged by jenkins-bot:
[pywikibot/core@master] proofreadpage_tests.py: fix failing OCR-tests

https://gerrit.wikimedia.org/r/517756

Xqt closed this task as Resolved. (Jun 20 2019, 7:17 AM)
Xqt claimed this task.