proofreadpage_tests.TestPageOCR fails in some test builds
Closed, Resolved · Public

Description

For certain builds proofreadpage_tests.TestPageOCR fails.

See e.g. https://travis-ci.org/wikimedia/pywikibot/builds/459230661

On my PC (py37) the tests pass.

Event Timeline

Mpaa created this task. Nov 24 2018, 9:58 PM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Nov 24 2018, 9:58 PM
Mpaa added a comment. Nov 24 2018, 10:36 PM

It fails on:

"env": "LANGUAGE=en FAMILY=wikipedia PYWIKIBOT_TEST_PROD_ONLY=1",

and passes on:

"env": "LANGUAGE=zh FAMILY=wikisource PYSETUP_TEST_EXTRAS=1 PYWIKIBOT_TEST_PROD_ONLY=1 PYWIKIBOT_TEST_NO_RC=1",

so I guess it is related to the family.
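
To see everything that differs between the two jobs, not only the family, the two `env` strings quoted above can be compared directly. This is a small illustrative sketch; `parse_env` is a hypothetical helper, not part of pywikibot or Travis.

```python
def parse_env(env):
    """Split a 'KEY=value KEY=value' string into a dict."""
    return dict(item.split('=', 1) for item in env.split())


failing = parse_env('LANGUAGE=en FAMILY=wikipedia PYWIKIBOT_TEST_PROD_ONLY=1')
passing = parse_env('LANGUAGE=zh FAMILY=wikisource PYSETUP_TEST_EXTRAS=1 '
                    'PYWIKIBOT_TEST_PROD_ONLY=1 PYWIKIBOT_TEST_NO_RC=1')

# Keep keys whose value differs or that exist in only one of the two jobs.
differences = {
    key: (failing.get(key), passing.get(key))
    for key in sorted(set(failing) | set(passing))
    if failing.get(key) != passing.get(key)
}

for key, (fail_value, pass_value) in differences.items():
    print('{}: failing={!r} passing={!r}'.format(key, fail_value, pass_value))
```

The family is only one of several differences, so each differing variable is a candidate cause.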

Xqt triaged this task as High priority. Nov 25 2018, 4:28 AM
Xqt added a subscriber: Xqt. Nov 25 2018, 9:59 AM

I can reproduce this failure on my repo:

======================================================================
ERROR: test_do_hocr (__main__.TestPageOCR)
Test page._do_hocr().
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".\tests\proofreadpage_tests.py", line 361, in test_do_hocr
    error, text = self.page._do_hocr()
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 649, in _do_hocr
    ocr_tool=self._PHETOOLS)
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 618, in _ocr_callback
    return (error, parser_func(_text))
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 629, in parse_hocr_text
    soup = Soup(txt)
NameError: global name 'Soup' is not defined

======================================================================
ERROR: test_do_ocr_googleocr (__main__.TestPageOCR)
Test page._do_ocr(ocr_tool='googleOCR').
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".\tests\proofreadpage_tests.py", line 385, in test_do_ocr_googleocr
    error, text = self.page._do_ocr(ocr_tool='googleOCR')
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 654, in _do_ocr
    url_image = self.url_image
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 544, in url_image
    soup = Soup(response.text)
NameError: global name 'Soup' is not defined

======================================================================
ERROR: test_do_ocr_phetools (__main__.TestPageOCR)
Test page._do_ocr(ocr_tool='phetools').
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".\tests\proofreadpage_tests.py", line 378, in test_do_ocr_phetools
    error, text = self.page._do_ocr(ocr_tool='phetools')
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 654, in _do_ocr
    url_image = self.url_image
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 544, in url_image
    soup = Soup(response.text)
NameError: global name 'Soup' is not defined

======================================================================
ERROR: test_ocr_googleocr (__main__.TestPageOCR)
Test page.ocr(ocr_tool='googleOCR').
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".\tests\proofreadpage_tests.py", line 392, in test_ocr_googleocr
    text = self.page.ocr(ocr_tool='googleOCR')
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 709, in ocr
    error, text = self._do_ocr(ocr_tool=ocr_tool)
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 654, in _do_ocr
    url_image = self.url_image
  File "C:\pwb\GIT\core\pywikibot\proofreadpage.py", line 544, in url_image
    soup = Soup(response.text)
NameError: global name 'Soup' is not defined

----------------------------------------------------------------------
Ran 55 tests in 13.698s

FAILED (errors=4, skipped=28)
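
One way a `NameError` like the ones above can arise is an optional import whose failure is swallowed at module load time, so the name is only missed when it is first used. The sketch below illustrates that general mechanism; it is an assumption about the failure mode, not a copy of the code in proofreadpage.py, and the module name `hypothetical_missing_parser` is invented so that the import is guaranteed to fail.

```python
try:
    # Stand-in for an optional dependency that is not installed.
    from hypothetical_missing_parser import Parser as Soup
except ImportError:
    pass  # The failure is silent, so the name Soup is never bound.


def url_image(html):
    """Hypothetical stand-in for a method that needs the parser."""
    return Soup(html)


try:
    url_image('<html></html>')
except NameError as error:
    # The problem surfaces only here, far from the failed import.
    message = str(error)
    print(message)
```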

Xqt claimed this task. Nov 25 2018, 10:06 AM

Change 475612 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Require bs4 library for TestPageOCR

https://gerrit.wikimedia.org/r/475612

Change 475613 had a related patch set uploaded (by Mpaa; owner: Mpaa):
[pywikibot/core@master] proofreadpage.py: OCR needs BeautifulSoup

https://gerrit.wikimedia.org/r/475613

Mpaa added a comment. Nov 25 2018, 10:20 AM

OK, we worked on this at the same time.
I added more checks in proofreadpage.py.
PYSETUP_TEST_EXTRAS=1 installs bs4, which explains why the builds with that setting pass.
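
The two ideas in this thread (skip the tests when bs4 is missing, and fail with a clear message in the library code) could look roughly like the sketch below. It is written under assumptions and does not reproduce the merged patches: `HAS_BS4`, `require_bs4`, and `ExampleOCRTest` are hypothetical names used for illustration only.

```python
import importlib.util
import unittest

# True only when the bs4 package can be found in this environment.
HAS_BS4 = importlib.util.find_spec('bs4') is not None


def require_bs4():
    """Raise a descriptive ImportError instead of a later NameError."""
    if not HAS_BS4:
        raise ImportError('bs4 (BeautifulSoup) is required for OCR support')


@unittest.skipUnless(HAS_BS4, 'bs4 is not installed')
class ExampleOCRTest(unittest.TestCase):
    """Hypothetical stand-in for a test class that needs bs4."""

    def test_parse(self):
        from bs4 import BeautifulSoup
        soup = BeautifulSoup('<p>text</p>', 'html.parser')
        self.assertEqual(soup.p.text, 'text')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleOCRTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With this shape, an environment without bs4 reports the test as skipped rather than as an error, and library code that calls `require_bs4()` first names the missing dependency.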

Change 475612 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Require bs4 library for TestPageOCR

https://gerrit.wikimedia.org/r/475612

Xqt closed this task as Resolved. Nov 25 2018, 2:08 PM

Change 475613 merged by jenkins-bot:
[pywikibot/core@master] proofreadpage.py: OCR needs BeautifulSoup

https://gerrit.wikimedia.org/r/475613