Page MenuHomePhabricator

site_tests test_image_usage_in_redirect timeout error after maximum retries
Closed, ResolvedPublic

Description

The recent travis builds are regularly failing on site_tests.testImageusage with a timeout error only when run against en.wp. Travis runs this test for each checking against 2.6, 2.7.8 and 3.3.5 at the moment. It doesnt happen every time, and it hasnt happend on 2.6 yet for some reason.

Note this bug is _not_ bug 60214, which is about content & caching assumptions in the test case, which very affect mostly re-runs of the test suite on a warm cache, and only very rarely occur on a cold cache.

Python 2.7.8 log (https://travis-ci.org/wikimedia/pywikibot-core/jobs/37633107)

ERROR: testImageusage (tests.site_tests.TestSiteObject)
Test the site.imageusage() method
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/wikimedia/pywikibot-core/tests/site_tests.py", line 616, in testImageusage
    for using in mysite.imageusage(imagepage, filterredir=True, total=5):
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 971, in __iter__
    self.data = self.request.submit()
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 764, in submit
    self._data = super(CachedRequest, self).submit()
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 535, in submit
    self.wait()
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 648, in wait
    raise TimeoutError("Maximum retries attempted without success.")
TimeoutError: Maximum retries attempted without success.

Python 3.3.5 has a two part log (https://travis-ci.org/wikimedia/pywikibot-core/jobs/37633111)

Traceback (most recent call last):
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 517, in submit
    headers=headers, body=body)
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/tools.py", line 367, in wrapper
    return method(*__args, **__kw)
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/comms/http.py", line 258, in request
    raise request.data
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/comms/threadedhttp.py", line 219, in request
    max_redirects, connection_type
  File "/home/travis/virtualenv/python3.3.5/lib/python3.3/site-packages/httplib2-0.9_pywikibot1-py3.3.egg/httplib2/__init__.py", line 1292, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/travis/virtualenv/python3.3.5/lib/python3.3/site-packages/httplib2-0.9_pywikibot1-py3.3.egg/httplib2/__init__.py", line 1042, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/travis/virtualenv/python3.3.5/lib/python3.3/site-packages/httplib2-0.9_pywikibot1-py3.3.egg/httplib2/__init__.py", line 995, in _conn_request
    response = conn.getresponse()
  File "/opt/python/3.3.5/lib/python3.3/http/client.py", line 1147, in getresponse
    response.begin()
  File "/opt/python/3.3.5/lib/python3.3/http/client.py", line 358, in begin
    version, status, reason = self._read_status()
  File "/opt/python/3.3.5/lib/python3.3/http/client.py", line 320, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/opt/python/3.3.5/lib/python3.3/socket.py", line 297, in readinto
    return self._sock.recv_into(b)
  File "/opt/python/3.3.5/lib/python3.3/ssl.py", line 493, in recv_into
    return self.read(nbytes, buffer)
  File "/opt/python/3.3.5/lib/python3.3/ssl.py", line 367, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/travis/build/wikimedia/pywikibot-core/tests/site_tests.py", line 616, in testImageusage
    for using in mysite.imageusage(imagepage, filterredir=True, total=5):
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 971, in __iter__
    self.data = self.request.submit()
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 764, in submit
    self._data = super(CachedRequest, self).submit()
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 535, in submit
    self.wait()
  File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 648, in wait
    raise TimeoutError("Maximum retries attempted without success.")
pywikibot.data.api.TimeoutError: Maximum retries attempted without success.

The lowest travis job number i've found with this problem is
https://travis-ci.org/wikimedia/pywikibot-core/jobs/37481646


Version: core-(2.0)
Severity: normal

Details

Reference
bz71971

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 3:48 AM
bzimport added a project: Pywikibot-tests.
bzimport set Reference to bz71971.
bzimport added a subscriber: Unknown Object (????).

This may have been caused by changing the travis retry number to 2 : https://gerrit.wikimedia.org/r/#/c/160000/ , which was merged on 13 Sept 3:30 AM

We could try increasing max_retries to 5 to see if that fixes the problem.

Another solution is to disable this test ;-) Pending patch:
https://gerrit.wikimedia.org/r/160192 from bug 68671

But it is likely this test is doing something odd/extreme and the solution needs to be found.

Another possible cause is the logic to select a page for this test was changed 11 Sept. https://gerrit.wikimedia.org/r/#/c/159633/

Now that 13 Sept looks like the date this problem stated occurring _regularly_, I've found an earlier instance of the problem: 18 Sept 18 2014 https://gerrit.wikimedia.org/r/#/c/161201/
https://travis-ci.org/wikimedia/pywikibot-core/jobs/35616554

Next one I can find is 25 Sept https://gerrit.wikimedia.org/r/#/c/162863/
https://travis-ci.org/wikimedia/pywikibot-core/jobs/36296502

https://travis-ci.org/wikimedia/pywikibot-core/jobs/36903977
https://travis-ci.org/wikimedia/pywikibot-core/jobs/37111303
https://travis-ci.org/wikimedia/pywikibot-core/jobs/37270052

I was able to reproduce this timeout once on my workstation by running the test several times with en.wp set as my default site on a cold cache, sometimes it only times out once, but occasionally it times out three times.

rm -rf tests/apicache/; python -m unittest tests.site_tests.TestSiteObject.testImageusage

I can find pre-13 Sept instances of the same error in this test, but as the retry limit was very high the API code keeps trying to bypass the problem

https://travis-ci.org/wikimedia/pywikibot-core/jobs/35049020#L375


testImageusage on a cold cache runs 12 queries; on a warm cache it only makes one live request, caching the other 11.

(In the process I found a regression https://gerrit.wikimedia.org/r/#/c/166367/)

When run against en.wp as default site, site_tests.testImageusage uses [[en:File:Wiktionary-logo-en.svg]] which is used on [[en:!!]]

The query which takes the longest is:

https://en.wikipedia.org/w/api.php?inprop=protection&giulimit=5&maxlag=5&generator=imageusage&format=json&giufilterredir=redirects&prop=info%7Cimageinfo%7Ccategoryinfo&meta=userinfo&indexpageids=&action=query&giutitle=File%3AWiktionary-logo-en.svg&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata&uiprop=blockinfo%7Chasmsg

Running wget on that URL, repeatedly, doesnt reproduce the problem.

Note that the slow query is finding images on redirects only, which sounds like it should be a slow query. e.g.

https://en.wikipedia.org/w/index.php?title=Ambulance%20chaser&redirect=no

Change 166375 had a related patch set uploaded by John Vandenberg:
Split testImageusage into four tests

https://gerrit.wikimedia.org/r/166375

testImageusage

https://travis-ci.org/wikimedia/pywikibot-core/jobs/37837770
https://travis-ci.org/wikimedia/pywikibot-core/jobs/37871882

Here is it failing _once_ only in py2.6

https://travis-ci.org/wikimedia/pywikibot-core/jobs/37876770

now also seeing the same problem with testUsercontribs

https://travis-ci.org/wikimedia/pywikibot-core/jobs/37871878
https://travis-ci.org/wikimedia/pywikibot-core/jobs/37871882

And possible unrelated testUsers has a new assert

FAIL: testUsers (tests.site_tests.SiteUserTestCase)

Test the site.users() method

Traceback (most recent call last):

File "/home/travis/build/wikimedia/pywikibot-core/tests/site_tests.py", line 1017, in testUsers
  us = list(mysite.users(mysite.user()))
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/site.py", line 3588, in users
  usprop="blockinfo|groups|editcount|registration|emailable"
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 1199, in __init__
  QueryGenerator.__init__(self, list=listaction, **kwargs)
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 819, in __init__
  self.update_limit()  # sets self.prefix
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 914, in update_limit
  for param in self._modules[mod].get("parameters", []):
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/api.py", line 866, in _modules
  assert "paraminfo" in data

AssertionError

https://travis-ci.org/wikimedia/pywikibot-core/jobs/37871882

Please review r/166375 so that subsequent travis builds help narrow down the exact problem.

Could it be HHVM related?

Change 166375 merged by jenkins-bot:
Split testImageusage into four tests

https://gerrit.wikimedia.org/r/166375

As expected it is test_image_usage_in_redirects which is causing the problem

https://travis-ci.org/wikimedia/pywikibot-core/jobs/37887126
https://travis-ci.org/wikimedia/pywikibot-core/jobs/37887726

I think the next step is:

set max-retries to 0 or 1 for all test runs, except keep travis at 2 to avoid red builds

create TestCase.expectedFailureIf , and add @expectedFailureIf(<travis>) to test_image_usage_in_redirect.

This will mean that developer testing will fail if it occurs on their workstation, but it wont cause red builds on Travis since we already know about this problem and are tracking it down.

Change 166532 had a related patch set uploaded by John Vandenberg:
@expectedFailureIf(TRAVIS=true) on imageusage test

https://gerrit.wikimedia.org/r/166532

This one is very strange. First is the error in test_image_usage_in_redirects , then one of the results in test_image_usage_no_redirect_filter is wrong!

https://travis-ci.org/wikimedia/pywikibot-core/jobs/37948029

Change 166532 merged by XZise:
@expectedFailureIf(TRAVIS=true) on imageusage test

https://gerrit.wikimedia.org/r/166532

I got this locally if it can help, on wikipedia:

rm -rf tests/apicache/; python -m unittest tests.site_tests.TestImageUsage.test_image_usage_in_redirects

site_tests.TestImageUsage found [[en:File:Wiktionary-logo-en.svg]] on [[en:!!]]
ERROR: Traceback (most recent call last):

File "pywikibot/data/api.py", line 567, in submit
  headers=headers, body=body)
File "pywikibot/tools.py", line 529, in wrapper
  return obj(*__args, **__kw)
File "pywikibot/comms/http.py", line 258, in request
  raise request.data

SSLError: The read operation timed out

WARNING: Waiting 5 seconds before retrying.

...

Next run was OK:

site_tests.TestImageUsage found [[en:File:Wiktionary-logo-en.svg]] on [[en:!!]]
[[en:Cock sucker]]
[[en:Ambulance chaser]]
[[en:Cyberpet]]
[[en:Colorfast]]

35.620s .

Ran 1 test in 35.621s

OK

Changes to the tests have been merged. Underlying problem hasnt been fixed.

(In reply to John Mark Vandenberg from comment #8)

This one is very strange. First is the error in
test_image_usage_in_redirects , then one of the results in
test_image_usage_no_redirect_filter is wrong!

https://travis-ci.org/wikimedia/pywikibot-core/jobs/37948029

Another error in test_image_usage_no_redirect_filter

https://travis-ci.org/wikimedia/pywikibot-core/jobs/38570786

(It is probably not the same bug as the test_image_usage_in_redirect bug, and a new bug should be created once we have more information about the cause.)

(In reply to John Mark Vandenberg from comment #13)

Changes to the tests have been merged. Underlying problem hasnt been fixed.

I do not think there is an underlying problem in test_image_usage_in_redirects.
The API is already slow by itself when called with those parameters.

One solution would be to change the logic that we currently use for setting the query limits.

I think this can be avoided just raising the number of retrieved tems from 5 to 500 instead of @expectedFailureIf(TRAVIS=true) on imageusage test.

I've forked the test_image_usage_no_redirect_filter problem to bug 73120.

I just received a DBQueryError on this test.

2014-11-29 01:41:35    threadedhttp.py,  219 in            request: DEBUG    ('https://en.wikipedia.org/w/api.php?inprop=protection&giulimit=5&maxlag=5&generator=imageusage&format=json&giufilterredir=redirects&prop=info|imageinfo|categoryinfo&continue=&meta=userinfo&indexpageids=&action=query&giutitle=File%3AWiktionary-logo-en.svg&iiprop=timestamp|user|comment|url|size|sha1|metadata&uiprop=blockinfo|hasmsg', 'GET', None, {'cookie': 'forceHTTPS=true; enwikiUserName=JVbot; enwikiSession=....; enwikiUserID=5197800; GeoIP=ID:Jakarta:...:v4; centralauth_Token=....; centralauth_User=JVbot; forceHTTPS=1', 'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': u'python -m unittest (wikipedia:en; User:JVbot) Pywikibot/2.0b2 (g4649) httplib2/0.9 Python/2.7.5.final.0'}, 5, None)
2014-11-29 01:42:35             api.py,  966 in             submit: DEBUG    API response received from wikipedia:en:
{"servedby":"mw1221","error":{"code":"internal_api_error_DBQueryError","info":"Database query error","*":""}}
2014-11-29 01:42:35             api.py, 1043 in             submit: ERROR    Detected MediaWiki API exception DBQueryError; retrying
2014-11-29 01:42:35             api.py, 1049 in             submit: VERBOSE  MediaWiki exception DBQueryError details:
          query=
{'action': [u'query'],
 'continue': [u''],
 'format': ['json'],
 'generator': [u'imageusage'],
 'giufilterredir': [u'redirects'],
 u'giulimit': [u'5'],
 'giutitle': [u'File:Wiktionary-logo-en.svg'],
 'iiprop': [u'timestamp',
            u'user',
            u'comment',
            u'url',
            u'size',
            u'sha1',
            u'metadata'],
 'indexpageids': [u''],
 'inprop': [u'protection'],
 'maxlag': ['5'],
 'meta': ['userinfo'],
 'prop': [u'info', u'imageinfo', u'categoryinfo'],
 'uiprop': ['blockinfo', 'hasmsg']}
          response=
{u'servedby': u'mw1221', u'error': {'help': u''}}
2014-11-29 01:42:35             api.py, 1081 in               wait: WARNING  Waiting 5 seconds before retrying.
2014-11-29 01:42:40    threadedhttp.py,   92 in     pop_connection: DEBUG    Retrieved connection from 'https:en.wikipedia.org' pool.
2014-11-29 01:42:40    threadedhttp.py,  219 in            request: DEBUG    ('https://en.wikipedia.org/w/api.php?inprop=protection&giulimit=5&maxlag=5&generator=imageusage&format=json&giufilterredir=redirects&prop=info|imageinfo|categoryinfo&continue=&meta=userinfo&indexpageids=&action=query&giutitle=File%3AWiktionary-logo-en.svg&iiprop=timestamp|user|comment|url|size|sha1|metadata&uiprop=blockinfo|hasmsg', 'GET', None, {'cookie': 'forceHTTPS=true; enwikiUserName=JVbot; enwikiSession=...; enwikiUserID=5197800; GeoIP=ID:Jakarta:...:v4; centralauth_Token=...; centralauth_User=JVbot; forceHTTPS=1', 'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': u'python -m unittest (wikipedia:en; User:JVbot) Pywikibot/2.0b2 (g4649) httplib2/0.9 Python/2.7.5.final.0'}, 5, None)
2014-11-29 01:44:40             api.py,  959 in             submit: ERROR    Traceback (most recent call last):
  File "pywikibot/data/api.py", line 937, in submit
    headers=headers, body=body)
  File "pywikibot/tools.py", line 679, in wrapper
    return obj(*__args, **__kw)
  File "pywikibot/comms/http.py", line 261, in request
    raise request.data
SSLError: The read operation timed out
In T73971#755541, @Mpaa wrote:

(In reply to John Mark Vandenberg from comment #13)

Changes to the tests have been merged. Underlying problem hasnt been fixed.

I do not think there is an underlying problem in test_image_usage_in_redirects.
The API is already slow by itself when called with those parameters.

Yes, I had hoped that mediawiki devs might help diagnose this problem.

One solution would be to change the logic that we currently use for setting the query limits.
I think this can be avoided just raising the number of retrieved tems from 5 to 500 instead of @expectedFailureIf(TRAVIS=true) on imageusage test.

This sounds like a good approach. For queries we know are problematic (from experience), QueryGenerator can adjust the limits used in the API query (maybe try unlimited) , and QueryGenerator enforces the limit internally.

jayvdb set Security to None.
jayvdb removed a subscriber: Unknown Object (????).
Xqt subscribed.

Haven’t seen this issue for a long time