Page MenuHomePhabricator

testQueryApiGetter timeout failure
Closed, DeclinedPublic

Description

Maybe we should catch timeouts on wqd and not report them as an error? That might hide real problems...?

https://travis-ci.org/wikimedia/pywikibot-core/jobs/51129125

======================================================================

ERROR: testQueryApiGetter (tests.wikidataquery_tests.TestApiSlowFunctions)
Test that we can actually retreive data and that caching works.
----------------------------------------------------------------------

Traceback (most recent call last):
File "/home/travis/build/wikimedia/pywikibot-core/tests/wikidataquery_tests.py", line 251, in testQueryApiGetter
  data = w.query(q)
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/wikidataquery.py", line 590, in query
  data = self.getDataFromHost(fullQueryString)
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/data/wikidataquery.py", line 561, in getDataFromHost
  resp = http.request(None, url)
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/tools.py", line 869, in wrapper
  return obj(*__args, **__kw)
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/comms/http.py", line 249, in request
  r = fetch(uri, *args, **kwargs)
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/comms/http.py", line 362, in fetch
  error_handling_callback(request)
File "/home/travis/build/wikimedia/pywikibot-core/pywikibot/comms/http.py", line 282, in error_handling_callback
  raise Server504Error("Server %s timed out" % request.hostname)
Server504Error: Server wdq.wmflabs.org timed out

Event Timeline

jayvdb raised the priority of this task from to Needs Triage.
jayvdb updated the task description. (Show Details)
jayvdb added a project: Pywikibot-tests.
jayvdb added a subscriber: jayvdb.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). · View Herald TranscriptFeb 17 2015, 10:45 PM
jayvdb changed the task status from Open to Stalled.Jun 9 2015, 5:31 AM

I saw a batch of these today on a development branch, and one of the included a real error message thanks to T93305:

ERROR: testQueryApiGetter (tests.wikidataquery_tests.TestApiSlowFunctions)
Test that we can actually retreive data and that caching works.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/jayvdb/pywikibot-core/tests/wikidataquery_tests.py", line 252, in testQueryApiGetter
    data = w.query(q)
  File "/home/travis/build/jayvdb/pywikibot-core/pywikibot/data/wikidataquery.py", line 601, in query
    data = self.getDataFromHost(fullQueryString)
  File "/home/travis/build/jayvdb/pywikibot-core/pywikibot/data/wikidataquery.py", line 581, in getDataFromHost
    % (url, data))
ServerError: Data received for https://wdq.wmflabs.org/api?q=claim%5B105%5D%20AND%20noclaim%5B225%5D%20AND%20claim%5B100%5D but no JSON could be decoded: u'<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body bgcolor="white">\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.4.6 (Ubuntu)</center>\r\n</body>\r\n</html>\r\n'

So, why isnt http.fetch(url) raising ServerError much earlier ? http.error_handling_callback should probably raise ServerError for any 500 series code, and 502 could be treated similarly to Server504Error.

Maybe the server is not actually returning 502 in the status code? Or if we are unlucky this is actually related to Requests not properly propagating the status code.

Okay looking more closely at error_handling_callback it won't fail when the status code is not 200 or 207 but just emit a warning. That means our wikidataquery module is getting the response which is actually an error message but tries to parse it as JSON. Now we could check what the server is returning. If there is a warning which should look like “Http response status 502” then it returns the valid code and we need to add a logic which doesn't decode it when the status code is not 200. Otherwise error_handling_callback is not called or the server is not returning the correct status code.

The original error is also different in that it specifically raises an exception when the code is 504 while the second error is a 502 which has no such exception in error_handling_callback.

Okay it now happened on Travis and the site properly returns a 502 status code as seen in line 1438:

testQueryApiGetter (tests.wikidataquery_tests.TestApiSlowFunctions)

Test that we can actually retreive data and that caching works. ... WARNING: Http response status 502

WARNING: Data received for https://wdq.wmflabs.org/api?q=claim%5B105%5D%20AND%20noclaim%5B225%5D%20AND%20claim%5B100%5D but no JSON could be decoded: '<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body bgcolor="white">\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.4.6 (Ubuntu)</center>\r\n</body>\r\n</html>\r\n'

ERROR

So this is just an issue with our wikidataquery implementation that it is not checking whether the code is actually 200 but is parsing everything (you could argue that the server should return a json string in an error).

The failure was probably related to T102925.

Almost every Travis build now includes a failure of this service.
We should attempt to verify that how long this server operation usually takes, and increase the timeouts during this test accordingly.

Multichill added a subscriber: Multichill.

Functionality was removed in T162585 . Declining this one.