
exceptions in threadedhttp.Http._follow_redirect cause a lock up
Closed, Resolved (Public)

Description

>>> http.fetch(uri='http://getstatuscode.com/301')
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
    self.run()
  File "pywikibot/comms/threadedhttp.py", line 486, in run
    item.data = self.http.request(*item.args, **item.kwargs)
  File "pywikibot/comms/threadedhttp.py", line 254, in request
    uri, method, body, headers, response, content, max_redirects)
  File "pywikibot/comms/threadedhttp.py", line 273, in _follow_redirect
    response, content)
RedirectMissingLocation: Redirected but the response is missing a Location: header.

And the main thread doesn't return.

This exception is raised inside threadedhttp's worker thread, whereas threadedhttp normally catches exceptions and returns the exception instance so it can be passed back to the main thread.
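
The failure mode in a minimal sketch (not the actual threadedhttp code; the Request class, do_http helper and queue wiring are made up for illustration): the worker thread dies before signalling completion, so the caller blocks forever.

import threading

try:
    import Queue as queue  # Python 2, as in the traceback above
except ImportError:
    import queue  # Python 3


class Request(object):
    """Work item: the caller blocks on `done` until the worker fills `data`."""

    def __init__(self, uri):
        self.uri = uri
        self.data = None
        self.done = threading.Event()


def do_http(uri):
    # Stand-in for the real fetch; raises just like _follow_redirect does.
    raise RuntimeError('Redirected but the response is missing a '
                       'Location: header.')


def worker(q):
    while True:
        item = q.get()
        # The exception kills the thread before done.set() runs, so the
        # caller waiting on item.done is never woken up.
        item.data = do_http(item.uri)
        item.done.set()


q = queue.Queue()
t = threading.Thread(target=worker, args=(q,))
t.daemon = True
t.start()

req = Request('http://getstatuscode.com/301')
q.put(req)
req.done.wait()  # blocks forever: the worker died before calling set()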

Event Timeline

jayvdb claimed this task.
jayvdb raised the priority of this task from to Needs Triage.
jayvdb updated the task description.
jayvdb added a project: Pywikibot.
jayvdb changed Security from none to None.
jayvdb added subscribers: jayvdb, valhallasw.

Presumably the same happens with 'raise httplib2.RedirectLimit' in the same method.

jayvdb renamed this task from "HTTP redirect without Location locks up the http layer" to "exceptions in threadedhttp.Http._follow_redirect cause a lock up". Dec 10 2014, 6:18 PM
jayvdb triaged this task as High priority.

I think that instead of the current

try:
    (response, content) = httplib2.Http.request(
        self, uri, method, body, headers,
        max_redirects, connection_type
    )
except Exception as e:  # what types?
    # return exception instance to be retrieved by the calling thread
    return e

wrapper, we should just wrap the whole of def request in a try..except block that returns e on error. Does that sound like the right solution to you?
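
As a sketch of that shape (the class header, the signature defaults and the explicit 3xx dispatch are assumptions; _follow_redirect is the existing method, and its call matches the traceback in the description):

import httplib2


class Http(httplib2.Http):

    def request(self, uri, method='GET', body=None, headers=None,
                max_redirects=5, connection_type=None):
        try:
            response, content = httplib2.Http.request(
                self, uri, method, body, headers,
                max_redirects, connection_type)
            if response.status in (301, 302, 303, 307):  # assumed dispatch
                return self._follow_redirect(
                    uri, method, body, headers, response, content,
                    max_redirects)
            return response, content
        except Exception as e:
            # Nothing raised anywhere inside request(), including
            # _follow_redirect(), can now escape into the worker thread.
            return e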

I was thinking it should be in HttpProcessor.run, which has a try..finally but no except clause. Depending on how bad the exception is (and it is best to assume the worst, i.e. that some state has been irreparably altered: https://gerrit.wikimedia.org/r/#/c/178789/ ), it would be sensible to replace the dead HTTP worker with a new one.
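
Roughly this shape, as a sketch (the item.lock handshake is an assumption about how callers wait on results, and the respawn callback is hypothetical; today run() only has the try..finally, and the request call matches the traceback):

import threading


class HttpProcessor(threading.Thread):
    """Sketch of a worker pulling requests off a shared queue."""

    def __init__(self, queue, http, respawn):
        super(HttpProcessor, self).__init__()
        self.queue = queue
        self.http = http
        self.respawn = respawn  # hypothetical: starts a replacement worker

    def run(self):
        try:
            while True:
                item = self.queue.get()
                if item is None:  # shutdown sentinel
                    return
                try:
                    # Expected errors (e.g. a 503) are returned by
                    # request() as data, not raised, so they stay on
                    # the normal path.
                    item.data = self.http.request(*item.args, **item.kwargs)
                finally:
                    item.lock.release()  # always wake the waiting caller
        except Exception:
            # Assume the worst: this worker's state may be irreparably
            # altered, so hand its slot over to a fresh thread.
            self.respawn()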

Yes, that sounds reasonable. This does have some implications for when we can and cannot re-use a connection, though -- we should prevent spinning up a new connection for typical issues (e.g. 503s).

Ya, we have a try..except inside .request, and it returns the exception to be processed by the client; that is where 503s normally happen, and I think that part should not be changed. Only unexpected exceptions would be handled by HttpProcessor.run and result in a new worker being created.

I am also interested in why we need the special follow code. I'm pretty sure it is being used because of the %20 -> _ redirects on MediaWiki sites, but that should only be needed for full URLs to wiki pages, which is a very rare occurrence in pywikibot core these days.

The original bug was resolved in 35936a7abfdd9531073215994f5376d3ed2004df and another problem was fixed in d398fdc56e2601659088a60aa199224cb247f9cd (with more general problem detection added), but the broader problem probably wasn't fixed.
However, httplib2 has been replaced by requests, so broader solutions are not needed for the 2.0 branch. A possible re-introduction of httplib2 support would be a cut-down implementation without follow support, as it isn't needed according to my inquiries (see the last comment on this bug), and the test suite confirms this with the minimal implementation on the table (https://gerrit.wikimedia.org/r/#/c/215603/ ).
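
For illustration, with requests the follow logic comes for free: redirects are resolved internally, a 3xx response without a Location header is simply returned as the final response, and exceeding the redirect limit raises requests.exceptions.TooManyRedirects in the calling thread rather than killing a worker. Using the demo URL from the description:

import requests

r = requests.get('http://getstatuscode.com/301')  # redirects followed by default
print(r.status_code)  # status of the final response in the chain
print(r.history)      # any intermediate 3xx responses
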
So nothing left to do here.