
Pywikibot takes too long to give up on contacting the target of an interwikilink
Closed, ResolvedPublicBUG REPORT

Description

Steps to Reproduce:

$ python
Python 3.5.3 (default, Dec 12 2020, 14:55:10)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pywikibot
>>> site = pywikibot.Site()
>>> text = 'iarchive:documentsofrussi027937mbp'
>>> link = pywikibot.Link(text, site, 0)
>>> link.title

(My code doesn't call link.title directly, but this is a shorter way to reproduce the issue.)

Actual Results:

WARNING: Http response status 404
WARNING: Non-JSON response received from server iarchive:iarchive; the server may be down.
WARNING: Waiting 5.0 seconds before retrying.
WARNING: Http response status 404
WARNING: Non-JSON response received from server iarchive:iarchive; the server may be down.
WARNING: Waiting 10.0 seconds before retrying.
[...]
WARNING: Waiting 120.0 seconds before retrying.
WARNING: Http response status 404
WARNING: Non-JSON response received from server iarchive:iarchive; the server may be down.

Total time: ~22 minutes

Expected Results:

Handle more gracefully the fact that the target of an interwiki link may not be a wiki, or that querying it may not return an acceptable response, and give up much sooner.
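
As a stop-gap on the caller side (not the fix asked for here), a bot script can lower the retry count and catch the failure around the title lookup. This is only a sketch; the observed failure is the pywikibot.exceptions.TimeoutError shown later in this task, and pywikibot exceptions derive from pywikibot.exceptions.Error:

# Caller-side guard, sketch only: keeps a run from stalling ~22 minutes per link.
import pywikibot
from pywikibot.exceptions import Error

pywikibot.config.max_retries = 1  # retry once instead of many times with back-off

site = pywikibot.Site()
link = pywikibot.Link('iarchive:documentsofrussi027937mbp', site, 0)
try:
    title = link.title
except Error:  # e.g. TimeoutError once the retries are exhausted
    title = None  # the interwiki target is not a queryable wiki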

Event Timeline

JJMC89 renamed this task from "Pywikibot takes too link to give up on contacting the target of an interwikilink" to "Pywikibot takes too long to give up on contacting the target of an interwikilink". Jan 25 2021, 9:44 PM
JJMC89 updated the task description.

Looks to be caused by {fb5d6e3}

Xqt triaged this task as Medium priority. Jan 26 2021, 5:43 AM (edited)

The retry loop is in api.Request._json_loads(). Let's first stop returning plain text from comms.http.request(), so that we have full access to the requests.Response inside _json_loads() and probably better control over pages that give unexpected results.
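
A rough illustration of what that would allow, assuming _json_loads() gets the requests.Response object instead of plain text (a sketch, not the actual pywikibot code; ServerError is the existing class from pywikibot.exceptions):

# Sketch only: with the full requests.Response available, _json_loads() could
# fail fast instead of retrying when the target clearly does not speak JSON.
import json
from pywikibot.exceptions import ServerError

def _json_loads(response):  # simplified, hypothetical signature
    content_type = response.headers.get('Content-Type', '')
    if response.status_code == 404 or 'json' not in content_type:
        # A 404 or an HTML page is not a temporarily unavailable wiki;
        # retrying with back-off (5 s, 10 s, ... 120 s) will never succeed.
        raise ServerError('Non-JSON response, status {}'
                          .format(response.status_code))
    return json.loads(response.text)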

I guess we have the same behaviour with site.interwiki('iarchive'), or with 'bugzilla' as another example. It is too rough to assume the server is down just because there is no JSON content.

Some of my bot tasks cannot function correctly with this occurring, so I've cloned version 5.5 instead of using the shared stable on Toolforge.

It seems that the interwiki map does not always contain foreign MediaWiki sites but also other content, like Phabricator/Bugzilla or iarchive in this case. In that case it does not make any sense to retry to get a JSON result. Is there any easy way to find out whether an interwiki table entry has a MediaWiki site as its target?

Xqt raised the priority of this task from Medium to High. Jan 26 2021, 8:51 AM

> It seems that the interwiki map does not always contain foreign MediaWiki sites but also other content, like Phabricator/Bugzilla or iarchive in this case. In that case it does not make any sense to retry to get a JSON result. Is there any easy way to find out whether an interwiki table entry has a MediaWiki site as its target?

Not that I know of. I was getting the same issue for MediaWiki sites like orthodoxwiki and wikiasite.
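
For what it is worth, one heuristic is to probe the target's api.php for siteinfo and see whether JSON comes back. The sketch below uses the plain requests library; probe_is_mediawiki is a hypothetical helper, not part of pywikibot, and the guessed script path makes it a heuristic only (Wikimedia wikis use /w/api.php, for example):

# Hypothetical helper: guess whether a URL belongs to a MediaWiki installation
# by probing its api.php endpoint for JSON siteinfo.
import requests

def probe_is_mediawiki(base_url, timeout=10):
    api_url = base_url.rstrip('/') + '/api.php'  # naive guess at the script path
    try:
        r = requests.get(api_url,
                         params={'action': 'query', 'meta': 'siteinfo',
                                 'format': 'json'},
                         timeout=timeout)
    except requests.RequestException:
        return False
    if r.status_code != 200:
        return False
    try:
        return 'query' in r.json()
    except ValueError:  # not JSON at all
        return False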

Setting -max_retries:1 as a global option, or setting it in user-config.py, forces giving up sooner; but it does not solve the underlying problem. I guess the status code is reported as 200 now, after changing comms.http.request() to return a requests.Response result?
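
For reference, the two forms of that workaround (somescript is a placeholder script name):

# in user-config.py, applies to every run:
max_retries = 1

# or per run, as a global command line option:
$ python pwb.py somescript -max_retries:1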

> Setting -max_retries:1 as a global option, or setting it in user-config.py, forces giving up sooner; but it does not solve the underlying problem. I guess the status code is reported as 200 now, after changing comms.http.request() to return a requests.Response result?

Ah, no it's 404:

>>> import pywikibot as py
>>> s = py.Site()
>>> py.config.max_retries = 1
>>> ia = s.interwiki('iarchive')
>>> ia
APISite("iarchive", "iarchive")
>>> ia.siteinfo['namespace']
WARNING: Http response status 404
WARNING: Non-JSON response received from server iarchive:iarchive; the server may be down.
Status code:404
WARNING: Waiting 5.0 seconds before retrying.
WARNING: Http response status 404
WARNING: Non-JSON response received from server iarchive:iarchive; the server may be down.
Status code:404
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    ia.siteinfo['namespace']
  File "C:\pwb\GIT\core\pywikibot\site\_siteinfo.py", line 253, in __getitem__
    return self.get(key, False)  # caches and doesn't force it
  File "C:\pwb\GIT\core\pywikibot\site\_siteinfo.py", line 301, in get
    preloaded = self._get_general(key, expiry)
  File "C:\pwb\GIT\core\pywikibot\site\_siteinfo.py", line 241, in _get_general
    default_info = self._get_siteinfo(props, expiry)
  File "C:\pwb\GIT\core\pywikibot\site\_siteinfo.py", line 165, in _get_siteinfo
    data = request.submit()
  File "C:\pwb\GIT\core\pywikibot\data\api.py", line 2053, in submit
    self._data = super().submit()
  File "C:\pwb\GIT\core\pywikibot\data\api.py", line 1803, in submit
    result = self._json_loads(response)
  File "C:\pwb\GIT\core\pywikibot\data\api.py", line 1600, in _json_loads
    self.wait()
  File "C:\pwb\GIT\core\pywikibot\data\api.py", line 1920, in wait
    raise TimeoutError('Maximum retries attempted without success.')
pywikibot.exceptions.TimeoutError: Maximum retries attempted without success.

>>>

Change 663784 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [IMPR] Print additional informations if Non-JSON response received from server

https://gerrit.wikimedia.org/r/663784

@JAnD: I checked your command line from T276660 and it worked for me as expected. Could you review my last commit and confirm the correct behaviour?

Change 663784 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Raise an exception response is Non-JSON and site is AutoFamily

https://gerrit.wikimedia.org/r/663784
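
The merged change is at the Gerrit link above. As a rough paraphrase of the idea, not the actual diff: when the site comes from an auto-generated family (an interwiki target rather than a configured wiki) and the response is not JSON, raise immediately instead of entering the back-off loop. The helper name and the AutoFamily check below are sketches; ServerError and api.Request's wait() and site do exist in pywikibot.

# Paraphrase of the idea behind change 663784, sketch only.
from pywikibot.exceptions import ServerError

def _handle_non_json(request, response):  # hypothetical helper
    family = request.site.family
    if type(family).__name__ == 'AutoFamily':
        # An interwiki target known only through an AutoFamily will never
        # start answering with MediaWiki API JSON, so give up right away.
        raise ServerError('Non-JSON response from {}'.format(request.site))
    request.wait()  # keep the old retry behaviour for real wikis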

JJMC89 assigned this task to Xqt.

> @JAnD: I checked your command line from T276660 and it worked for me as expected. Could you review my last commit and confirm the correct behaviour?

It works now. :-)