
pywikibot does not properly handle 502 Server Error when reading pages
Closed, Resolved | Public | BUG REPORT

Description

While running Pywikibot on pl.wikipedia, I quite often see the script crash after the page generator receives a 502 Server Error from the Wikimedia API.

What happens?:
The script throws an exception and crashes:

Retrieving 50 pages from wikipedia:pl.
ERROR: Traceback (most recent call last):
  File "/home/masti/pw/core/pywikibot/data/api/_requests.py", line 684, in _http_request
    response = http.request(self.site, uri=uri,
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 457, in fetch
    callback(response)
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 354, in error_handling_callback
    raise ServerError(
pywikibot.exceptions.ServerError: 502 Server Error: Server Hangup

Traceback (most recent call last):
  File "/home/masti/pw/core/pwb.py", line 40, in <module>
    sys.exit(main())
  File "/home/masti/pw/core/pwb.py", line 36, in main
    runpy.run_path(str(path), run_name='__main__')
  File "/usr/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/masti/pw/core/pywikibot/scripts/wrapper.py", line 521, in <module>
    main()
  File "/home/masti/pw/core/pywikibot/scripts/wrapper.py", line 505, in main
    if not execute():
  File "/home/masti/pw/core/pywikibot/scripts/wrapper.py", line 492, in execute
    run_python_file(filename, script_args, module)
  File "/home/masti/pw/core/pywikibot/scripts/wrapper.py", line 149, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File "masti/ms-contains.py", line 433, in <module>
    main()
  File "masti/ms-contains.py", line 427, in main
    bot.run()  # guess what it does
  File "masti/ms-contains.py", line 147, in run
    for page in self.generator:
  File "/home/masti/pw/core/pywikibot/pagegenerators/__init__.py", line 660, in PreloadingGenerator
    yield from site.preloadpages(group, groupsize=groupsize,
  File "/home/masti/pw/core/pywikibot/site/_generators.py", line 202, in preloadpages
    for pagedata in rvgen:
  File "/usr/lib/python3.10/_collections_abc.py", line 330, in __next__
    return self.send(None)
  File "/home/masti/pw/core/pywikibot/tools/collections.py", line 279, in send
    return next(self._started_gen)
  File "/home/masti/pw/core/pywikibot/data/api/_generators.py", line 781, in generator
    yield from super().generator
  File "/home/masti/pw/core/pywikibot/data/api/_generators.py", line 607, in generator
    self.data = self.request.submit()
  File "/home/masti/pw/core/pywikibot/data/api/_requests.py", line 993, in submit
    response, use_get = self._http_request(use_get, uri, body, headers,
  File "/home/masti/pw/core/pywikibot/data/api/_requests.py", line 684, in _http_request
    response = http.request(self.site, uri=uri,
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 457, in fetch
    callback(response)
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 354, in error_handling_callback
    raise ServerError(
pywikibot.exceptions.ServerError: 502 Server Error: Server Hangup
CRITICAL: Exiting due to uncaught exception ServerError: 502 Server Error: Server Hangup

What should have happened instead?:
The script should wait for the server to come back, as the errors are temporary.

Software version:

Pywikibot: [https] masti01-pywikibot.git (04383ba, g18666, 2024/05/06, 16:04:47, master)
Release version: 9.2.0.dev2
packaging version: 24.0
mwparserfromhell version: 0.6.6
wikitextparser version: n/a
requests version: 2.31.0
  cacerts: /home/masti/pw/core/venv/lib/python3.10/site-packages/certifi/cacert.pem
    certificate test: ok
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

Event Timeline

Does it help to increase the socket timeout (the read timeout, i.e. the second value of that tuple)?
https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.config.html#http-settings

I will test and come back with the answer. Unfortunately it is not easy to replicate, so one has to wait until it happens.

I increased the timeout to 120s but it crashed again with the same error.
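
For anyone who wants to repeat that test, a minimal sketch of the setting in user-config.py follows. It assumes socket_timeout is a (connect timeout, read timeout) tuple in seconds, as the linked http-settings documentation describes; the connect value shown here is only an illustrative choice.

```python
# user-config.py (sketch): socket_timeout is a (connect, read) tuple in seconds.
# The first value is an illustrative connect timeout; the second value is the
# read timeout that was raised to 120 seconds for this test.
socket_timeout = (6.05, 120)
```

As reported above, this only affects real timeouts; it does not prevent the 502 response itself.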

Thank you for this investigation.

I have been running into these lately, too. Especially when calling DataSite.loadrevisions.

Pretty sure this is an upstream issue. 502 indicates a connection problem. Timeout problems were found a few weeks ago on MW 1.43, starting with T359427, and I made some local workarounds to circumvent some server calls. See the underlying task for that issue.

@Masti: What was your pagegenerator option when running your script?

Change #1029487 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [fix] retry api request on ServerError

https://gerrit.wikimedia.org/r/1029487

Xqt triaged this task as High priority.

@Masti: What was your pagegenerator option when running your script?

-start:'!'

This and similar scripts sometimes run OK but crash from time to time.

BTW: sometimes the server error is 503 like:
ERROR: Traceback (most recent call last):
  File "/home/masti/pw/core/pywikibot/data/api/_requests.py", line 684, in _http_request
    response = http.request(self.site, uri=uri,
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 457, in fetch
    callback(response)
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 354, in error_handling_callback
    raise ServerError(
pywikibot.exceptions.ServerError: 503 Server Error: Service Unavailable

Traceback (most recent call last):
  File "/home/masti/pw/core/pwb.py", line 40, in <module>
    sys.exit(main())
  File "/home/masti/pw/core/pwb.py", line 36, in main
    runpy.run_path(str(path), run_name='__main__')
  File "/usr/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/masti/pw/core/pywikibot/scripts/wrapper.py", line 521, in <module>
    main()
  File "/home/masti/pw/core/pywikibot/scripts/wrapper.py", line 505, in main
    if not execute():
  File "/home/masti/pw/core/pywikibot/scripts/wrapper.py", line 492, in execute
    run_python_file(filename, script_args, module)
  File "/home/masti/pw/core/pywikibot/scripts/wrapper.py", line 149, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File "masti/ms-contains.py", line 433, in <module>
    main()
  File "masti/ms-contains.py", line 427, in main
    bot.run()  # guess what it does
  File "masti/ms-contains.py", line 147, in run
    for page in self.generator:
  File "/home/masti/pw/core/pywikibot/pagegenerators/__init__.py", line 660, in PreloadingGenerator
    yield from site.preloadpages(group, groupsize=groupsize,
  File "/home/masti/pw/core/pywikibot/site/_generators.py", line 202, in preloadpages
    for pagedata in rvgen:
  File "/usr/lib/python3.10/_collections_abc.py", line 330, in __next__
    return self.send(None)
  File "/home/masti/pw/core/pywikibot/tools/collections.py", line 279, in send
    return next(self._started_gen)
  File "/home/masti/pw/core/pywikibot/data/api/_generators.py", line 781, in generator
    yield from super().generator
  File "/home/masti/pw/core/pywikibot/data/api/_generators.py", line 607, in generator
    self.data = self.request.submit()
  File "/home/masti/pw/core/pywikibot/data/api/_requests.py", line 993, in submit
    response, use_get = self._http_request(use_get, uri, body, headers,
  File "/home/masti/pw/core/pywikibot/data/api/_requests.py", line 684, in _http_request
    response = http.request(self.site, uri=uri,
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 457, in fetch
    callback(response)
  File "/home/masti/pw/core/pywikibot/comms/http.py", line 354, in error_handling_callback
    raise ServerError(
pywikibot.exceptions.ServerError: 503 Server Error: Service Unavailable
CRITICAL: Exiting due to uncaught exception ServerError: 503 Server Error: Service Unavailable

@Masti: I assumed -start:! and tested it on several wikis but didn't run into this issue. I made a patch, and both server errors should now lead to retry loops.

I still think this has something to do with the database conversion of MW 1.43: some API requests are very slow or fail with a timeout or server error.

I think so. But can we handle the timeouts in pw? The other errors from the API are handled with incremental delays and retries.
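
To illustrate what "incremental delays and retry" could look like for these 502/503 cases, here is a rough standalone sketch. It is not the code from the Gerrit change: the ServerError class is a stand-in for pywikibot.exceptions.ServerError so the example runs on its own, and retry_on_server_error with its parameters (max_retries, retry_wait, retry_max, sleep) is a hypothetical helper whose default values are assumptions.

```python
import time


class ServerError(Exception):
    """Stand-in for pywikibot.exceptions.ServerError (assumption for the sketch)."""


def retry_on_server_error(func, max_retries=15, retry_wait=5, retry_max=120,
                          sleep=time.sleep):
    """Call func(); on ServerError wait and try again with a growing delay.

    Hypothetical helper: the delay starts at retry_wait seconds, doubles
    after every failure and is capped at retry_max seconds.
    """
    wait = retry_wait
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except ServerError as error:
            if attempt == max_retries:
                raise  # give up after the last allowed attempt
            print(f'{error}; waiting {wait} seconds before retrying')
            sleep(wait)
            wait = min(wait * 2, retry_max)
```

A script could wrap a single API call this way, for example retry_on_server_error(lambda: request.submit()), where request is whatever request object the script already holds. Wrapping the whole page generator loop would not work the same way, because a generator that has raised cannot be resumed.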

We can only increase the socket_timeout value for real timeout exceptions [1], maybe within the exception handling callback in the http module [3]. For Wikidata this should currently be recommended (somehow) [2]. But this will not help for ServerError exceptions. Maybe we can show the HTML response in such a case (or log it) for further debugging, also for upstream issues.

[1] https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.config.html#http-settings
[2] https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.site.html#pywikibot.site._generators.GeneratorsMixin.alllinks
[3] https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.comms.html#comms.http.error_handling_callback
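
As a sketch of the "log the HTML response" idea, something along these lines might be enough for debugging. The function names, the logger name and the 300-character limit are all hypothetical, and nothing here is taken from the http module itself; the caller is assumed to pass in the status code, reason phrase and body text of the failed response.

```python
import logging

logger = logging.getLogger('server_error_debug')  # hypothetical logger name


def summarize_error_response(status_code, reason, body, limit=300):
    """Return a one-line summary of a failed HTTP response for the log."""
    snippet = ' '.join(body.split())  # collapse whitespace in the HTML body
    if len(snippet) > limit:
        snippet = snippet[:limit] + '...'  # keep the log line short
    return f'{status_code} Server Error: {reason}; body: {snippet!r}'


def log_error_response(status_code, reason, body):
    """Log the summary for 5xx responses only; ignore everything else."""
    if 500 <= status_code < 600:
        logger.error(summarize_error_response(status_code, reason, body))
```

Keeping only a truncated body avoids flooding the log with full error pages while still showing which backend produced the 502 or 503.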

Change #1029487 merged by jenkins-bot:

[pywikibot/core@master] [fix] retry api request on ServerError

https://gerrit.wikimedia.org/r/1029487