
cosmetic_changes.py does not work on pages containing external links to archive.org or openlibrary.org
Closed, Invalid | Public | BUG REPORT

Description

Background information:

  • Recently, archive.org faced several cyberattacks. See en:Internet Archive#Cyberattacks for details.
  • Some archive.org services such as openlibrary.org are still offline. See https://archive.org/ and https://openlibrary.org/
  • When cosmetic_changes.py works on a page that has links to archive.org or openlibrary.org, the bot reports a 503 Server Error: Service Temporarily Unavailable response and keeps retrying with increasing waits.

Command line:

python pwb.py cosmetic_changes -always -newpages -lang:arz

Output:

>>> جون هاستينجز، بارون هاستينجز الأول <<<
ERROR: Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 689, in _http_request
    response = http.request(self.site, uri=uri,
                            method='GET' if use_get else 'POST',
                            data=data, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 457, in fetch
    callback(response)
    ~~~~~~~~^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 353, in error_handling_callback
    raise ServerError(
        f'{response.status_code} Server Error: {response.reason}')
pywikibot.exceptions.ServerError: 503 Server Error: Service Temporarily Unavailable

WARNING: Waiting 5.0 seconds before retrying.
ERROR: Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 689, in _http_request
    response = http.request(self.site, uri=uri,
                            method='GET' if use_get else 'POST',
                            data=data, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 457, in fetch
    callback(response)
    ~~~~~~~~^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 353, in error_handling_callback
    raise ServerError(
        f'{response.status_code} Server Error: {response.reason}')
pywikibot.exceptions.ServerError: 503 Server Error: Service Temporarily Unavailable

WARNING: Waiting 10.0 seconds before retrying.
ERROR: Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 689, in _http_request
    response = http.request(self.site, uri=uri,
                            method='GET' if use_get else 'POST',
                            data=data, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 457, in fetch
    callback(response)
    ~~~~~~~~^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 353, in error_handling_callback
    raise ServerError(
        f'{response.status_code} Server Error: {response.reason}')
pywikibot.exceptions.ServerError: 503 Server Error: Service Temporarily Unavailable

WARNING: Waiting 20.0 seconds before retrying.
ERROR: Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 689, in _http_request
    response = http.request(self.site, uri=uri,
                            method='GET' if use_get else 'POST',
                            data=data, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 457, in fetch
    callback(response)
    ~~~~~~~~^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 353, in error_handling_callback
    raise ServerError(
        f'{response.status_code} Server Error: {response.reason}')
pywikibot.exceptions.ServerError: 503 Server Error: Service Temporarily Unavailable

WARNING: Waiting 40.0 seconds before retrying.
ERROR: Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 689, in _http_request
    response = http.request(self.site, uri=uri,
                            method='GET' if use_get else 'POST',
                            data=data, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 457, in fetch
    callback(response)
    ~~~~~~~~^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 353, in error_handling_callback
    raise ServerError(
        f'{response.status_code} Server Error: {response.reason}')
pywikibot.exceptions.ServerError: 503 Server Error: Service Temporarily Unavailable

WARNING: Waiting 80.0 seconds before retrying.
ERROR: Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 689, in _http_request
    response = http.request(self.site, uri=uri,
                            method='GET' if use_get else 'POST',
                            data=data, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 457, in fetch
    callback(response)
    ~~~~~~~~^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 353, in error_handling_callback
    raise ServerError(
        f'{response.status_code} Server Error: {response.reason}')
pywikibot.exceptions.ServerError: 503 Server Error: Service Temporarily Unavailable

WARNING: Waiting 120.0 seconds before retrying.

What should have happened instead?:

  • The bot should have skipped such pages and continued working on other pages instead of repeatedly waiting.

Software version:

Pywikibot: [https] r-pywikibot-core (dfaf905, g19134, 2024/10/19, 10:32:30, master)
Release version: 9.5.0.dev2
packaging version: 24.1
mwparserfromhell version: 0.6.6
wikitextparser version: 0.56.3
requests version: 2.32.3
    certificate test: ok
Python: 3.13.0 (tags/v3.13.0:60403a5, Oct  7 2024, 09:38:07) [MSC v.1941 64 bit (AMD64)]

Event Timeline

The issue can be reproduced with the command: py -3.11 -m pwb -lang:arz -simulate cosmetic_changes -page:"جون هاستينجز، بارون هاستينجز الأول"

@Meno25: use the -ignore:page or -ignore:method option to skip affected pages.

https://doc.wikimedia.org/pywikibot/master/scripts_ref/scripts.html#module-scripts.cosmetic_changes

To reduce the number of retries, the command-line option -max_retries:<no> can be used instead of changing user-config.py.
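
The waits in the log above (5, 10, 20, 40, 80, 120 seconds) are consistent with a delay that doubles after each failure and is capped at a maximum. The sketch below is not Pywikibot's code: the function name backoff_schedule and its parameters are hypothetical, and the values of 5 and 120 seconds are read off the log rather than taken from the library. It only illustrates how much waiting a given retry limit implies per affected page.

```python
def backoff_schedule(retries, first_wait=5.0, max_wait=120.0):
    """Return the waits (in seconds) for the given number of retries.

    Hypothetical helper: the delay doubles after every failed attempt
    and never exceeds max_wait. Defaults are assumptions taken from the
    waits visible in the log, not from Pywikibot's configuration.
    """
    waits = []
    wait = first_wait
    for _ in range(retries):
        waits.append(wait)
        wait = min(wait * 2, max_wait)
    return waits


if __name__ == '__main__':
    schedule = backoff_schedule(6)
    print(schedule)       # [5.0, 10.0, 20.0, 40.0, 80.0, 120.0]
    print(sum(schedule))  # 275.0
```

Under these assumptions, six retries already cost more than four minutes on one page, which is why lowering the retry limit shortens the stall but does not by itself make the bot skip the page.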

Thank you, @Xqt, for the advice. I set max_retries = 1, and this is the output:

Script terminated by exception:

ERROR: Maximum retries attempted without success. (TimeoutError)
Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pwb.py", line 40, in <module>
    sys.exit(main())
             ~~~~^^
  File "C:\Users\Mohammed\Downloads\core\pwb.py", line 36, in main
    runpy.run_path(str(path), run_name='__main__')
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 287, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Mohammed\Downloads\core\pywikibot\scripts\wrapper.py", line 544, in <module>
    main()
    ~~~~^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\scripts\wrapper.py", line 528, in main
    if not execute():
           ~~~~~~~^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\scripts\wrapper.py", line 515, in execute
    run_python_file(filename, script_args, module)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\scripts\wrapper.py", line 152, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         main_mod.__dict__)
         ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\scripts\cosmetic_changes.py", line 133, in <module>
    main()
    ~~~~^^
  File "C:\Users\Mohammed\Downloads\core\scripts\cosmetic_changes.py", line 129, in main
    bot.run()
    ~~~~~~~^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\bot.py", line 1582, in run
    self.treat(page)
    ~~~~~~~~~~^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\bot.py", line 1835, in treat
    self.treat_page()
    ~~~~~~~~~~~~~~~^^
  File "C:\Users\Mohammed\Downloads\core\scripts\cosmetic_changes.py", line 86, in treat_page
    new_text = cc_toolkit.change(old_text)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\cosmetic_changes.py", line 319, in change
    new_text = self._change(text)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\cosmetic_changes.py", line 313, in _change
    text = self.safe_execute(method, text)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\cosmetic_changes.py", line 299, in safe_execute
    result = method(text)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\cosmetic_changes.py", line 684, in cleanUpLinks
    text = textlib.replaceExcept(text, linkR, handleOneLink,
                                 ['comment', 'math', 'nowiki', 'pre',
                                  'startspace'])
  File "C:\Users\Mohammed\Downloads\core\pywikibot\textlib.py", line 449, in replaceExcept
    replacement = new(match)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\cosmetic_changes.py", line 571, in handleOneLink
    is_interwiki = self.site.isInterwikiLink(titleWithSection)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\site\_basesite.py", line 367, in isInterwikiLink
    linkfam, linkcode = pywikibot.Link(text, self).parse_site()
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\page\_links.py", line 365, in parse_site
    newsite = self._source.interwiki(prefix)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\site\_apisite.py", line 165, in interwiki
    return self._interwikimap[prefix].site
           ~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\site\_interwikimap.py", line 79, in __getitem__
    raise self._iw_sites[prefix].site
  File "C:\Users\Mohammed\Downloads\core\pywikibot\site\_interwikimap.py", line 26, in site
    self._site = pywikibot.Site(
                 ~~~~~~~~~~~~~~^
        url=self.url, fam=None if self.local else self.prefix)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\__init__.py", line 243, in Site
    _sites[key] = interface(code=code, fam=fam, user=user)
                  ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\site\_apisite.py", line 142, in __init__
    self.login(cookie_only=True)
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\site\_apisite.py", line 400, in login
    if self.userinfo['name'] == self.user():
       ^^^^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\site\_apisite.py", line 675, in userinfo
    uidata = uirequest.submit()
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 1019, in submit
    response, use_get = self._http_request(use_get, uri, body, headers,
                        ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                           paramstring)
                                           ^^^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 722, in _http_request
    self.wait()
    ~~~~~~~~~^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 984, in wait
    super().wait(delay)
    ~~~~~~~~~~~~^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\__init__.py", line 47, in wait
    raise pywikibot.exceptions.TimeoutError(
        'Maximum retries attempted without success.')
pywikibot.exceptions.TimeoutError: Maximum retries attempted without success.
CRITICAL: Exiting due to uncaught exception TimeoutError: Maximum retries attempted without success.

I would also like to add that I face this issue on Arabic Wikipedia (arwiki) as well, because arwiki has many links to archive.org.

When using the command:

python pwb.py cosmetic_changes -always -newpages -ignore:method -lang:arz

the bot skips the page and the output is:

>>> جون هاستينجز، بارون هاستينجز الأول <<<
ERROR: Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 689, in _http_request
    response = http.request(self.site, uri=uri,
                            method='GET' if use_get else 'POST',
                            data=data, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 457, in fetch
    callback(response)
    ~~~~~~~~^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 353, in error_handling_callback
    raise ServerError(
        f'{response.status_code} Server Error: {response.reason}')
pywikibot.exceptions.ServerError: 503 Server Error: Service Temporarily Unavailable

WARNING: Waiting 5.0 seconds before retrying.
ERROR: Traceback (most recent call last):
  File "C:\Users\Mohammed\Downloads\core\pywikibot\data\api\_requests.py", line 689, in _http_request
    response = http.request(self.site, uri=uri,
                            method='GET' if use_get else 'POST',
                            data=data, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 457, in fetch
    callback(response)
    ~~~~~~~~^^^^^^^^^^
  File "C:\Users\Mohammed\Downloads\core\pywikibot\comms\http.py", line 353, in error_handling_callback
    raise ServerError(
        f'{response.status_code} Server Error: {response.reason}')
pywikibot.exceptions.ServerError: 503 Server Error: Service Temporarily Unavailable

WARNING: Unable to perform "cleanUpLinks" on "جون هاستينجز، بارون هاستينجز الأول"!
ERROR: Maximum retries attempted without success.
No changes were needed on [[arz:جون هاستينجز، بارون هاستينجز الأول]]

The bot then continues working on other pages normally. So this is an excellent workaround to use until archive.org is fully back online. Thank you, @Xqt, for your comment.
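
The difference between the default behaviour and the two -ignore modes mentioned in this thread can be sketched as follows. This is a simplified illustration and not the Pywikibot implementation: apply_methods, strip_trailing_spaces and clean_up_links are hypothetical names, and the failing step merely stands in for a cleanup method that needs an unreachable remote site.

```python
def apply_methods(text, methods, ignore='error'):
    """Apply each cleanup method in turn (hypothetical sketch).

    ignore='error'  -> re-raise the first failure, which ends the run
    ignore='method' -> skip only the failing method and carry on
    ignore='page'   -> give up on the page and return it unchanged
    """
    original = text
    for method in methods:
        try:
            text = method(text)
        except Exception as exc:
            if ignore == 'method':
                print(f'Unable to perform "{method.__name__}": {exc}')
                continue
            if ignore == 'page':
                print(f'Skipping page: {exc}')
                return original
            raise
    return text


def strip_trailing_spaces(text):
    """Remove trailing whitespace from every line."""
    return '\n'.join(line.rstrip() for line in text.split('\n'))


def clean_up_links(text):
    """Stand-in for a step that depends on an offline remote site."""
    raise TimeoutError('Maximum retries attempted without success.')


if __name__ == '__main__':
    sample = 'first line  \nsecond line'
    steps = [strip_trailing_spaces, clean_up_links]
    print(repr(apply_methods(sample, steps, ignore='method')))
```

In this sketch, 'method' keeps the changes made by the steps that succeeded, while 'page' discards them. That matches the log above in spirit: the failing step is reported, the run is not aborted, and the next page is processed.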