redirect.py fails with RuntimeError
Open, High, Public

Description

Running pwb.py redirect broken -lang:ar fails:

>>> 11 فبراير <<<
Traceback (most recent call last):
  File "C:\pwb\core\pwb.py", line 256, in <module>
    if not main():
  File "C:\pwb\core\pwb.py", line 250, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "C:\pwb\core\pwb.py", line 121, in run_python_file
    main_mod.__dict__)
  File ".\scripts\redirect.py", line 914, in <module>
    main()
  File ".\scripts\redirect.py", line 911, in main
    bot.run()
  File ".\scripts\redirect.py", line 772, in run
    self.delete_broken_redirects()
  File ".\scripts\redirect.py", line 460, in delete_broken_redirects
    self.delete_1_broken_redirect(redir_name)
  File ".\scripts\redirect.py", line 474, in delete_1_broken_redirect
    targetPage = redir_page.getRedirectTarget()
  File "C:\pwb\core\pywikibot\page.py", line 1545, in getRedirectTarget
    return self.site.getredirtarget(self)
  File "C:\pwb\core\pywikibot\site.py", line 3022, in getredirtarget
    % title.encode(self.encoding()))
RuntimeError: getredirtarget: No 'redirects' found for page '11 \xd9\x81\xd8\xa8
\xd8\xb1\xd8\xa7\xd9\x8a\xd8\xb1'.
<type 'exceptions.RuntimeError'>
CRITICAL: Closing network session.

The processed page is not a redirect page and there is no redirect target at all.

Event Timeline

Xqt created this task. Mar 25 2016, 7:27 AM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Mar 25 2016, 7:27 AM
Xqt triaged this task as High priority. Mar 25 2016, 7:33 AM
Xqt claimed this task. Mar 25 2016, 7:39 AM
Xqt added a comment. Mar 25 2016, 9:22 AM

Sometimes Site.broken_redirects() returns pages that are wrongly marked as redirects:

>>> from __future__ import unicode_literals
>>> import pwb, pywikibot as py
>>> s = py.Site('ar')
>>> for page in s.broken_redirects():
...     if page.title().startswith('11'): break
...

pagedict gives:

{u'contentmodel': u'wikitext',
 u'lastrevid': 15902649,
 u'length': 40,
 u'new': u'',
 u'ns': 105,
 u'pageid': 2677629,
 u'pagelanguage': u'ar',
 u'pagelanguagedir': u'rtl',
 u'pagelanguagehtmlcode': u'ar',
 u'protection': [],
 u'redirect': u'',
 u'restrictiontypes': [u'edit', u'move'],
 u'title': u':11 \u0641\u0628\u0631\u0627\u064a\u0631',
 u'touched': u'2015-08-25T13:24:28Z'}

After reloading the page info, pagedict no longer has a 'redirect' entry:

>>> del page._isredir
>>> page.isRedirectPage()
{u'contentmodel': u'wikitext',
 u'lastrevid': 18508152,
 u'length': 15807,
 u'ns': 0,
 u'pageid': 2874,
 u'pagelanguage': u'ar',
 u'pagelanguagedir': u'rtl',
 u'pagelanguagehtmlcode': u'ar',
 u'protection': [],
 u'restrictiontypes': [u'edit', u'move'],
 u'title': u'11 \u0641\u0628\u0631\u0627\u064a\u0631',
 u'touched': u'2016-03-21T21:36:22Z'}
False
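
To make the difference between the two dicts easier to see, here is a minimal sketch that compares them key by key. The helper name diff_pagedicts is hypothetical and not part of pywikibot; the values are a subset of the two dicts printed above.

```python
def diff_pagedicts(cached, reloaded):
    """Return (removed_keys, changed) between two page info dicts.

    Hypothetical helper for illustration; not a pywikibot function.
    """
    # Keys present in the cached dict but gone after the reload.
    removed = sorted(k for k in cached if k not in reloaded)
    # Keys present in both dicts whose values differ: key -> (old, new).
    changed = {k: (cached[k], reloaded[k])
               for k in cached
               if k in reloaded and cached[k] != reloaded[k]}
    return removed, changed


# Subset of the two dicts shown above.
cached = {'ns': 105, 'pageid': 2677629, 'length': 40,
          'redirect': '', 'new': '',
          'title': ':11 \u0641\u0628\u0631\u0627\u064a\u0631'}
reloaded = {'ns': 0, 'pageid': 2874, 'length': 15807,
            'title': '11 \u0641\u0628\u0631\u0627\u064a\u0631'}

removed, changed = diff_pagedicts(cached, reloaded)
print(removed)
print(changed['ns'], changed['pageid'])
```

Besides the missing 'redirect' and 'new' keys, the namespace and page id differ as well, which may indicate that the two dicts describe different pages rather than two states of the same page.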

Change 279589 had a related patch set uploaded (by Xqt):
[bugfix] Don't fail missing 'redirects' with RuntimeError

https://gerrit.wikimedia.org/r/279589

jayvdb added a subscriber: jayvdb. Edited May 25 2016, 4:17 AM

I can reproduce the problem with

import pywikibot

site = pywikibot.Site('ar', 'wikipedia')

for page in site.broken_redirects():
    if page.isRedirectPage():
        try:
            redir = site.getredirtarget(page)
        except Exception as e:
            print(e, page)
            break

RuntimeError [[ar:أصوات الحيوانات]]

>>> page._isredir
True

After that, calling site.loadpageinfo(page) changes page._isredir to False.

Just noting there is a hack in APISite.page_isredirect intended to help with a similar type of problem: https://gerrit.wikimedia.org/r/#/c/101784/ That means page._isredir is being set to True somewhere and, as Xqt has shown above, there is a 'redirect': '' in a pagedict somewhere.

https://ar.wikipedia.org/w/api.php?action=query&generator=querypage&gqppage=BrokenRedirects&gqplimit=100&prop=info currently contains

"2683712": {
    "pageid": 2683712,
    "ns": 105,
    "title": ":\u0623\u0635\u0648\u0627\u062a \u0627\u0644\u062d\u064a\u0648\u0627\u0646\u0627\u062a",
    "contentmodel": "wikitext",
    "pagelanguage": "ar",
    "pagelanguagehtmlcode": "ar",
    "pagelanguagedir": "rtl",
    "touched": "2016-05-15T17:21:43Z",
    "lastrevid": 15915517,
    "length": 54,
    "redirect": "",
    "new": ""
},

So the incorrect setting for redirect is coming directly out of the MediaWiki API.

An alternative solution is to del the _isredir of every page emitted by APISite.broken_redirects, which will cause the page info to be loaded on demand with the correct redirect status. That is IMO the 'correct' solution, because we know the special page emits incorrect data for that property.
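
A minimal sketch of that alternative, assuming the flag is stored as a plain instance attribute named _isredir (as the del page._isredir call earlier in this task suggests). The wrapper name drop_cached_redirect_flag and the FakePage class are hypothetical and only stand in for the real generator and Page objects.

```python
def drop_cached_redirect_flag(pages, attr='_isredir'):
    """Yield each page after removing its cached redirect flag, if any.

    Hypothetical wrapper for illustration; not a pywikibot function.
    """
    for page in pages:
        # vars(page) is the instance __dict__; pop() does nothing if unset.
        vars(page).pop(attr, None)
        yield page


class FakePage:
    """Hypothetical stand-in for a pywikibot Page, for illustration only."""

    def __init__(self, title, isredir):
        self.title = title
        self._isredir = isredir


pages = [FakePage('A', True), FakePage('B', False)]
cleaned = list(drop_cached_redirect_flag(pages))
print([hasattr(p, '_isredir') for p in cleaned])
```

With the cached flag gone, the next redirect check on a real Page would have to load the page info again instead of trusting the value that came from the special page.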

The provided patch only works around the problem within getredirtarget -- if these Page objects are used in other ways, not involving getredirtarget, other bugs will occur.

Anyway, the MediaWiki bug should be raised upstream, which will help us understand the bug better. Maybe other properties in the page info are also wrong? If other attributes are wrong, IMO we should clearly state that Pages emitted from broken_redirects are very likely to be incorrect, and it is the caller's responsibility to recheck the data.
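
What such a recheck on the caller's side might look like, as a rough sketch: reload the page info and keep only pages that still report as redirects. It uses only site.loadpageinfo(page) and page.isRedirectPage() as they appear in this task; the function name confirmed_redirects and the two stub classes are hypothetical and replace the real Site and Page so the example runs on its own.

```python
def confirmed_redirects(site, pages):
    """Yield only pages that are still redirects after a page info reload.

    Hypothetical helper for illustration; not a pywikibot function.
    """
    for page in pages:
        site.loadpageinfo(page)  # refresh the cached redirect status
        if page.isRedirectPage():
            yield page


class StubPage:
    """Hypothetical stand-in for a Page with a cached redirect flag."""

    def __init__(self, title, cached_flag):
        self.title = title
        self._isredir = cached_flag

    def isRedirectPage(self):
        return self._isredir


class StubSite:
    """Hypothetical stand-in for a Site that knows the real redirects."""

    def __init__(self, real_redirects):
        self.real_redirects = set(real_redirects)

    def loadpageinfo(self, page):
        # Overwrite the cached flag with the "real" status.
        page._isredir = page.title in self.real_redirects


site = StubSite(real_redirects=['A'])
pages = [StubPage('A', True), StubPage('B', True)]  # B is wrongly flagged
titles = [p.title for p in confirmed_redirects(site, pages)]
print(titles)
```

This costs an extra request per page, so it trades speed for not acting on stale flags.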

Xqt added a comment. Sep 4 2018, 9:52 AM

See also this traceback:

C:\pwb\GIT\core>py -3 pwb.py redirect -lang:tg do -simulate
Retrieving double redirect special page...
Retrieving 24 pages from wikipedia:tg.

>>> Лоиҳа:Тоҷикистон <<<

1 pages read
0 pages written
Execution time: 0 seconds
Read operation time: 0 seconds
Script terminated by exception:

ERROR: RuntimeError: getredirtarget: No 'redirects' found for page Лоиҳа:Тоҷикис
тон.
Traceback (most recent call last):
  File "pwb.py", line 253, in <module>
    if not main():
  File "pwb.py", line 246, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 115, in run_python_file
    main_mod.__dict__)
  File ".\scripts\redirect.py", line 806, in <module>
    main()
  File ".\scripts\redirect.py", line 802, in main
    bot.run()
  File "C:\pwb\GIT\core\pywikibot\bot.py", line 1505, in run
    self.treat(page)
  File "C:\pwb\GIT\core\pywikibot\bot.py", line 1737, in treat
    self.treat_page()
  File ".\scripts\redirect.py", line 715, in treat_page
    self.action_treat(self.current_page)
  File ".\scripts\redirect.py", line 585, in fix_1_double_redirect
    targetPage = newRedir.getRedirectTarget()
  File "C:\pwb\GIT\core\pywikibot\page.py", line 1668, in getRedirectTarget
    return self.site.getredirtarget(self)
  File "C:\pwb\GIT\core\pywikibot\site.py", line 3208, in getredirtarget
    .format(title))
RuntimeError: getredirtarget: No 'redirects' found for page Лоиҳа:Тоҷикистон.
<class 'RuntimeError'>
CRITICAL: Closing network session.

C:\pwb\GIT\core>

D3r1ck01 moved this task from Backlog to Needs Review on the Pywikibot board. Nov 5 2018, 11:32 AM
Xqt added a comment. Feb 26 2019, 3:09 PM

and again:

C:\pwb\GIT\core>pwb.py redirect.py do -simulate -lang:pa
Retrieving double redirect special page...
Retrieving 25 pages from wikipedia:pa.


>>> ਅਨਾਤੋਲੇ ਫ਼ਰਾਂਸ <<<
ERROR: Page [[pa:ਅਨਾਤੋਲੇ ਫ਼ਰਾਂਸ]] is a circular redirect.
Skipping [[pa:ਅਨਾਤੋਲੇ ਫ਼ਰਾਂਸ]].


>>> ਨੈਸ਼ਨਲ ਫਿਲਮ ਅਵਾਰਡ <<<
ERROR: Page [[pa:ਨੈਸ਼ਨਲ ਫਿਲਮ ਅਵਾਰਡ]] is a circular redirect.
Skipping [[pa:ਨੈਸ਼ਨਲ ਫਿਲਮ ਅਵਾਰਡ]].


>>> ਫਾਟਕ:ਇਤਿਹਾਸ <<<

3 pages read
0 pages written
Execution time: 1 seconds
Read operation time: 0 seconds
Script terminated by exception:

ERROR: RuntimeError: getredirtarget: No 'redirects' found for page ਫਾਟਕ:ਇਤਿਹਾਸ.
Traceback (most recent call last):
  File "C:\pwb\GIT\core\pwb.py", line 232, in <module>
    if not main():
  File "C:\pwb\GIT\core\pwb.py", line 225, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "C:\pwb\GIT\core\pwb.py", line 94, in run_python_file
    main_mod.__dict__)
  File ".\scripts\redirect.py", line 773, in <module>
    main()
  File ".\scripts\redirect.py", line 769, in main
    bot.run()
  File "C:\pwb\GIT\core\pywikibot\bot.py", line 1508, in run
    self.treat(page)
  File ".\scripts\redirect.py", line 682, in treat
    super(RedirectRobot, self).treat(page)
  File "C:\pwb\GIT\core\pywikibot\bot.py", line 1735, in treat
    self.treat_page()
  File ".\scripts\redirect.py", line 572, in fix_1_double_redirect
    targetPage = newRedir.getRedirectTarget()
  File "C:\pwb\GIT\core\pywikibot\page.py", line 1676, in getRedirectTarget
    return self.site.getredirtarget(self)
  File "C:\pwb\GIT\core\pywikibot\site.py", line 3208, in getredirtarget
    .format(title))
RuntimeError: getredirtarget: No 'redirects' found for page ਫਾਟਕ:ਇਤਿਹਾਸ.
CRITICAL: Exiting due to uncaught exception <class 'RuntimeError'>

C:\pwb\GIT\core>

Dvorapa added a subscriber: Dvorapa. Edited Feb 26 2019, 3:49 PM

The redirect was just removed/deleted/changed, but the link table seems never to be updated? Or cache invalidation for redirects seems to take ages? That is completely fine in terms of web users' expectations and web crawlers, but it is not really helpful when robots want to work with the redirect contents. And I'm a little surprised that the API probably returns cached or outdated information here.

BTW I noticed similar behavior in MediaWiki-Search when a redirect is deleted. It shows up as a red-link redirect in the search results for ages too.

Anomie added a subscriber: Anomie.

@Dvorapa added a project: MediaWiki-API.

What exactly do you think is a bug in the API here? Please supply the queries, the current output, and what you think is wrong with it.

and again:

C:\pwb\GIT\core>pwb.py redirect.py do -simulate -lang:pa
Retrieving double redirect special page...
Retrieving 25 pages from wikipedia:pa.
[...]
ERROR: RuntimeError: getredirtarget: No 'redirects' found for page ਫਾਟਕ:ਇਤਿਹਾਸ.

I note that https://pa.wikipedia.org/w/api.php?action=query&generator=querypage&gqppage=DoubleRedirects&gqplimit=max&prop=info&formatversion=2 currently returns, in part,

{
    "pageid": 68990,
    "ns": 0,
    "title": "ਫਾਟਕ:ਇਤਿਹਾਸ",
    "contentmodel": "wikitext",
    "pagelanguage": "pa",
    "pagelanguagehtmlcode": "pa",
    "pagelanguagedir": "ltr",
    "touched": "2016-03-25T19:42:10Z",
    "lastrevid": 291775,
    "length": 89,
    "redirect": true,
    "new": true
},

which is one of those situations where there's a page that's unreachable by title, as "ਫਾਟਕ:ਇਤਿਹਾਸ" should be in namespace 100. See https://pa.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&pageids=68990|73388&formatversion=2:

"query": {
    "pages": [
        {
            "pageid": 68990,
            "ns": 0,
            "title": "ਫਾਟਕ:ਇਤਿਹਾਸ",
            "revisions": [
                {
                    "contentformat": "text/x-wiki",
                    "contentmodel": "wikitext",
                    "content": "#ਰੀਡਿਰੈਕਟ [[ਵਿਕੀਪੀਡੀਆ:ਫਾਟਕ:ਇਤਿਹਾਸ]]"
                }
            ]
        },
        {
            "pageid": 73388,
            "ns": 100,
            "title": "ਫਾਟਕ:ਇਤਿਹਾਸ",
            "revisions": [
                {
                    "contentformat": "text/x-wiki",
                    "contentmodel": "wikitext",
                    "content": ""
                }
            ]
        }
    ]
}
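
The situation described above can be spotted mechanically. The following is a minimal sketch that flags pages reported in namespace 0 whose title starts with a known namespace prefix. The function name find_shadowed_pages is hypothetical, and the prefix-to-namespace mapping is an assumption derived only from the two API results above (on a live wiki it would have to come from the site's namespace configuration).

```python
def find_shadowed_pages(pages, namespace_prefixes):
    """Return pages in ns 0 whose title starts with a namespace prefix.

    Hypothetical helper for illustration; not a MediaWiki or pywikibot API.
    """
    shadowed = []
    for page in pages:
        prefix, sep, _rest = page['title'].partition(':')
        # A main-namespace row whose title parses into another namespace
        # cannot be reached by title.
        if page['ns'] == 0 and sep and prefix in namespace_prefixes:
            shadowed.append(page)
    return shadowed


TITLE = 'ਫਾਟਕ:ਇਤਿਹਾਸ'
pages = [
    {'pageid': 68990, 'ns': 0, 'title': TITLE},
    {'pageid': 73388, 'ns': 100, 'title': TITLE},
]
# Assumption: the text before the colon is the local name of namespace 100.
namespace_prefixes = {TITLE.partition(':')[0]: 100}

shadowed_ids = [p['pageid'] for p in find_shadowed_pages(pages, namespace_prefixes)]
print(shadowed_ids)
```

Only the namespace 0 row is flagged; the namespace 100 row with the same title is the one a lookup by title would actually reach.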