
redirect.py fails for RuntimeError
Open, Lowest, Public

Description

Running pwb.py redirect broken -lang:ar fails:

>>> 11 فبراير <<<
Traceback (most recent call last):
  File "C:\pwb\core\pwb.py", line 256, in <module>
    if not main():
  File "C:\pwb\core\pwb.py", line 250, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "C:\pwb\core\pwb.py", line 121, in run_python_file
    main_mod.__dict__)
  File ".\scripts\redirect.py", line 914, in <module>
    main()
  File ".\scripts\redirect.py", line 911, in main
    bot.run()
  File ".\scripts\redirect.py", line 772, in run
    self.delete_broken_redirects()
  File ".\scripts\redirect.py", line 460, in delete_broken_redirects
    self.delete_1_broken_redirect(redir_name)
  File ".\scripts\redirect.py", line 474, in delete_1_broken_redirect
    targetPage = redir_page.getRedirectTarget()
  File "C:\pwb\core\pywikibot\page.py", line 1545, in getRedirectTarget
    return self.site.getredirtarget(self)
  File "C:\pwb\core\pywikibot\site.py", line 3022, in getredirtarget
    % title.encode(self.encoding()))
RuntimeError: getredirtarget: No 'redirects' found for page '11 \xd9\x81\xd8\xa8
\xd8\xb1\xd8\xa7\xd9\x8a\xd8\xb1'.
<type 'exceptions.RuntimeError'>
CRITICAL: Closing network session.

The processed page is not a redirect page and there is no redirect target at all.

Event Timeline

Xqt triaged this task as High priority. Mar 25 2016, 7:33 AM

Sometimes Site.broken_redirects() returns pages which are wrongly marked as redirects:

>>> from __future__ import unicode_literals
>>> import pwb, pywikibot as py
>>> s = py.Site('ar')
>>> for page in s.broken_redirects():
	if page.title().startswith('11'): break

pagedict gives:

{u'contentmodel': u'wikitext',
 u'lastrevid': 15902649,
 u'length': 40,
 u'new': u'',
 u'ns': 105,
 u'pageid': 2677629,
 u'pagelanguage': u'ar',
 u'pagelanguagedir': u'rtl',
 u'pagelanguagehtmlcode': u'ar',
 u'protection': [],
 u'redirect': u'',
 u'restrictiontypes': [u'edit', u'move'],
 u'title': u':11 \u0641\u0628\u0631\u0627\u064a\u0631',
 u'touched': u'2015-08-25T13:24:28Z'}

After reloading the page info, pagedict no longer has the 'redirect' key:

>>> del page._isredir
>>> page.isRedirectPage()
{u'contentmodel': u'wikitext',
 u'lastrevid': 18508152,
 u'length': 15807,
 u'ns': 0,
 u'pageid': 2874,
 u'pagelanguage': u'ar',
 u'pagelanguagedir': u'rtl',
 u'pagelanguagehtmlcode': u'ar',
 u'protection': [],
 u'restrictiontypes': [u'edit', u'move'],
 u'title': u'11 \u0641\u0628\u0631\u0627\u064a\u0631',
 u'touched': u'2016-03-21T21:36:22Z'}
False

Change 279589 had a related patch set uploaded (by Xqt):
[bugfix] Don't fail missing 'redirects' with RuntimeError

https://gerrit.wikimedia.org/r/279589

I can reproduce the problem with

import pywikibot

site = pywikibot.Site('ar', 'wikipedia')

# Stop at the first page that is flagged as a redirect but has no target.
for page in site.broken_redirects():
    if page.isRedirectPage():
        try:
            redir = site.getredirtarget(page)
        except Exception as e:
            print(e, page)
            break

RuntimeError [[ar:أصوات الحيوانات]]

>>> page._isredir
True

After that, calling site.loadpageinfo(page) changes page._isredir to False.

Just noting there is a hack in APISite.page_isredirect intended to help with a similar type of problem: https://gerrit.wikimedia.org/r/#/c/101784/ That means page._isredir is being set to True somewhere, and as Xqt has shown above, there is a 'redirect': '' in a pagedict somewhere.

https://ar.wikipedia.org/w/api.php?action=query&generator=querypage&gqppage=BrokenRedirects&gqplimit=100&prop=info currently contains

"2683712": {
    "pageid": 2683712,
    "ns": 105,
    "title": ":\u0623\u0635\u0648\u0627\u062a \u0627\u0644\u062d\u064a\u0648\u0627\u0646\u0627\u062a",
    "contentmodel": "wikitext",
    "pagelanguage": "ar",
    "pagelanguagehtmlcode": "ar",
    "pagelanguagedir": "rtl",
    "touched": "2016-05-15T17:21:43Z",
    "lastrevid": 15915517,
    "length": 54,
    "redirect": "",
    "new": ""
},

So the incorrect setting for redirect is coming directly out of the MediaWiki API.
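For reference, the two response shapes quoted in this task encode the flag differently: the default format returns "redirect": "" when the flag is set and omits the key otherwise, while formatversion=2 returns a boolean. A minimal sketch of a helper that reads either shape follows; the function name is made up for illustration and is not part of pywikibot.

```python
def flagged_as_redirect(pageinfo):
    """Return True if an API pageinfo dict carries the redirect flag.

    Hypothetical helper, not pywikibot code. Handles both shapes seen in
    this task: "redirect": "" (default format) and "redirect": true
    (formatversion=2). A missing key or an explicit False means not flagged.
    """
    return pageinfo.get('redirect', False) is not False


# Trimmed copies of the page info dicts quoted earlier in this task.
from_querypage = {'pageid': 2677629, 'ns': 105, 'redirect': '', 'new': ''}
after_reload = {'pageid': 2874, 'ns': 0}

print(flagged_as_redirect(from_querypage))  # True
print(flagged_as_redirect(after_reload))    # False
```

Comparing the flag from the special-page generator with the flag after a reload is one way to spot the stale entries.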

An alternative solution is to delete the _isredir attribute of every page emitted by APISite.broken_redirects, which will cause the page info to be loaded on demand with the correct redirect status. That is IMO the 'correct' solution, because we know the special page emits incorrect data for that property.
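A rough sketch of that idea, written as a wrapper generator so it can be tried without touching the site code. It assumes the flag is cached in a _isredir attribute, as in the sessions above; the wrapper name and the FakePage stand-in are invented for illustration only.

```python
def without_cached_redirect_flag(pages):
    """Yield each page after dropping its cached redirect flag.

    Sketch only: assumes the flag is cached in a `_isredir` attribute.
    Pages without the attribute pass through unchanged.
    """
    for page in pages:
        if hasattr(page, '_isredir'):
            del page._isredir
        yield page


class FakePage:
    """Stand-in for a Page object, used only to exercise the sketch."""

    def __init__(self, title, isredir=None):
        self.title = title
        if isredir is not None:
            self._isredir = isredir


pages = [FakePage('A', isredir=True), FakePage('B')]
cleaned = list(without_cached_redirect_flag(pages))
print([hasattr(p, '_isredir') for p in cleaned])  # [False, False]
```

With real objects this would presumably wrap the generator, for example without_cached_redirect_flag(site.broken_redirects()), so that a later page.isRedirectPage() call reloads the page info instead of trusting the special page.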

The provided patch only works around the problem within getredirtarget -- if these Page objects are used in other ways, not involving getredirtarget, other bugs will occur.

Anyway, the MediaWiki bug should be raised upstream, which will help us understand the bug better. Maybe other properties in the page info are also wrong? If other attributes are wrong, IMO we should clearly state that Pages emitted from broken_redirects are very likely to be incorrect, and it is the caller's responsibility to recheck the data.

See also this traceback:

C:\pwb\GIT\core>py -3 pwb.py redirect -lang:tg do -simulate
Retrieving double redirect special page...
Retrieving 24 pages from wikipedia:tg.

>>> Лоиҳа:Тоҷикистон <<<

1 pages read
0 pages written
Execution time: 0 seconds
Read operation time: 0 seconds
Script terminated by exception:

ERROR: RuntimeError: getredirtarget: No 'redirects' found for page Лоиҳа:Тоҷикис
тон.
Traceback (most recent call last):
  File "pwb.py", line 253, in <module>
    if not main():
  File "pwb.py", line 246, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 115, in run_python_file
    main_mod.__dict__)
  File ".\scripts\redirect.py", line 806, in <module>
    main()
  File ".\scripts\redirect.py", line 802, in main
    bot.run()
  File "C:\pwb\GIT\core\pywikibot\bot.py", line 1505, in run
    self.treat(page)
  File "C:\pwb\GIT\core\pywikibot\bot.py", line 1737, in treat
    self.treat_page()
  File ".\scripts\redirect.py", line 715, in treat_page
    self.action_treat(self.current_page)
  File ".\scripts\redirect.py", line 585, in fix_1_double_redirect
    targetPage = newRedir.getRedirectTarget()
  File "C:\pwb\GIT\core\pywikibot\page.py", line 1668, in getRedirectTarget
    return self.site.getredirtarget(self)
  File "C:\pwb\GIT\core\pywikibot\site.py", line 3208, in getredirtarget
    .format(title))
RuntimeError: getredirtarget: No 'redirects' found for page Лоиҳа:Тоҷикистон.
<class 'RuntimeError'>
CRITICAL: Closing network session.

C:\pwb\GIT\core>

and again:

C:\pwb\GIT\core>pwb.py redirect.py do -simulate -lang:pa
Retrieving double redirect special page...
Retrieving 25 pages from wikipedia:pa.


>>> ਅਨਾਤੋਲੇ ਫ਼ਰਾਂਸ <<<
ERROR: Page [[pa:ਅਨਾਤੋਲੇ ਫ਼ਰਾਂਸ]] is a circular redirect.
Skipping [[pa:ਅਨਾਤੋਲੇ ਫ਼ਰਾਂਸ]].


>>> ਨੈਸ਼ਨਲ ਫਿਲਮ ਅਵਾਰਡ <<<
ERROR: Page [[pa:ਨੈਸ਼ਨਲ ਫਿਲਮ ਅਵਾਰਡ]] is a circular redirect.
Skipping [[pa:ਨੈਸ਼ਨਲ ਫਿਲਮ ਅਵਾਰਡ]].


>>> ਫਾਟਕ:ਇਤਿਹਾਸ <<<

3 pages read
0 pages written
Execution time: 1 seconds
Read operation time: 0 seconds
Script terminated by exception:

ERROR: RuntimeError: getredirtarget: No 'redirects' found for page ਫਾਟਕ:ਇਤਿਹਾਸ.
Traceback (most recent call last):
  File "C:\pwb\GIT\core\pwb.py", line 232, in <module>
    if not main():
  File "C:\pwb\GIT\core\pwb.py", line 225, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "C:\pwb\GIT\core\pwb.py", line 94, in run_python_file
    main_mod.__dict__)
  File ".\scripts\redirect.py", line 773, in <module>
    main()
  File ".\scripts\redirect.py", line 769, in main
    bot.run()
  File "C:\pwb\GIT\core\pywikibot\bot.py", line 1508, in run
    self.treat(page)
  File ".\scripts\redirect.py", line 682, in treat
    super(RedirectRobot, self).treat(page)
  File "C:\pwb\GIT\core\pywikibot\bot.py", line 1735, in treat
    self.treat_page()
  File ".\scripts\redirect.py", line 572, in fix_1_double_redirect
    targetPage = newRedir.getRedirectTarget()
  File "C:\pwb\GIT\core\pywikibot\page.py", line 1676, in getRedirectTarget
    return self.site.getredirtarget(self)
  File "C:\pwb\GIT\core\pywikibot\site.py", line 3208, in getredirtarget
    .format(title))
RuntimeError: getredirtarget: No 'redirects' found for page ਫਾਟਕ:ਇਤਿਹਾਸ.
CRITICAL: Exiting due to uncaught exception <class 'RuntimeError'>

C:\pwb\GIT\core>

The redirect was just removed, deleted, or changed, but the link table never seems to be updated? Or does the cache expiry for redirects take ages? That is completely fine for the expectations of web users and web crawlers, but it is not really helpful when robots want to work with the redirect contents. And I'm a little surprised that the API apparently returns cached or outdated information here.

BTW I noticed similar behavior in MediaWiki-Search when a redirect is deleted. It shows up as a red-link redirect in the search results for ages too.

Anomie subscribed.

@Dvorapa added a project: MediaWiki-Action-API.

What exactly do you think is a bug in the API here? Please supply the queries, the current output, and what you think is wrong with it.

and again:

C:\pwb\GIT\core>pwb.py redirect.py do -simulate -lang:pa
Retrieving double redirect special page...
Retrieving 25 pages from wikipedia:pa.
[...]
ERROR: RuntimeError: getredirtarget: No 'redirects' found for page ਫਾਟਕ:ਇਤਿਹਾਸ.

I note that https://pa.wikipedia.org/w/api.php?action=query&generator=querypage&gqppage=DoubleRedirects&gqplimit=max&prop=info&formatversion=2 currently returns, in part,

{
    "pageid": 68990,
    "ns": 0,
    "title": "ਫਾਟਕ:ਇਤਿਹਾਸ",
    "contentmodel": "wikitext",
    "pagelanguage": "pa",
    "pagelanguagehtmlcode": "pa",
    "pagelanguagedir": "ltr",
    "touched": "2016-03-25T19:42:10Z",
    "lastrevid": 291775,
    "length": 89,
    "redirect": true,
    "new": true
},

which is one of those situations where there's a page that's unreachable by title, as "ਫਾਟਕ:ਇਤਿਹਾਸ" should be in namespace 100. See https://pa.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&pageids=68990|73388&formatversion=2:

"query": {
    "pages": [
        {
            "pageid": 68990,
            "ns": 0,
            "title": "ਫਾਟਕ:ਇਤਿਹਾਸ",
            "revisions": [
                {
                    "contentformat": "text/x-wiki",
                    "contentmodel": "wikitext",
                    "content": "#ਰੀਡਿਰੈਕਟ [[ਵਿਕੀਪੀਡੀਆ:ਫਾਟਕ:ਇਤਿਹਾਸ]]"
                }
            ]
        },
        {
            "pageid": 73388,
            "ns": 100,
            "title": "ਫਾਟਕ:ਇਤਿਹਾਸ",
            "revisions": [
                {
                    "contentformat": "text/x-wiki",
                    "contentmodel": "wikitext",
                    "content": ""
                }
            ]
        }
    ]
}
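The situation above, two page IDs that render the same title in different namespaces, can be spotted mechanically. A small sketch over the trimmed response data; the function name is hypothetical and the check is only as good as the fields shown here.

```python
from collections import defaultdict


def find_title_collisions(pages):
    """Map each title held by more than one page ID to its (pageid, ns) pairs."""
    by_title = defaultdict(list)
    for page in pages:
        by_title[page['title']].append((page['pageid'], page['ns']))
    return {title: ids for title, ids in by_title.items() if len(ids) > 1}


# Trimmed from the response above.
pages = [
    {'pageid': 68990, 'ns': 0, 'title': 'ਫਾਟਕ:ਇਤਿਹਾਸ'},
    {'pageid': 73388, 'ns': 100, 'title': 'ਫਾਟਕ:ਇਤਿਹਾਸ'},
]
collisions = find_title_collisions(pages)
print(collisions)
```

A title lookup can only resolve to one of the colliding pages, which is why the other one has to be queried by page ID.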

Change 279589 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Don't fail missing 'redirects' with RuntimeError

https://gerrit.wikimedia.org/r/279589

Xqt lowered the priority of this task from High to Lowest. Jun 1 2019, 9:45 AM

still an upstream issue

Aklapper removed Xqt as the assignee of this task. Jun 19 2020, 4:31 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

This is solved on the Pywikibot side but is still an upstream issue, I guess.