Pywikibot should continue or skip if page has blank-title link
Closed, Duplicate | Public | BUG REPORT

Description

The Pywikibot interwikidata.py script stops if it encounters a link like [[en:]] on a page.

By contrast, MediaWiki itself accepts a link with a blank title but an otherwise valid interwiki prefix: it simply lands on the main page of the destination wiki. For example, https://en.wikipedia.org/wiki/ redirects to https://en.wikipedia.org/wiki/Main_Page

Steps to Reproduce:

Insert an interlanguage link like [[en:]][[fr:]][[es:]][[pt:]] on any page, then attempt to run the interwikidata bot on that page. (These usually turn up on the main page of third-party wikis to point to the main page of that same project in some other language. MediaWiki lets you do this. The 'bot does not like this at all and dies.)
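To find affected pages before a bot run, something like the following could be used. This is only a rough sketch: the function name, the regular expression, and the known_prefixes parameter are my own assumptions, not part of Pywikibot, and the caller has to supply the set of valid interwiki prefixes for the wiki in question.

```python
import re

# Matches [[prefix:]] or [[:prefix:]] where nothing but optional
# whitespace follows the colon. Hypothetical pattern, not taken from Pywikibot.
BLANK_LINK_RE = re.compile(r"\[\[\s*:?\s*([A-Za-z][A-Za-z0-9-]*)\s*:\s*\]\]")


def find_blank_interwiki_links(text, known_prefixes):
    """Return the prefixes of blank-title links found in wikitext.

    known_prefixes is an assumed, caller-supplied set of lowercase
    interwiki prefixes; anything else (e.g. a namespace name) is ignored.
    """
    found = []
    for match in BLANK_LINK_RE.finditer(text):
        prefix = match.group(1).lower()
        if prefix in known_prefixes:
            found.append(prefix)
    return found
```

Calling it on the wikitext of a main page with {"en", "fr", "es", "pt"} as known_prefixes would list the prefixes that need attention, in the order they appear.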

Actual Results:

An error is thrown at pywikibot/page.py somewhere near line 2271:

def __init__(self, source, title='', ns=0):
    """Instantiate a Page object."""
    if isinstance(source, pywikibot.site.BaseSite):
        if not title:
            raise ValueError('Title must be specified and not empty '
                             'if source is a Site.')
    super(Page, self).__init__(source, title, ns)

The message is "Title must be specified and not empty if source is a Site." The bot run then stops abruptly and must be manually restarted with -start: pointing to somewhere after the affected page(s).

Expected Results:

Any bad user-supplied links (prefix but no title) should simply be ignored and processing should continue.
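Roughly what I have in mind for "ignore and continue" is sketched below. It does not use Pywikibot itself: make_page is a hypothetical stand-in that only mimics the empty-title check from the traceback above, and collect_pages and the (site, title) tuples are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("interwiki_sketch")


def make_page(site, title):
    """Hypothetical stand-in for the Page constructor's empty-title check."""
    if not title:
        raise ValueError('Title must be specified and not empty '
                         'if source is a Site.')
    return (site, title)


def collect_pages(links):
    """Build pages from (site, title) pairs, skipping blank-title links."""
    pages = []
    skipped = []
    for site, title in links:
        try:
            pages.append(make_page(site, title))
        except ValueError as err:
            # Log the bad link and carry on instead of aborting the run.
            logger.warning("Skipping link to %s: %s", site, err)
            skipped.append(site)
    return pages, skipped
```

The point is only that the ValueError is caught per link, so one bad link costs a warning rather than the whole run.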

One alternative (a bit kludgey, but adequate) is to treat a blank-title interwiki link as pointing to "Main Page" on the destination wiki. If no page of that name exists (which is likely if the destination is in another language), the bot will correctly detect that the page doesn't exist and continue normally:

def __init__(self, source, title='', ns=0):
    """Instantiate a Page object."""
    if title == '':  # don't stop entire run if one [[:xx:]] link has blank title
        title = 'Main_Page'
    if isinstance(source, pywikibot.site.BaseSite):
        if not title:
            raise ValueError('Title must be specified and not empty '
                             'if source is a Site.')
    super(Page, self).__init__(source, title, ns)
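One side effect of the patch above is that the "if not title" check can no longer fire, because the title has already been replaced by the time it runs. A narrower variant would apply the fallback only where link titles are read, leaving the constructor alone. A minimal sketch, where resolve_link_title and its fallback parameter are hypothetical names of my own:

```python
def resolve_link_title(title, fallback='Main_Page'):
    """Return the link title, or the fallback if it is blank.

    Hypothetical helper: whitespace-only titles are treated as blank.
    """
    cleaned = title.strip()
    return cleaned if cleaned else fallback
```

The result would then be passed on as the title, so other callers that pass an empty title by mistake still get the ValueError.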

After all, there's nothing stopping a user from dropping [[en:]] onto a page, so the bot scripts should either ignore it or recover gracefully instead of exiting abnormally.