
Page.exists(): Cannot auto-detect whether a page title exists in a different variant.
Open, High, Public

Description

Technically, a page that does not exist under the requested title behaves like a "301 Moved Permanently"-type redirect to an existing page in a different variant (unless no such page exists). This affects wikis that use LanguageConverter, including zh.wikipedia.
E.g. zh:匙吻鲟 (redirect=no link) is a redirect to zh:匙吻鱘, but pywikibot treats the former as a non-existent page and gives no information about the latter:

tools.yifeibot@tools-bastion-02:~/pywikibot$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)

<a list of files removed>

nothing added to commit but untracked files present (use "git add" to track)
tools.yifeibot@tools-bastion-02:~/pywikibot$ python pwb.py shell
Welcome to the Pywikibot interactive shell!
>>> import pywikibot
>>> pywikibot.Page(pywikibot.Site("zh", "wikipedia"), u"匙吻鲟").exists()
False
>>> pywikibot.Page(pywikibot.Site("zh", "wikipedia"), u"匙吻鱘").exists()
True

Event Timeline

zhuyifei1999 set the priority of this task to Needs Triage.
zhuyifei1999 updated the task description.
zhuyifei1999 moved this task from Backlog to Probably both (bugs only) on the Pywikibot-General board.
jayvdb triaged this task as High priority. (Jun 6 2015, 9:21 AM)

https://zh.wikipedia.org/w/index.php?title=%E5%8C%99%E5%90%BB%E9%B2%9F&redirect=no and https://zh.wikipedia.org/w/api.php?titles=%E5%8C%99%E5%90%BB%E9%B2%9F&action=query&prop=info both report this as a missing page, with no indication that the title is understood. However, using it in a link creates a blue link, so our Link and Page classes must also understand this correctly, and it is quite a serious problem that they fail to.

As discussed on T57241, one solution is to use converttitles (added in MW 1.17; T26296), and handle the converted titles array much like we handle the normalized titles array:
https://zh.wikipedia.org/w/api.php?titles=%E5%8C%99%E5%90%BB%E9%B2%9F&action=query&prop=info&converttitles
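
A minimal sketch of that handling, assuming the query result carries a `converted` list of `from`/`to` pairs shaped like the `normalized` list, and that normalization is applied before conversion. `resolve_title` is a hypothetical helper, not a pywikibot function, and the sample dict is hand-written for illustration rather than captured API output:

```python
def resolve_title(query, title):
    """Follow the 'normalized' and then the 'converted' mapping for a title.

    `query` is the dict found under the "query" key of an action=query
    response (assumed shape; each mapping is a list of {"from", "to"}).
    """
    for key in ("normalized", "converted"):
        for entry in query.get(key, []):
            if entry["from"] == title:
                title = entry["to"]
                break
    return title


# Hand-written sample, not real API output.
sample_query = {
    "converted": [{"from": "匙吻鲟", "to": "匙吻鱘"}],
}

resolved = resolve_title(sample_query, "匙吻鲟")
print(resolved)
```

A title with no matching entry is returned unchanged, so the same helper could be applied to every requested title.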

It appears it can be used on other Wikimedia wikis without an error, so it could be a default parameter on any site running 1.17+:
https://en.wikipedia.org/w/api.php?action=query&titles=Main_Page&converttitles
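
As a rough illustration of sending the flag unconditionally, the request URL could be assembled as below. `build_query_url` is a hypothetical helper, and `format=json` is an addition not present in the URLs above:

```python
from urllib.parse import urlencode


def build_query_url(host, title):
    """Build an action=query URL that always includes converttitles."""
    params = {
        "action": "query",
        "prop": "info",
        "titles": title,
        "converttitles": "",  # boolean flag: presence is what matters
        "format": "json",     # assumption: JSON output is wanted
    }
    return "https://{}/w/api.php?{}".format(host, urlencode(params))


url = build_query_url("zh.wikipedia.org", "匙吻鲟")
print(url)
```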

Just throwing this out there: an alternative (probably stupid) is to handle this client side, especially if there are libraries and rulesets which do identical translations, at least for the main languages. MediaWiki currently has 9 automatic language converters: gan, iu, kk, ku, shi, sr, tg, uz, zh (and variants of each), so it would be lots of work to re-implement them all client side if they don't already exist in a form Python can use (efficiently).
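
To make the client-side idea concrete, here is a toy sketch that generates candidate titles from a character table. The table holds only the single pair seen in this task; a real ruleset would be far larger, and LanguageConverter is not purely character-level, so this is an illustration of the approach and not a usable converter. `candidate_titles` is a hypothetical name:

```python
# Toy table: the one simplified -> traditional pair from this task.
SIMPLIFIED_TO_TRADITIONAL = str.maketrans({"鲟": "鱘"})


def candidate_titles(title):
    """Return the title plus its converted form, without duplicates."""
    converted = title.translate(SIMPLIFIED_TO_TRADITIONAL)
    return [title] if converted == title else [title, converted]


candidates = candidate_titles("匙吻鲟")
print(candidates)
```

Each candidate would still need an existence check against the wiki, so this saves no requests unless the candidates are batched into one query.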