page.botMayEdit() causes requests to all pages in interwiki map
Closed, Resolved, Public

Description

Because not all family files work correctly (T111608), this breaks editing: page.save() calls .botMayEdit().

Minimal test case:

valhallasw@maeglin:pywikibot-core$ python pwb.py shell -v -v
The base directory is /home/valhallasw/src/pywikibot-core
Pywikibot r64b2f078f92a4a8820889c386308cc9b100eeac6
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2]
Welcome to the Pywikibot interactive shell!
>>> import pywikibot
>>> s = pywikibot.Site('ca', 'wikibooks')
>>> p = pywikibot.Page(s, u'Plantilla:Progrés')
>>> p.botMayEdit()
Found 2 wikibooks:ca processes running, including this one.
Found candidate wikibooks:af
Found candidate wikibooks:ang
Found candidate wikibooks:ar
Found candidate wikibooks:az
Found candidate wikibooks:en
Found candidate battlestarwiki:en
Found candidate wikibooks:be
(...etc...)

This was caused by changes in 0e7ad5ac44ce7e1df758bf9924c14fa3fe70ad9f, probably by creating a Site object in the new from_url function.

Event Timeline

valhallasw raised the priority of this task to Unbreak Now!.
valhallasw updated the task description.
valhallasw added a project: Pywikibot.
valhallasw added subscribers: valhallasw, XZise.

Change 236351 had a related patch set uploaded (by XZise):
[FIX] Site: Prevent creating of unnecessary Sites

https://gerrit.wikimedia.org/r/236351

To properly identify the family and code corresponding to a URL, it also needs to check the path, as multiple MediaWiki instances may be on the same domain. One option is to compare against the path to index.php, which is already known because it's the same as the path to api.php (granted, it may not actually be the same; I haven't checked, and Family allows two different paths, but then we would need to adjust Family.from_url). But most wikis on the interwiki map actually use the article path (e.g. en.wikipedia.org/wiki/$1), which is in Family.nicepath. Due to T89451 we try to minimize the number of variables necessary, so my patch uses APISite.article_path, which does a request to the wiki to get the article path.
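To illustrate why the domain alone is not enough, here is a minimal sketch (Python 3, standard library only) of matching a URL against both domain and article path. The `KNOWN_WIKIS` table, the `examplefarm` family and the `match_url` helper are hypothetical and are not part of Pywikibot; the real lookup in Family.from_url may work differently.

```python
from urllib.parse import urlparse

# Hypothetical table: (family, code) -> (domain, article path template).
# The two "examplefarm" wikis share a domain and differ only in the path.
KNOWN_WIKIS = {
    ('wikipedia', 'en'): ('en.wikipedia.org', '/wiki/$1'),
    ('wikibooks', 'ca'): ('ca.wikibooks.org', '/wiki/$1'),
    ('examplefarm', 'a'): ('wikis.example.org', '/a/wiki/$1'),
    ('examplefarm', 'b'): ('wikis.example.org', '/b/wiki/$1'),
}


def match_url(url):
    """Return the (family, code) whose domain and path prefix match url, else None."""
    parsed = urlparse(url)
    for key, (domain, article_path) in KNOWN_WIKIS.items():
        # Everything before the $1 placeholder is the fixed path prefix.
        prefix = article_path.split('$1')[0]
        if parsed.netloc == domain and parsed.path.startswith(prefix):
            return key
    return None
```

With only a domain comparison, both `examplefarm` entries would match any URL on wikis.example.org; the path prefix is what tells them apart.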

On its own that is fine, as you probably want a Site instance when you call Family.from_url. But unfortunately the APISite._cache_interwiki method creates a Site instance for every entry in order to cache it.
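One general way to avoid this kind of eager instantiation is to cache only cheap identifiers and build the expensive object on first use. The sketch below shows that idea in isolation; `FakeSite` and `LazyInterwikiCache` are made-up stand-ins, and this is not the code from change 236351, which may solve the problem differently.

```python
class FakeSite:
    """Stand-in for an expensive Site object; counts how many get created."""

    created = 0

    def __init__(self, family, code):
        FakeSite.created += 1
        self.family = family
        self.code = code


class LazyInterwikiCache:
    """Maps interwiki prefixes to sites, creating each site only on demand."""

    def __init__(self, entries):
        # entries: prefix -> (family, code). Storing tuples is cheap;
        # no site object is created while filling the cache.
        self._entries = dict(entries)
        self._sites = {}

    def site(self, prefix):
        # Create the site on first access and reuse it afterwards.
        if prefix not in self._sites:
            family, code = self._entries[prefix]
            self._sites[prefix] = FakeSite(family, code)
        return self._sites[prefix]
```

Filling the cache with the whole interwiki map then costs nothing per entry, and only the prefixes that are actually looked up pay for a site object.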

Change 236351 merged by jenkins-bot:
[FIX] Site: Prevent creation of unnecessary Sites

https://gerrit.wikimedia.org/r/236351

jayvdb subscribed.