
Repetitive API userinfo queries
Closed, Resolved, Public


Originally from:
Reported by: xqt
Created on: 2012-06-21 13:44:27
Subject: Rewrite Performance (multiple API request)
Original description:
There are multiple user info queries, which slow down performance:

c:\Pywikipedia\rw> user:xqt/Test -simulate -v
Pywikipediabot r10326 2012-06-08 12:08:53Z
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
Retrieving 1 pages from wikipedia:de.
Starting 1 threads...
API action query: userinfo
Found 1 wikipedia:de processes running, including this one.

>>> Benutzer:Xqt/Test <<<
- Test
+ Test Test
Comment: Bot: Ändere ...
Do you want to accept these changes? ([y]es, [N]o) y
API action query: userinfo
API action query: userinfo
Cosmetic changes for wikipedia-de enabled.
API action query: siteinfo|userinfo
API action query: userinfo
API action edit:
SIMULATION: edit action blocked.
Page [[Benutzer:Xqt/Test]] saved without any changes.
Page [[Benutzer:Xqt/Test]] saved
Dropped throttle(s).
Waiting for threads to finish...
All threads finished.
Dropped throttle(s).


Version: core-(2.0)
Severity: normal
See Also:



Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 2:25 AM
bzimport set Reference to bz55192.
bzimport added a subscriber: Unknown Object (????).

These are multiple API requests, and I guess a lot of them could be cached by a site instance or on disk. This and other code parts decrease the performance of pwb 2.0 by 30% (or increase the processing time by 50%), measured with -start:! -pt:0
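To illustrate the caching idea in the comment above, here is a minimal sketch of a site object that keeps the userinfo result in memory after the first request. The class `CachedUserInfoSite`, its `userinfo()` method and the `fake_request` helper are hypothetical names made up for this example; they are not pywikibot's actual API.

```python
from typing import Callable, Dict, List, Optional


class CachedUserInfoSite:
    """Hypothetical stand-in for a site object; NOT pywikibot's APISite."""

    def __init__(self, fetch_userinfo: Callable[[], Dict[str, object]]) -> None:
        # The callable that performs the real API request is injected so
        # the sketch stays independent of any network code.
        self._fetch_userinfo = fetch_userinfo
        self._userinfo: Optional[Dict[str, object]] = None

    def userinfo(self, force: bool = False) -> Dict[str, object]:
        """Return the cached userinfo, querying only once unless forced."""
        if force or self._userinfo is None:
            self._userinfo = self._fetch_userinfo()
        return self._userinfo


# Record every simulated request so repeated queries become visible.
calls: List[str] = []


def fake_request() -> Dict[str, object]:
    """Simulated 'API action query: userinfo' request (assumption)."""
    calls.append("userinfo")
    return {"name": "Xqt", "id": 1}


site = CachedUserInfoSite(fake_request)
for _ in range(5):
    site.userinfo()
request_count = len(calls)
```

With a cache like this, the five lookups in the loop trigger a single simulated request. The `force` flag is there for cases such as a relogin, where the cached value must be refreshed.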

  • assigned_to: russblau --> nobody
  • summary: Multiple user info request --> Rewrite Performance (multiple API request)

I'm not sure how the code looked before about April 2014, so my comments are unrelated to how the code looked when this bug was raised in 2012.

Since at least 2014, userinfo is added to every query, and the response is used to determine whether the server has a different username than pywikibot expects.
This occurs in usual usage for two reasons:

  1. the bot starts logged out, but with the cookies sent, the server may reply with a username, in which case the server considers the bot logged in. So pywikibot changes the login status of the APISite accordingly.
  2. the server invalidates the bot's session, or maybe even its credentials, e.g. when we had a forced password reset.
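The two cases above can be sketched as a small function that compares the userinfo block of a response with the username the bot expects. This is only an illustration: `reconcile_login` and its return values are hypothetical, and the sketch assumes a response in which anonymous users are marked by an `anon` key inside `query.userinfo`.

```python
from typing import Any, Dict, Optional


def reconcile_login(expected_user: str,
                    locally_logged_in: bool,
                    response: Dict[str, Any]) -> str:
    """Compare the server's view of the user with the local one.

    Hypothetical helper, not pywikibot code. Returns one of
    'logged_in', 'logged_out', 'adopted_login', 'session_lost'
    or 'mismatch'.
    """
    info = response.get("query", {}).get("userinfo", {})
    # Assumption: an 'anon' key means the server treats us as anonymous.
    server_user: Optional[str] = None if "anon" in info else info.get("name")

    if server_user is None:
        # Case 2: we believed we were logged in, the server disagrees.
        return "session_lost" if locally_logged_in else "logged_out"
    if server_user != expected_user:
        return "mismatch"
    # Case 1: cookies logged us in although we started logged out.
    return "logged_in" if locally_logged_in else "adopted_login"


named = {"query": {"userinfo": {"id": 1, "name": "Xqt"}}}
anonymous = {"query": {"userinfo": {"id": 0, "name": "127.0.0.1", "anon": ""}}}

case_cookie_login = reconcile_login("Xqt", False, named)
case_session_lost = reconcile_login("Xqt", True, anonymous)
```

Because the check only reads data that already arrives with each response, it costs no additional request.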

So there are many API requests and responses with a small chunk of extra data. This could be removed/reduced, with a lot of pain, and little gain.

There are also many times where the code base sends the exact same userinfo+siteinfo request several times, because the login code is a mess. However, these are cached locally on disk - which is still a performance problem as this requires disk IO for a tiny chunk of data that the code has already parsed and discarded.
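One way to avoid the repeated disk IO would be a small in-memory layer in front of the on-disk cache. The sketch below is an assumption about how that could look, not the existing pywikibot cache: `TwoLevelCache`, its key scheme and its file layout are invented for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class TwoLevelCache:
    """Hypothetical cache: an in-memory dict in front of JSON files on disk."""

    def __init__(self, directory: Path) -> None:
        self._dir = directory
        self._memory: Dict[str, Dict[str, Any]] = {}
        self.disk_reads = 0  # counts how often the disk is actually read

    @staticmethod
    def _key(params: Dict[str, Any]) -> str:
        # Identical request parameters always map to the same key.
        raw = json.dumps(params, sort_keys=True).encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def put(self, params: Dict[str, Any], response: Dict[str, Any]) -> None:
        """Store a response on disk only, mimicking a disk-only cache."""
        path = self._dir / self._key(params)
        path.write_text(json.dumps(response), encoding="utf-8")

    def get(self, params: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Return a cached response, reading from disk at most once per key."""
        key = self._key(params)
        if key in self._memory:
            return self._memory[key]
        path = self._dir / key
        if not path.exists():
            return None
        self.disk_reads += 1
        data = json.loads(path.read_text(encoding="utf-8"))
        self._memory[key] = data
        return data


request = {"action": "query", "meta": "siteinfo|userinfo"}
with tempfile.TemporaryDirectory() as tmp:
    cache = TwoLevelCache(Path(tmp))
    cache.put(request, {"query": {"userinfo": {"name": "Xqt"}}})
    results = [cache.get(request) for _ in range(3)]
    disk_reads_after = cache.disk_reads
```

Here three identical lookups cause one disk read; the later ones are served from memory. A real implementation would also need expiry and invalidation on relogin, which this sketch leaves out.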

I fixed a few of these reload scenarios back in July/August, but it is not fun fiddling with the login/relogin sequence.

IMO we should wait until we've released a stable version of 2.0, and then redesign the user/login system, removing the two-user system that is heavily embedded in the current codebase. That will probably require a breaking change for sysop-bots, but bot-bots should be unaffected.

> Since at least 2014, userinfo is added to every query, and the response is used to determine whether the server has a different username than pywikibot expects.

The issue is not adding userinfo to each query; that's a perfectly sane design. The issue is that, apart from that, the code is sending out repetitive requests, 'several times a second' (as reported by @Malafaya today on irc). This might indeed have something to do with the login code, but I'm not entirely sure (haven't checked).

Change 231769 had a related patch set uploaded (by Merlijn van Deen):
APISite: removed username check in getuserinfo

valhallasw claimed this task.