In a flame graph of a random login attempt, wmfGetPrivilegedGroups() (eventually, CentralAuthUser::queryAttached()) takes almost half a second. This is already bad for logins, but the work in T395204 (MediaWiki should log request information (IP, user agent, referrer, HTTP method, etc) in a more uniform and predictable way) is calling wmfGetPrivilegedGroups() much more frequently. Either we should rethink that, or we should speed it up.
Related Objects
- Mentioned In
  - T415588: Add rate limit class for accounts that are in a local bot group on any wiki
  - T407219: Enrich MediaWiki logs with IP reputation data in Logstash
  - T385310: Could not find local user data for {username}@{wikiId} (2025)
  - T408917: Flame graphs don't seem to be collected on auth.wikimedia.org
- Mentioned Here
  - T401701: UserInfoCard: Queries performed by `CentralAuthUser::getBlocks` is uncached and performs lots of queries
  - T395204: MediaWiki should log request information (IP, user agent, referrer, HTTP method, etc) in a more uniform and predictable way
Event Timeline
Looking at the flame graph, I wanted to understand why CentralAuthUser::localUserData() triggers so many SQB::fetchRow() calls, and quickly realized this was influenced by edge-login (to other wikis). For a given request (after a user successfully logs in on the shared domain and the local wiki for that request), edge-login occurs in the same request (for all parent domains), and I was able to observe this locally as well.
This is what I think is taking some time.
It's not even the edge login – queryAttached() calling localUserData() does a database query for each database where the user has a local account. So if you have 984 accounts, it does 984 queries. (That's why my logins have felt so slow.)
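As a rough illustration of the cost described above, here is a small Python model (not the CentralAuth PHP code). `FakeWikiDb`, `query_attached`, the wiki names, and the username are all hypothetical. It only shows that one lookup per attached wiki makes the query count grow linearly with the number of local accounts.

```python
class FakeWikiDb:
    """Hypothetical stand-in for one wiki's database; it counts queries."""

    def __init__(self, groups_by_user):
        self.groups_by_user = groups_by_user
        self.query_count = 0

    def select_local_user(self, username):
        # Each call models one round trip to this wiki's database.
        self.query_count += 1
        return {"groups": self.groups_by_user.get(username, [])}


def query_attached(username, attached_wikis, dbs):
    """Fetch local user data from every attached wiki, one query each."""
    return {wiki: dbs[wiki].select_local_user(username) for wiki in attached_wikis}


wikis = [f"wiki{i}" for i in range(984)]
dbs = {
    wiki: FakeWikiDb({"Example": ["sysop"] if wiki == "wiki0" else []})
    for wiki in wikis
}

attached = query_attached("Example", wikis, dbs)
total_queries = sum(db.query_count for db in dbs.values())
print(total_queries)  # 984
```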
Optimizing this would be a significant effort: we'd need to keep a copy of the data in the central database (like we do for edit counts).
Do we actually need to fetch this information from every wiki during login? Why not just the wiki where the user is logging in?
Oh, because when you're logged in to any wiki, you can perform actions on every wiki through the API with centralauthtoken.
I suppose we could just cache the local groups. There is a hook that allows us to purge that cache when they are changed.
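The idea of caching the local groups and purging on change can be sketched as follows. This is a minimal Python model under stated assumptions, not the actual patch: `LocalGroupsCache`, `slow_loader`, and the in-process dict are hypothetical stand-ins for whatever cache and hook the real code uses.

```python
class LocalGroupsCache:
    """Hypothetical cache: username -> {wiki: [groups]}, with explicit purge."""

    def __init__(self, loader):
        self._loader = loader  # slow function that queries every attached wiki
        self._store = {}       # assumption: a plain dict stands in for a shared cache

    def get(self, username):
        # Only call the slow loader on a cache miss.
        if username not in self._store:
            self._store[username] = self._loader(username)
        return self._store[username]

    def purge(self, username):
        # Would be called from a "groups changed" hook handler.
        self._store.pop(username, None)


loader_calls = []


def slow_loader(username):
    loader_calls.append(username)
    return {"enwiki": ["sysop"], "rowiki": []}


cache = LocalGroupsCache(slow_loader)
cache.get("Example")    # miss: runs the slow loader
cache.get("Example")    # hit: served from the cache
cache.purge("Example")  # groups changed somewhere, so drop the entry
cache.get("Example")    # miss again: reloads fresh data
print(len(loader_calls))  # 2
```

The first lookup after a purge (or for a user who has never been looked up) still pays the full cost; only repeated lookups get cheaper.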
Change #1211748 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):
[mediawiki/extensions/CentralAuth@master] CentralAuthUser: Cache getLocalGroups()
Not just during login, it's used for various things. A user account is as valuable as its highest privilege (an interface admin can be targeted as a stepping stone for XSS etc), and due to the global nature of logins, that privilege can be anywhere. If someone is intadmin on enwiki and the current wiki is rowiki, we still want to be alerted about a suspicious spike in failed logins, make sure they have stricter policies when using Special:ChangePassword, enforce mandatory-2FA policies, log the creation of bot passwords etc. So yeah we'll need to fix this.
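To make the "highest privilege anywhere" point concrete, here is a hedged Python sketch. The `PRIVILEGED_GROUPS` set and the `get_privileged_groups` helper are hypothetical and do not reflect the real configuration or the signature of wmfGetPrivilegedGroups(). It only shows that the result is a union over all wikis, regardless of which wiki the request is on.

```python
# Assumption: an illustrative set of groups treated as privileged.
PRIVILEGED_GROUPS = {"interface-admin", "sysop", "checkuser"}


def get_privileged_groups(local_groups_by_wiki):
    """Return the privileged groups the user holds on any wiki."""
    found = set()
    for groups in local_groups_by_wiki.values():
        found.update(g for g in groups if g in PRIVILEGED_GROUPS)
    return sorted(found)


# The user is only privileged on enwiki, but the request is on rowiki.
local_groups = {"enwiki": ["interface-admin", "autoreviewer"], "rowiki": []}
privileged = get_privileged_groups(local_groups)
print(privileged)  # ['interface-admin']
```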
Change #1211748 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] CentralAuthUser: Cache getLocalGroups()
Change #1213437 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):
[mediawiki/extensions/CentralAuth@wmf/1.46.0-wmf.4] CentralAuthUser: Cache getLocalGroups()
Change #1213437 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@wmf/1.46.0-wmf.4] CentralAuthUser: Cache getLocalGroups()
Mentioned in SAL (#wikimedia-operations) [2025-12-01T14:31:40Z] <lucaswerkmeister-wmde@deploy2002> Started scap sync-world: Backport for [[gerrit:1213437|CentralAuthUser: Cache getLocalGroups() (T410878)]]
Mentioned in SAL (#wikimedia-operations) [2025-12-01T14:33:32Z] <lucaswerkmeister-wmde@deploy2002> lucaswerkmeister-wmde, matmarex: Backport for [[gerrit:1213437|CentralAuthUser: Cache getLocalGroups() (T410878)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2025-12-01T14:46:32Z] <lucaswerkmeister-wmde@deploy2002> Finished scap sync-world: Backport for [[gerrit:1213437|CentralAuthUser: Cache getLocalGroups() (T410878)]] (duration: 14m 51s)
Profiled after the change: https://performance.wikimedia.org/excimer/profile/298342f6fd3fd712
wmfGetPrivilegedGroups is now an appropriately tiny sliver (most of the time, when it's cached).
Compared to the previous profile: https://performance.wikimedia.org/excimer/profile/3a2a986048c0239d
We've just added a cache, so it's still slow the first time, then fast after. I think this is good enough? It should minimize the impact of using this data in any additional logging.
Doing it better would require at least some database changes (to hold a copy of the data in the CentralAuth database, kind of like the global_edit_count table that is just a combination of edit counts from each wiki), which seems like a lot of effort.
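For comparison, the "copy of the data in the central database" alternative might look roughly like this. It is a sketch only, using Python's sqlite3 as a stand-in for the CentralAuth database. The table name `local_groups_copy` and both helper functions are hypothetical; no such schema exists in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE local_groups_copy ("
    "username TEXT, wiki TEXT, grp TEXT, "
    "PRIMARY KEY (username, wiki, grp))"
)


def on_local_groups_changed(username, wiki, groups):
    """Hypothetical hook handler: replace the central copy for one wiki."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "DELETE FROM local_groups_copy WHERE username = ? AND wiki = ?",
            (username, wiki),
        )
        conn.executemany(
            "INSERT INTO local_groups_copy VALUES (?, ?, ?)",
            [(username, wiki, g) for g in groups],
        )


def get_all_local_groups(username):
    """One query against the central copy instead of one query per wiki."""
    rows = conn.execute(
        "SELECT wiki, grp FROM local_groups_copy "
        "WHERE username = ? ORDER BY wiki, grp",
        (username,),
    )
    result = {}
    for wiki, grp in rows:
        result.setdefault(wiki, []).append(grp)
    return result


on_local_groups_changed("Example", "enwiki", ["sysop", "interface-admin"])
on_local_groups_changed("Example", "rowiki", ["bot"])
all_groups = get_all_local_groups("Example")
```

The trade-off is that every group change on every wiki has to update the central copy, which is the part that makes this a larger effort than adding a cache.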
IMHO this is good enough. We already do something like this in CentralAuthUser::getBlocks() (ref. T401701) as an optimization.
> Doing it better would require at least some database changes (to hold a copy of the data in the CentralAuth database, kind of like the global_edit_count table that is just a combination of edit counts from each wiki), which seems like a lot of effort.
Maybe we can file this as a task (nice to have) and track it as something to do for the future. Potentially essential work area.
Change #1214644 had a related patch set uploaded (by Reedy; author: Bartosz Dziewoński):
[mediawiki/extensions/CentralAuth@REL1_45] CentralAuthUser: Cache getLocalGroups()
Change #1214645 had a related patch set uploaded (by Reedy; author: Bartosz Dziewoński):
[mediawiki/extensions/CentralAuth@REL1_44] CentralAuthUser: Cache getLocalGroups()
Change #1214646 had a related patch set uploaded (by Reedy; author: Bartosz Dziewoński):
[mediawiki/extensions/CentralAuth@REL1_43] CentralAuthUser: Cache getLocalGroups()
CentralAuthUser::getBlocks() isn't called in relatively performance-sensitive situations but CentralAuthUser::getLocalGroups() is.
I agree that further optimizations would take much more effort and we shouldn't try them unless we see evidence of performance problems, but I'm not sure where we should look for that evidence (especially over time as the security log context gets used more). Is there a way to get something like call frequency and average wall time of wmfGetPrivilegedGroups() calls from tideways data, or do we have to add profiling for that? If we do have to, I think it's worth doing (not necessarily in that function; maybe in getSecurityLogContext() or in getLocalGroups()).
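If dedicated profiling does turn out to be needed, the measurement itself is simple. The following Python sketch (hypothetical names throughout; it is not tied to tideways, Excimer, or any MediaWiki API) records call frequency and average wall time for a wrapped function.

```python
import time
from functools import wraps


def track_calls(func):
    """Wrap func and record how often it is called and for how long."""
    stats = {"calls": 0, "total_seconds": 0.0}

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Count the call even if func raises.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start

    wrapper.stats = stats
    return wrapper


def average_seconds(stats):
    """Average wall time per call, or 0.0 if there were no calls."""
    return stats["total_seconds"] / stats["calls"] if stats["calls"] else 0.0


@track_calls
def get_privileged_groups_stub():
    # Hypothetical stand-in for the function being measured.
    return ["sysop"]


for _ in range(3):
    get_privileged_groups_stub()

call_count = get_privileged_groups_stub.stats["calls"]
avg = average_seconds(get_privileged_groups_stub.stats)
```

In production the two counters would be emitted to a metrics system rather than kept in memory, and averages alone would hide the slow uncached first call, so a histogram or percentiles would likely be more useful.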
Change #1214644 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@REL1_45] CentralAuthUser: Cache getLocalGroups()
Change #1214646 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@REL1_43] CentralAuthUser: Cache getLocalGroups()
Change #1214645 merged by Bartosz Dziewoński:
[mediawiki/extensions/CentralAuth@REL1_44] CentralAuthUser: Cache getLocalGroups()

