
Could not find local user data for {username}@{wikiId} (2025)
Open, Needs Triage · Public · PRODUCTION ERROR

Description

Error
normalized_message
Could not find local user data for {username}@{wikiId}
from /srv/mediawiki/php-1.44.0-wmf.13/extensions/CentralAuth/includes/User/CentralAuthUser.php(2885)
#0 /srv/mediawiki/php-1.44.0-wmf.13/extensions/CentralAuth/includes/User/CentralAuthUser.php(2831): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->localUserData(string)
#1 /srv/mediawiki/php-1.44.0-wmf.13/extensions/CentralAuth/includes/Special/SpecialCentralAuth.php(246): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->queryUnattached()
#2 /srv/mediawiki/php-1.44.0-wmf.13/includes/specialpage/SpecialPage.php(729): MediaWiki\Extension\CentralAuth\Special\SpecialCentralAuth->execute(string)
#3 /srv/mediawiki/php-1.44.0-wmf.13/includes/specialpage/SpecialPageFactory.php(1735): MediaWiki\SpecialPage\SpecialPage->run(string)
#4 /srv/mediawiki/php-1.44.0-wmf.13/includes/actions/ActionEntryPoint.php(503): MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, MediaWiki\Context\RequestContext)
#5 /srv/mediawiki/php-1.44.0-wmf.13/includes/actions/ActionEntryPoint.php(145): MediaWiki\Actions\ActionEntryPoint->performRequest()
#6 /srv/mediawiki/php-1.44.0-wmf.13/includes/MediaWikiEntryPoint.php(202): MediaWiki\Actions\ActionEntryPoint->execute()
#7 /srv/mediawiki/php-1.44.0-wmf.13/index.php(58): MediaWiki\MediaWikiEntryPoint->run()
#8 /srv/mediawiki/w/index.php(3): require(string)
#9 {main}
Impact

Three users in a week. Affected users probably can't log in.

Notes

See T119736: Could not find local user data for {Username}@{wiki} and its duplicates for past instances of this bug (all circa 2016).

Event Timeline


This seems to be happening a lot recently: https://logstash.wikimedia.org/goto/687a23ae924b1a76ac8da97c479dcec6, although the stack trace (see below) differs from the original one in the task description.

from /srv/mediawiki/php-1.46.0-wmf.3/extensions/CentralAuth/includes/User/CentralAuthUser.php(3049)
#0 /srv/mediawiki/php-1.46.0-wmf.3/extensions/CentralAuth/includes/User/CentralAuthUser.php(2904): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->localUserData(string)
#1 /srv/mediawiki/php-1.46.0-wmf.3/extensions/CentralAuth/includes/User/CentralAuthUser.php(2873): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->queryAttached()
#2 /srv/mediawiki/wmf-config/CommonSettings.php(2449): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->getLocalGroups()
#3 /srv/mediawiki/wmf-config/CommonSettings.php(2483): wmfGetPrivilegedGroups(MediaWiki\User\User)
#4 /srv/mediawiki/php-1.46.0-wmf.3/includes/HookContainer/HookContainer.php(141): {closure}(array, array)
#5 /srv/mediawiki/php-1.46.0-wmf.3/includes/HookContainer/HookRunner.php(2239): MediaWiki\HookContainer\HookContainer->run(string, array, array)
#6 /srv/mediawiki/php-1.46.0-wmf.3/includes/Request/WebRequest.php(1570): MediaWiki\HookContainer\HookRunner->onGetSecurityLogContext(array, array)
#7 /srv/mediawiki/php-1.46.0-wmf.3/includes/Auth/AuthManager.php(2129): MediaWiki\Request\WebRequest->getSecurityLogContext(MediaWiki\User\User)
#8 /srv/mediawiki/php-1.46.0-wmf.3/includes/Setup.php(597): MediaWiki\Auth\AuthManager->autoCreateUser(MediaWiki\User\User, string, bool, bool, MediaWiki\User\User)
#9 /srv/mediawiki/php-1.46.0-wmf.3/includes/WebStart.php(72): require_once(string)
#10 /srv/mediawiki/php-1.46.0-wmf.3/api.php(23): require(string)
#11 /srv/mediawiki/w/api.php(3): require(string)
#12 {main}

Up from tens of times per day to thousands of times per day.

Screenshot Capture - 2025-11-22 - 17-14-03.png (492×1 px, 58 KB)

The pattern of increase is consistent with train deploys.
We should look into this.

The increase might just be the consequence of wmfGetPrivilegedGroups being called more often (see also T410878: wmfGetPrivilegedGroups is slow); need to check whether the timeline matches that.

Did some digging and found something probably useful. The increase aligns with the train deployment from Nov 18 (ref. https://www.mediawiki.org/wiki/MediaWiki_1.46/wmf.3). So I'm thinking the culprit is https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1202566 Cc @kostajh

On Nov 19, we started seeing the spikes (probably when the change hit group1 wikis)

Screenshot 2025-11-26 at 1.42.28 PM.png (426×1 px, 75 KB)

Screenshot 2025-11-26 at 1.42.00 PM.png (392×1 px, 39 KB)

AuthManager::autoCreateUser() invokes the GetSecurityLogContext hook which on this line will trigger a call to wmfGetPrivilegedGroups().

I'm not exactly sure how to move forward, but one idea is to narrow Kosta's patch to AuthManager::AUTOCREATE_SOURCE_TEMP sources instead of logging for every source?

Change #1211635 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] auth: Add security log context only for temporary accounts

https://gerrit.wikimedia.org/r/1211635

AuthManager::autoCreateUser() invokes the GetSecurityLogContext hook which on this line will trigger a call to wmfGetPrivilegedGroups().

The proposed patch would invoke the hook handler only for temporary accounts. That should reduce logspam a bit, even though the issue will still need to be solved at the root.

Assumptions

The stack trace in the task description is not an uncaught exception. The request completed unchanged and the end-user is not impacted. It is a "warning" message logged to the CentralAuth channel, with a diagnostic stack trace attached.

While first navigating to a wiki and auto-creating an account on that wiki, we try to fetch data about the local account, which fails because we're literally still creating it.

My first theory is that AuthManager might be changing some global state from "anonymous" to "username" too early, before the user table row is written. If true, that would be a code smell, because we should update global state after creation is done, not before. But I can understand if some legacy code during auto-creation might have required this in the past, and it could make sense that it would only show up now, because we didn't try to fetch that data at this point in time until the recently added log-context call.

I reject this theory, because the failure happens based on a $user parameter that is injected all the way down. MediaWiki core is correctly minimising exposure to this incomplete User object. (Almost immediately after the affected Logger call, we do User::addToDatabase, so this is literally the last mile before such a fetch would work fine.) It makes sense then that this user is incomplete.

Other callers

There is one other place where we already call WebRequest::getSecurityLogContext during account creation: SpecialCreateAccount.php. There, we pass $performer instead of the (unfinished) $user object. We should probably do that here as well. For the default log context that MediaWiki core adds (client IP, UA header, etc.), this would make no difference. For the context added via wmf-config, it would be computed for the anon User rather than the non-existent local user. That makes sense, but would mean it logs the IP instead of the username. This is an acceptable loss, because the log message in question already states the username that is being created, so we don't lose that either way.

Race condition

Looking at Logstash samples, the majority happen during auto-login start. For example:

Could not find local user data for {username}@{wikiId}
referer: https://ja.wikipedia.org/
url: https://wikimania.wikimedia.org/wiki/Special:CentralAutoLogin/start?…

This is most likely someone who just signed up for an account via ja.wikipedia.org, and is now loading their first post-login pageview on jawiki. During that first pageview, CentralAuth makes requests to a dozen or so wikis (wmgCentralAuthAutoLoginWikis) in order to plant a CentralAuth session cookie on the family domains: enwiktionary for *.wiktionary.org, enwikinews for *.wikinews.org, and then a handful of individual wikis like meta.wikimedia, commons.wikimedia, wikidata, and wikimania.wikimedia.org.

The error is on this request to wikimania.wikimedia.org, and that request is indeed auto-creating the local account on wikimaniawiki.

However, what's less obvious is that the "local user data" error is not about the local account to-be on wikimaniawiki or jawiki. It is about an entirely different wiki:

Could not find local user data for …@mediawikiwiki

referer: https://ja.wikipedia.org/
url: https://wikimania.wikimedia.org/wiki/Special:CentralAutoLogin/start?…
username: …
wikiid: mediawikiwiki

The stack trace says that CentralAuthUser->localUserData isn't called for the current user/wiki, it's called from a loop in CentralAuthUser->queryAttached, from wmfGetPrivilegedGroups > CentralAuthUser->getLocalGroups. That method is under no obligation to return data about any particular wiki. It simply says: Get a list of wikis where your account exists, and tell me what groups you're in. So any wiki where you don't have an account, that's fine. The caller isn't expecting anything about that list. It just wants everything that there is, but nothing specifically.
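
To make the shape of that loop concrete, here is a minimal Python sketch. It is not the CentralAuth PHP code: `LocalUserNotFound`, `local_user_data` and `query_attached` are hypothetical stand-ins named after the methods in the stack trace, and plain dictionaries stand in for the centralauth attachment list and the per-wiki user tables.

```python
class LocalUserNotFound(Exception):
    """Hypothetical stand-in for CentralAuth's LocalUserNotFoundException."""


def local_user_data(local_tables, wiki_id, username):
    # Look up the user row on one wiki; raise if that wiki has no such row (yet).
    row = local_tables.get(wiki_id, {}).get(username)
    if row is None:
        raise LocalUserNotFound(
            f"Could not find local user data for {username}@{wiki_id}"
        )
    return row


def query_attached(attached_wikis, local_tables, username):
    # Collect whatever local data exists; a missing row is tolerated, not fatal.
    found, skipped = {}, []
    for wiki_id in attached_wikis:
        try:
            found[wiki_id] = local_user_data(local_tables, wiki_id, username)
        except LocalUserNotFound:
            skipped.append(wiki_id)  # the real code logs the warning here
    return found, skipped


# The central list already names mediawikiwiki, but its user row is not visible.
attached_wikis = ["jawiki", "mediawikiwiki"]
local_tables = {"jawiki": {"Example": {"groups": ["user"]}}}
found, skipped = query_attached(attached_wikis, local_tables, "Example")
```

The caller still gets the jawiki data; mediawikiwiki is simply left out of the result, which matches the "everything that there is, but nothing specifically" contract described above.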

Hence, the caller catches LocalUserNotFoundException and tolerates it (albeit with the warning we see in the logs). The question then is:

  • If this is a brand new account on their return from the sign up form, how did it get an account on mediawiki.org?
  • Why is the account existing according to one source (s7.centralauth), but non-existing according to another (s3.wikimaniawiki.user)?

These auto-login start requests are dictated by this array:

/wmf-config/InitialiseSettings.php
'wmgCentralAuthAutoLoginWikis' => [
  '.wikipedia.org' => 'enwiki',
  '.wikinews.org' => 'enwikinews',
  '.wiktionary.org' => 'enwiktionary',
  // …
  '.mediawiki.org' => 'mediawikiwiki',
  // …
  'wikimania.wikimedia.org' => 'wikimaniawiki',
  // …
],

Even though the logs only record this error about the last one, we know logically that this "initial" ja.wikipedia.org pageview didn't just call Special:CentralAutoLogin/start on wikimania.wikimedia.org. It called it on all of them. Something like the following would have happened, with the CentralAutoLogin calls happening in parallel.

Imagined
* https://ja.wikipedia.org/wiki/Main_Page

** https://en.wikinews.org/wiki/Special:CentralAutoLogin/start?…
** https://en.wiktionary.org/wiki/Special:CentralAutoLogin/start?…
** https://www.mediawiki.org/wiki/Special:CentralAutoLogin/start?…
** https://wikimania.wikimedia.org/wiki/Special:CentralAutoLogin/start?…

If there is a race condition here, it is not surprising that it would be between two requests near each other. In this case, we saw an error from the wikimania request, in fetching local user data from mediawikiwiki.

So what is the race condition? Doesn't MediaWiki have atomic database transactions that protect against this sort of thing?

Well yes, but that works only within a given database. The centralauth database and the mediawiki.org database are independent from each other. It is perfectly natural for the wikimania request to start a database connection to centralauth and to mediawiki.org, and use a replica that is behind by a fraction of a second.
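
As a rough illustration of that point, the following Python sketch models two independent replicas with different lag. Everything in it is an assumption made for the example: the `LaggedReplica` class, the key names, and the lag and timing values (in milliseconds) are invented, not measured.

```python
class LaggedReplica:
    """A replica that only shows writes committed at least `lag_ms` ago."""

    def __init__(self, lag_ms):
        self.lag_ms = lag_ms
        self.commits = {}  # key -> commit time on the primary, in ms

    def commit(self, key, at_ms):
        self.commits[key] = at_ms

    def sees(self, key, now_ms):
        committed_at = self.commits.get(key)
        return committed_at is not None and committed_at + self.lag_ms <= now_ms


# Two unrelated databases, each with its own (hypothetical) replication lag.
centralauth_replica = LaggedReplica(lag_ms=50)
mediawikiwiki_replica = LaggedReplica(lag_ms=400)

# Request A auto-creates on mediawiki.org: local row first, then the attachment.
mediawikiwiki_replica.commit("user:Example", at_ms=1000)
centralauth_replica.commit("localuser:Example@mediawikiwiki", at_ms=1010)


def observe(now_ms):
    # Request B (e.g. on wikimania) reads both replicas at roughly the same time.
    attached = centralauth_replica.sees("localuser:Example@mediawikiwiki", now_ms)
    local_row = mediawikiwiki_replica.sees("user:Example", now_ms)
    return attached, local_row


during_race = observe(1200)     # attachment visible, local row not yet
after_catch_up = observe(2000)  # both visible
```

Even though the local row was committed first, a reader can see the attachment without the row whenever the local wiki's replica lags more than the centralauth one. No transaction spans both databases, so nothing prevents that ordering.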

If all this is normal: Are all the logged errors like this, or are there others? The second most common scenario was this one, also a cross-wiki context but without an (obvious) autocreate or autologin action.

Could not find local user data for ~2025-36867-54@enwiki

referer: https://simple.wikipedia.org/
url: https://en.wikipedia.org/w/index.php?action=raw&ctype=text/javascript&title=MediaWiki:Gadget-ReferenceTooltips.js
username: ~2025-36867-54
wikiId: enwiki

This is essentially the same thing. This user just got their temp account created during an edit on simple.wikipedia.org. That post-edit pageview, just like the first post-login pageview, probably fires off numerous CentralAutoLogin requests to the same dozen wikis. But, that's not the only thing the browser will do. That "first" pageview is also still a normal pageview, and those sometimes include resources from another domain. In this case, simplewiki has a default-on gadget that loads a script from en.wikipedia.org.

That en.wikipedia.org request, just like any pageview would, will implicitly perform an auto-create. It can do this without the magic of Special:CentralAutoLogin because en.wikipedia and simple.wikipedia are under the same registered domain and so share a CentralAuth cookie. It essentially does the same that Special:CentralAutoLogin would. It may very well be racing with CentralAutoLogin where that one just created the account, but from simplewiki we see the central data but not yet the local wiki.

Lastly: Why did it regress? And can we just ignore this?

It regressed in the sense that it became more frequent, presumably because 1) we didn't use to call this method during auto-create and now we do, and 2) with temp user enabled on enwiki, we now have far more account creations that roll the dice on this race condition.

I don't think we should just remove this warning from the code because it is useful under any other circumstances. Generally speaking, if the central database and one of the local wiki databases are out of sync, that's a problem worth logging in production.

I would recommend moving or skipping the warning such that we don't log if the attachment info says it was very recently created. MediaWiki generally tolerates 10s of replag (per Rdbms, WANObjectCache, and wgCdnMaxageStale). I'd say round it up to a minute and call it a day. If data is still missing after that period, we'd want to know about it in case something regresses in the future.
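
A minimal Python sketch of that rule, assuming the attachment timestamp is available to the caller as Unix seconds. The function name, the constant, and the sample timestamps are hypothetical; only the one-minute threshold comes from the suggestion above.

```python
REPLAG_GRACE_SECONDS = 60  # the "round it up to a minute" suggestion


def should_warn_missing_local_user(attached_at, now, grace=REPLAG_GRACE_SECONDS):
    """Return True when a missing local row is old enough to be worth logging."""
    age = now - attached_at
    return age > grace


# Attached one second ago: within normal replication lag, stay quiet.
recent = should_warn_missing_local_user(attached_at=1_000.0, now=1_001.0)
# Attached five minutes ago and still missing: log it.
stale = should_warn_missing_local_user(attached_at=1_000.0, now=1_300.0)
```

The lookup itself would still skip the wiki in both cases; only the decision to emit the warning changes.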

Thanks for your input @Krinkle.

Just adding that we've also been working on T408724: Clean up $performer parameter of AuthManager::autoCreateUser() for some time now (since part of your comment above touches it), which may affect this task as well.

The error is on this request to wikimania.wikimedia.org, and that request is indeed auto-creating the local account on wikimaniawiki.

That would be a major regression, we are not supposed to auto-create during edge login.
And in general we don't seem to - compare the number of account creations on enwiki vs wikimaniawiki. Although that's still way too much autocreation for wikimaniawiki. Maybe it's broken, but not in Chrome? I'd generally expect the opposite, Chrome is the most permissive about cross-site requests. Or maybe it only happens to same-site wikis? But the ones in your example aren't same-site...

That en.wikipedia.org request, just like any pageview would, will implicitly perform an auto-create. It can do this without the magic of Special:CentralAutoLogin because en.wikipedia and simple.wikipedia are under the same registered domain and so share a CentralAuth cookie. It essentially does the same that Special:CentralAutoLogin would. It may very well be racing with CentralAutoLogin where that one just created the account, but from simplewiki we see the central data but not yet the local wiki.

In theory it shouldn't be possible to have a race between edge login and a cookie-authenticated request, since edge login happens exactly on the domains where there is no shared cookie, and edge login needs to finish before the cookie is sent to the browser. Requests that race with it would simply be anonymous. (Also, edge login shouldn't autocreate.)

It regressed in the sense that it became more frequent, presumably because [...] 2) with temp user enabled on enwiki, we now have far more account creations that roll the dice on this race condition.

It does affect temp users, but they are something like 5-10% of total errors, so I don't think that's a major factor.

I would recommend moving or skipping the warning such that we don't log if the attachment info says it was very recently created.

That makes sense narrowly in the context of the warning, but 1) if edge login is autocreating accounts, we need to fix that, 2) I don't really see how we can have a race condition here.

We get this error because CentralAuthUser::queryAttachedBasic() (which queries the localnames table) says a given wiki has an attached account, but CentralAuthUser::localUserData() (which queries the user table) doesn't find it. The user table entry is created before the localnames entry (AuthManager calls User::addToDatabase() then calls CentralAuthPrimaryAuthenticationProvider::autoCreatedAccount() which attaches the user and adds the localname entry). CentralAuthUser::localUserData() tries the replica but falls back to the primary on a miss. CentralAuthUser::queryAttachedBasic() uses LoadBalancer::hasOrMadeRecentPrimaryChanges(), but that only works within the same request that made the write; and implicitly uses ChronologyProtector, but that's cookie-based and doesn't work for cross-wiki requests. So we get the list of wikis to query from the replica CentralAuth DB, and then the user table checks use the primary. That shouldn't ever fail due to a race condition.
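
The replica-then-primary lookup described above can be sketched as follows. This is an illustrative Python model, not the CentralAuth implementation: the function signature, the dictionaries standing in for the replica and primary user tables, and the use of `LookupError` are assumptions.

```python
def local_user_data(username, replica_rows, primary_rows):
    """Try the replica first, then retry on the primary before giving up."""
    for source, rows in (("replica", replica_rows), ("primary", primary_rows)):
        row = rows.get(username)
        if row is not None:
            return source, row
    raise LookupError(f"Could not find local user data for {username}")


# The replica has not caught up yet, but the primary already has the row.
primary_rows = {"Example": {"user_id": 1}}
lagged_replica_rows = {}
source, row = local_user_data("Example", lagged_replica_rows, primary_rows)
```

Under this model plain replica lag cannot produce the error, because the primary is consulted on a miss. The error only appears if the primary read itself misses the row, which is why the following comments look at uncommitted transactions and snapshots.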

The stack trace in the task description is not an uncaught exception. The request completed unchanged and the end-user is not impacted. It is a "warning" message logged to the CentralAuth channel, with a diagnostic stack trace attached.

Unfortunately this is not the case. CentralAuthUser::localUserData() logs this warning and then throws a LocalUserNotFoundException. queryAttached() catches the exception and unattaches the wiki. If the exception was bogus, the user loses access to that wiki and we end up with a dangling user account. (Apparently we just found out there's a bunch of those: T411116: CentralAuth's localuser table contains many nulls and duplicate mappings, so maybe that's somehow related, although unattaching should delete the localuser row rather than set it to null, so it's not an obvious fit.)

@Krinkle's explanation makes sense to me. This bug probably happens when a request that attempts to autocreate an account races with another request that accesses information about each local user via queryAttached() (which could be another autocreate attempt, or it could be anything login-related that calls wmfGetPrivilegedGroups()).

The discussion about edge logins is confusing, and I don't think they are relevant (or at least are not necessary to cause the bug): here's an example: https://logstash.wikimedia.org/goto/1c9d306abe54ad60436b1a5fef7945f4 where the bug happens during some api.php requests from the Wikipedia mobile app that autocreate the user.

Dec 1, 2025 @ 18:47:05.421	crhwiki	Attaching local user XXX@crhwiki by 'login'
Dec 1, 2025 @ 18:47:06.433	wuuwiki	Could not find local user data for XXX@crhwiki

That looks like a race condition for sure.

CentralAuthUser::localUserData() tries the replica but falls back to the primary on a miss. (…) So we get the list of wikis to query from the replica CentralAuth DB, and then the user table checks use the primary. That shouldn't ever fail due to a race condition.

Possibly the primary doesn't have the data yet because the transaction on the wiki database hasn't been committed yet? Or it's some kind of transaction isolation / repeatable-read problem? Or some in-process cache in CentralAuth?

(BTW in T410878 we just added a cache to the code, but this did not seem to affect the error rate here)

Possibly the primary doesn't have the data yet because the transaction on the wiki database hasn't been committed yet?

I suppose that's possible; all connections are committed roughly at the same time, in MediaWikiEntryPoint::commitMainTransaction(), but maybe the commit commands or precommit callbacks are slow enough to still leave a significant gap. And something could force the CentralAuth read to use the primary (probably a previous CA write in the same or the preceding request), so the CA data is more up-to-date than the local user data.

CA is in s7 though, and LBFactoryMulti goes through the sections in order, so I think local writes to any wiki that's not wikidatawiki would be committed before the CA writes?

It is somewhat remarkable how rare this error is for s7 wikis, though: a single instance so far, and even that seems to have a different stack trace.

Or it's some kind of transaction isolation / repeatable-read problem?

You mean something like

  • crhwiki process starts transaction on crhwiki DB
  • crhwiki process starts transaction on centralauth DB
  • wuuwiki process queries crhwiki DB, gets pre-transaction repeatable read snapshot
  • crhwiki process commits transactions
  • (wuuwiki queryAttached() call starts)
  • wuuwiki process queries centralauth DB, gets post-transaction repeatable read snapshot
  • wuuwiki process queries crhwiki DB, gets pre-transaction repeatable read snapshot

? But that would require two different cross-wiki reads in the same process, I don't think that happens often.
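
The sequence above can be replayed with a toy model of repeatable-read snapshots. This Python sketch is a simplification: `Database`, `Connection`, and the key names are hypothetical, and a snapshot is modelled as a frozen copy of the committed keys taken on the connection's first read.

```python
class Database:
    def __init__(self):
        self.committed = set()

    def commit(self, key):
        self.committed.add(key)


class Connection:
    """Takes a snapshot on its first read and reuses it (repeatable read)."""

    def __init__(self, db):
        self.db = db
        self.snapshot = None

    def has(self, key):
        if self.snapshot is None:
            self.snapshot = frozenset(self.db.committed)
        return key in self.snapshot


crhwiki_db = Database()
centralauth_db = Database()

# The wuuwiki process holds one connection per database.
wuu_to_crhwiki = Connection(crhwiki_db)
wuu_to_centralauth = Connection(centralauth_db)

# 1. wuuwiki reads crhwiki early, which fixes a pre-transaction snapshot.
wuu_to_crhwiki.has("some-unrelated-row")
# 2. The crhwiki process commits the user row and the attachment.
crhwiki_db.commit("user:XXX")
centralauth_db.commit("localuser:XXX@crhwiki")
# 3. queryAttached() on wuuwiki: centralauth is read fresh, crhwiki is not.
attached = wuu_to_centralauth.has("localuser:XXX@crhwiki")
local_row_found = wuu_to_crhwiki.has("user:XXX")
```

The attachment is visible but the user row is not, even though both reads conceptually hit up-to-date servers. A fresh connection to crhwiki opened after step 2 would see the row.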

Or some in-process cache in CentralAuth?

That's the wrong direction: this error should happen when we see up-to-date CentralAuth data but outdated local user data.
And there's no caching for the user rows.

(BTW in T410878 we just added a cache to the code, but this did not seem to affect the error rate here)

I guess because the cache is cleared on attaching, which is what triggers the race condition?

The crhwiki account of that user seems to be functioning normally BTW, so I was probably wrong when I said the account would get unattached, even though the code seems to be doing that.

Change #1213568 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@master] CentralAuthUser: Add debugging information for T385310

https://gerrit.wikimedia.org/r/1213568

Or it's some kind of transaction isolation / repeatable-read problem?

You mean something like (…)? But that would require two different cross-wiki reads in the same process, I don't think that happens often.

Yes, thanks for writing it out.

And yes, that is a flaw with this idea. I don't know where this hypothetical other cross-wiki read happens. Still, it seems plausible to me that something somewhere could be doing it. We have so many cross-wiki features these days.

It seems low-effort and low-risk to verify this. If a patch doesn't help, at least we'll know for sure that this is not it. (I was about to write a very similar patch to yours.)

The crhwiki account of that user seems to be functioning normally BTW, so I was probably wrong when I said the account would get unattached, even though the code seems to be doing that.

I think you were right the first time, but there is a check in the job code that prevents it from being unattached if it exists when the job runs: https://gerrit.wikimedia.org/g/mediawiki/extensions/CentralAuth/+/d28dbfa6f5a502451619de74452e8c259a61b4b5/includes/User/CentralAuthUnattachUserJob.php#51

It is perfectly natural for the wikimania request to start a database connection to centralauth and to mediawiki.org, and use a replica that is behind by a fraction of a second.

Or it's some kind of transaction isolation / repeatable-read problem?

That's what I mean, yes. A cross-wiki connection that started before the commits in the other request. Before all of them, not just before one of them. That connection doesn't have to be established for the same reason (i.e. same CentralAuth code), and it doesn't have to be for the same wiki.

Just for any wiki on the same cluster. If the request opens a connection for one s3 wiki near the start (i.e. because the current wiki is on s3) and then later CentralAuth asks for data from another s3 wiki, it'll re-use that connection, and may be subject to repeatable-read.

CA is in s7 though, and LBFactoryMulti goes through the sections in order, so I think local writes to any wiki that's not wikidatawiki would be committed before the CA writes?

I don't think the order of commits influences the race much, since that's a very narrow window indeed. I think it'll be more about whether the affected wiki's cluster has a database replica that's slightly lagged (either physically, or due to an open connection). Of course by definition all replicas are lagged, so this isn't visible in metrics as a notably "lagged" replica per-se. We're talking pretty tight races, and it is within normal bounds for data to not show up immediately in that way.

If the CA data comes from a primary DB then that would make it much more likely indeed to face a gap between that and the replica.

[…] BTW, so I was probably wrong when I said the account would get unattached, even though the code seems to be doing that.

Nice catch. I hadn't noticed that when I wrote my comment.

Accounts don't get detached, because queueAdminUnattachJob queues a CentralAuthUnattachUserJob, which checks if there is a local account before detaching because "Races are fun!" (source).
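
A minimal Python sketch of that guard, with hypothetical names: `run_unattach_job` stands in for the job's run logic, a set of `(username, wiki_id)` pairs stands in for the localuser table, and a dictionary of sets stands in for the per-wiki user tables.

```python
def run_unattach_job(attachments, local_users, username, wiki_id):
    """Detach only if the local account is still missing when the job runs."""
    if username in local_users.get(wiki_id, set()):
        return False  # the row showed up in the meantime; leave the attachment
    attachments.discard((username, wiki_id))
    return True


# By the time the job runs, the crhwiki row is visible, so nothing is detached.
attachments = {("XXX", "crhwiki")}
local_users = {"crhwiki": {"XXX"}}
detached = run_unattach_job(attachments, local_users, "XXX", "crhwiki")
```

Because the job runs later than the request that hit the race, the re-check usually sees the committed row, which would explain why the crhwiki account kept working.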

I think this code is written very much intentionally with replication lag in mind. Whoever wrote this loop over centralauth.localuser anticipated that the local lookups would sometimes fail, wrote an exception for it that is gracefully handled, and handles races in both directions: by omitting the wiki from the data returned to the generic/unspecific caller, and by not over-correcting in the opposite direction with the job.

Change #1213568 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] CentralAuthUser: Add debugging information for T385310

https://gerrit.wikimedia.org/r/1213568

Change #1214146 had a related patch set uploaded (by Bartosz Dziewoński; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@wmf/1.46.0-wmf.4] CentralAuthUser: Add debugging information for T385310

https://gerrit.wikimedia.org/r/1214146

Change #1214147 had a related patch set uploaded (by Bartosz Dziewoński; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@wmf/1.46.0-wmf.5] CentralAuthUser: Add debugging information for T385310

https://gerrit.wikimedia.org/r/1214147

Change #1214146 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.46.0-wmf.4] CentralAuthUser: Add debugging information for T385310

https://gerrit.wikimedia.org/r/1214146

Change #1214147 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.46.0-wmf.5] CentralAuthUser: Add debugging information for T385310

https://gerrit.wikimedia.org/r/1214147

Mentioned in SAL (#wikimedia-operations) [2025-12-02T22:01:48Z] <catrope@deploy2002> Started scap sync-world: Backport for [[gerrit:1214146|CentralAuthUser: Add debugging information for T385310 (T385310)]], [[gerrit:1214147|CentralAuthUser: Add debugging information for T385310 (T385310)]]

Mentioned in SAL (#wikimedia-operations) [2025-12-02T22:04:31Z] <catrope@deploy2002> catrope, matmarex: Backport for [[gerrit:1214146|CentralAuthUser: Add debugging information for T385310 (T385310)]], [[gerrit:1214147|CentralAuthUser: Add debugging information for T385310 (T385310)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-12-02T22:09:17Z] <catrope@deploy2002> Finished scap sync-world: Backport for [[gerrit:1214146|CentralAuthUser: Add debugging information for T385310 (T385310)]], [[gerrit:1214147|CentralAuthUser: Add debugging information for T385310 (T385310)]] (duration: 07m 29s)

First results of the debugging patch are in: T385310_seenWithLock and T385310_uses_primary_ca are always true. It looks like we should turn the debugging code into a real fix.

Is it actually required to have the information available at this point? From my previous analysis it seems the caller is indifferent about which wikis are included in the array. The requests could have completed in either order so the concurrent callers don't know about those wikis, and a good number of them won't be included either way if both the CA+local writes haven't taken place yet. It's only because one of them happens to have finished already that we even try in the first place, but the caller didn't specifically expect or ask for that.

We should generally reduce cross-datacenter and primary DB connections on GET requests. I'd rather move in the opposite direction and think how we could remove the existing DB_PRIMARY read as well. Starting with a way to tolerate the existing race would be a good step towards that.

If there are callers to localUserData besides getLocalGroups that really do need to see their own writes, we could have a READ_LATEST flag to request DB_PRIMARY (instead of always falling back to DB_PRIMARY), which should suffice as that will reuse the write handle. That would be about own writes, not about unrelated writes by other requests. Anyway, that's for later.
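
One possible shape for such an opt-in flag, as a Python sketch rather than a proposed patch. The flag values, the function signature, and the dictionaries standing in for the replica and primary are assumptions for illustration; returning `None` on a miss is also a simplification.

```python
READ_NORMAL = 0  # illustrative flag values
READ_LATEST = 1


def local_user_data(username, replica_rows, primary_rows, flags=READ_NORMAL):
    """Read from the replica unless the caller explicitly asks for READ_LATEST."""
    rows = primary_rows if flags & READ_LATEST else replica_rows
    return rows.get(username)  # None on a miss; there is no implicit fallback


primary_rows = {"Example": {"user_id": 1}}
replica_rows = {}  # has not caught up with the caller's own write yet

default_read = local_user_data("Example", replica_rows, primary_rows)
latest_read = local_user_data("Example", replica_rows, primary_rows, READ_LATEST)
```

With this shape, generic callers such as the getLocalGroups path never touch the primary, and only callers that need to see their own writes pay for it.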

Good point, it's probably not required, but it depends on the caller, and I haven't looked at them. For the code path starting at wmfGetPrivilegedGroups(), the information is not required (any account in the process of being created is not going to be in any interesting groups).

Change #1214718 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] Auth: Change debug context for autocreate from $user to $performer

https://gerrit.wikimedia.org/r/1214718

Change #1215307 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/CentralAuth@master] CentralAuthUser: Replace some uses of localUserData()

https://gerrit.wikimedia.org/r/1215307

Change #1215308 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/CentralAuth@master] CentralAuthUser: Add $recency param to queryAttached(), queryUnattached()

https://gerrit.wikimedia.org/r/1215308

I replaced a few unusual methods that called localUserData() unnecessarily, leaving just queryAttached() and queryUnattached() as the methods whose callers we need to review. There is a decent number of them: https://codesearch.wmcloud.org/search/?q=(queryAttached|queryUnattached)\b and I haven't finished reviewing them yet (help welcome).

The error is on this request to wikimania.wikimedia.org, and that request is indeed auto-creating the local account on wikimaniawiki.

That would be a major regression, we are not supposed to auto-create during edge login.
And in general we don't seem to - compare the number of account creations on enwiki vs wikimaniawiki. Although that's still way too much autocreation for wikimaniawiki. […]

@Tgr wrote in Gerrit:

I double-checked and (as expected) edge login isn't autocreating accounts.
At the very least, it's not new. T198940

The examples I mention are all based on real ones Derick and I found in the Logstash data.

I understand Special:CentralAutoLogin does not intentionally do auto-creation (e.g. T387357), and 99% of signups don't end up auto-creating on wikis like wikimaniawiki. But some percentage do, and when investigating this race condition in Logstash, Special:CentralAutoLogin shows up a fair bit. In terms of understanding the race condition, it is as good an example as any, but we can ignore it if you prefer. It is not important to this task.

  • When visiting a new wiki, you do get auto-created during casual browsing if the new wiki is under the same family. For example, sign up on nl.wikipedia.org and then visit en.wikipedia.org, because there is a shared CentralAuth cookie. This is not impacted by third-party cookie seggregation/blocking.
  • When loading a gadget or other file from another domain, that can auto-create your account if the cross-domain request carries CentralAuth cookies. Again, this will be the case within a wiki family such as signing up on simple.wikipedia.org and then your first pageview on simplewiki may load a default-on CSS gadget from en.wikipedia.org, and auto-create you there.
  • When making cross-domain API requests, I believe the centralauthtoken lets you interact with any wiki, including those not visited before, and like any valid request with CentralAuth credentials (cookies or otherwise) it will auto-create, right?
  • And yes, Special:CentralAutoLogin will trigger it in some circumstances when the cookies are available. Not because of any logic in that special page specifically, but just because it is a web request to another wiki, and thus runs the general WebStart/Setup.php logic in MediaWiki core. It would also happen if the request was for Special:Blankpage.
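
To make the point of the list above concrete, here is a minimal sketch in Python. It is a simplified model, not MediaWiki's actual code path: the `Request` class, its fields, and `would_autocreate` are hypothetical names, and it ignores every other check the real auto-creation logic may apply. It only shows that, under this model, the outcome depends on the credentials and the missing local account, not on which page or file was requested.

```python
from dataclasses import dataclass


@dataclass
class Request:
    wiki: str                      # domain that receives the request
    path: str                      # requested path (deliberately unused below)
    has_central_credentials: bool  # CentralAuth cookie or centralauthtoken present
    local_account_exists: bool     # user already has a local account on this wiki


def would_autocreate(req: Request) -> bool:
    """Hypothetical model: auto-creation follows from valid central
    credentials plus a missing local account, whatever the request path."""
    return req.has_central_credentials and not req.local_account_exists


examples = [
    # Cross-domain gadget load within the same family.
    Request("en.wikipedia.org",
            "/w/index.php?title=MediaWiki:Gadget-ReferenceTooltips.css&action=raw&ctype=text/css",
            True, False),
    # Return leg of the CentralAutoLogin flow.
    Request("wikimania.wikimedia.org", "/wiki/Special:CentralAutoLogin/setCookies", True, False),
    # Same credentials, unrelated page: same outcome in this model.
    Request("wikimania.wikimedia.org", "/wiki/Special:Blankpage", True, False),
    # No central credentials visible: the pageview renders logged-out.
    Request("wikitech.wikimedia.org", "/wiki/Main_Page", False, False),
]

for req in examples:
    print(req.wiki, req.path, would_autocreate(req))
```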

Logstash queries:

The four most common ones I found are:

  • After signup, cross-domain request to a new wiki in the same family.
  • Old account, visiting a new wiki outside family without shared cookie (e.g. wikitech.wikimedia.org) where pageview does not auto-create you, but JavaScript request to Special:CentralAutoLogin/start routes via auth.wikimedia.org and may auto-create you during return leg to /wiki/Special:CentralAutoLogin/setCookies.
  • After signup in Wikipedia app for iOS, API requests to other Wikipedias auto-create you.
  • After signup in Wikipedia app for Android, in-app browser performs login and the Special:CentralAutoLogin beacons auto-create you.

Cross-domain request within family

  • es.wikipedia.org anon edit creates temp user
  • (internal) jobrunner autocreates on loginwiki and metawiki
  • one second later, their pageview makes a cross-domain request to enwiki at /w/index.php?title=MediaWiki:Gadget-ReferenceTooltips.css&action=raw&ctype=text/css. This auto-creates.

Visit a new wiki without shared cookies

Example in Logstash

  • An account that is several years old logs in using Firefox on de.wikipedia.org.
  • A few minutes later, they visit wikitech.wikimedia.org.
    • This renders logged-out (no autocreate because no CentralAuth session, and Wikitech isn't in wgCentralAuthAutoLoginWikis).
    • ext.centralauth.centralautologin.js makes request to Special:CentralAutoLogin/start via auth.wikimedia.org, finds central session, and during the return leg this auto-creates you on /wiki/Special:CentralAutoLogin/setCookies.

The close proximity to a recent login might be why auth.wikimedia.org state is still visible to ext.centralauth.centralautologin.js across domains. Or maybe the person in question used a less strict privacy setting than the default in Firefox.

Wikipedia app for iOS

From what I can see, these are all via the API, but they use a shared cookie jar, so the CentralAuth session on .wikipedia.org is visible to all, just like in a browser.

Example
Dec 4, 2025 @ 20:28
wiki: enwiki
message: Could not find local user data for <REDACTED>@enwiki
url: https://en.wikipedia.org/w/api.php?action=query&format=json&meta=userinfo&uiprop=options
stack trace:
  Setup.php > AuthManager->autoCreateUser > onGetSecurityLogContext > wmfGetPrivilegedGroups > …

The example I looked at had this on Special:CentralAuth:

  • 20:28 - fr.wikipedia.org - new account
  • 20:28 - es.wikipedia.org - created on login
  • 20:28 - en.wikipedia.org - created on login
  • 20:28 - login.wikimedia.org - created on login
  • 20:28 - meta.wikimedia.org - created on login
  • 20:28 - ja.wikipedia.org - created on login
  • 20:29 - www.wikidata.org - created on login
  • 20:29 - commons.wikimedia.org - created on login

Request trail, from searching their username on the general "mediawiki" Logstash dashboard.

  1. e008da5e-6e54-4023-8ef0-d739676ba5fa fr.wikipedia POST /w/api.php
    • continueAccountCreation: Account creation succeeded
    • (internal mw-jobrunner) login.wikimedia POST /rpc/RunSingleJob.php
      • CentralAuthCreateLocalAccountJob autoCreateUser: creating new user
    • (internal mw-jobrunner) meta.wikimedia POST /rpc/RunSingleJob.php
      • CentralAuthCreateLocalAccountJob autoCreateUser: creating new user
  2. 563c14d5-0fcc-434e-be4c-257fd87d27b5 es.wikipedia GET /w/api.php?action=query&format=json&meta=userinfo&uiprop=options
    • Setup.php autoCreateUser: creating new user
  3. b2ae4eac-0011-4aff-8442-59ad3032b4a9 en.wikipedia GET /w/rest.php/readinglists/…
    • Setup.php autoCreateUser: creating new user
  4. 2afdb209-d9a1-428b-97a5-8bdc14729358 es.wikipedia GET /w/api.php?meta=notifications…
    • Setup.php autoCreateUser: creating new user
    • autoCreateUser: <REDACTED> already exists locally (race)
  5. 96867ff7-e432-431f-ba36-5f26594cadb7 en.wikipedia.org GET /w/api.php?meta=notifications&…
    • Setup.php/autoCreateUser/wmfGetPrivilegedGroups Could not find local user data for <REDACTED>@enwiki
    • autoCreateUser: creating new user <REDACTED>
    • autoCreateUser: <REDACTED> already exists locally (race)

This is a race between requests 3 and 5. I found many more, all of which used the WikipediaApp. It can just as well happen in a web browser, but it's easy to see how the app makes the race more likely by contacting multiple wikis concurrently.
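
As a rough way to spot such pairs in a request trail, the sketch below groups the requests from the trail above by wiki and flags any wiki where more than one request logged "creating new user". The `trail` list is hand-transcribed from requests 2 to 5 (with en.wikipedia.org normalized to en.wikipedia), and `racing_requests` is a hypothetical helper, not an existing Logstash or MediaWiki tool.

```python
from collections import defaultdict

# (request number, wiki, log messages), transcribed from the trail above.
trail = [
    (2, "es.wikipedia", ["creating new user"]),
    (3, "en.wikipedia", ["creating new user"]),
    (4, "es.wikipedia", ["creating new user",
                         "already exists locally (race)"]),
    (5, "en.wikipedia", ["Could not find local user data",
                         "creating new user",
                         "already exists locally (race)"]),
]


def racing_requests(trail):
    """Return {wiki: [request numbers]} for every wiki on which more than
    one request attempted an auto-creation."""
    attempts = defaultdict(list)
    for number, wiki, messages in trail:
        if any("creating new user" in message for message in messages):
            attempts[wiki].append(number)
    return {wiki: numbers for wiki, numbers in attempts.items() if len(numbers) > 1}


races = racing_requests(trail)
for wiki, numbers in races.items():
    print(wiki, numbers)
```

On this data the helper reports requests 3 and 5 on en.wikipedia, and also requests 2 and 4 on es.wikipedia, where the log shows the same "already exists locally (race)" message.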

Wikipedia app on Android

It seems the Android app is an example where Special:CentralAutoLogin reaches every wiki in $wgCentralAuthAutoLoginWikis. The requests below carried the user agent WikipediaApp/2.7.50550-r-2025-09-22 (Android 10; Phone; <REDACTED>) Google Play.

The below example is from one username, not multiple. It hits the local user data race condition 8 times during a single signup.

Example
Dec 4, 2025 @ 21:50 | enwikisource | Could not find local user data for <REDACTED>@enwikibooks
Dec 4, 2025 @ 21:50 | enwikiversity | Could not find local user data for <REDACTED>@enwikisource
Dec 4, 2025 @ 21:50 | enwikinews | Could not find local user data for <REDACTED>@enwikibooks
Dec 4, 2025 @ 21:50 | enwikinews | Could not find local user data for <REDACTED>@enwikisource
Dec 4, 2025 @ 21:50 | enwikinews | Could not find local user data for <REDACTED>@enwikiversity
Dec 4, 2025 @ 21:50 | enwikiquote | Could not find local user data for <REDACTED>@enwiktionary
Dec 4, 2025 @ 21:50 | incubatorwiki | Could not find local user data for <REDACTED>@specieswiki
Dec 4, 2025 @ 21:50 | incubatorwiki | Could not find local user data for <REDACTED>@wikimaniawiki
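
For anyone wanting to tally such rows, here is a small Python sketch that parses the eight lines above and counts errors per requesting wiki. The `time | wiki | message` layout is just how the rows are pasted in this comment, not a Logstash export format, and `PATTERN` and `parse` are hypothetical names.

```python
import re
from collections import Counter

LOG = """\
Dec 4, 2025 @ 21:50 | enwikisource | Could not find local user data for <REDACTED>@enwikibooks
Dec 4, 2025 @ 21:50 | enwikiversity | Could not find local user data for <REDACTED>@enwikisource
Dec 4, 2025 @ 21:50 | enwikinews | Could not find local user data for <REDACTED>@enwikibooks
Dec 4, 2025 @ 21:50 | enwikinews | Could not find local user data for <REDACTED>@enwikisource
Dec 4, 2025 @ 21:50 | enwikinews | Could not find local user data for <REDACTED>@enwikiversity
Dec 4, 2025 @ 21:50 | enwikiquote | Could not find local user data for <REDACTED>@enwiktionary
Dec 4, 2025 @ 21:50 | incubatorwiki | Could not find local user data for <REDACTED>@specieswiki
Dec 4, 2025 @ 21:50 | incubatorwiki | Could not find local user data for <REDACTED>@wikimaniawiki
"""

# time | requesting wiki | "Could not find local user data for <user>@<named wiki>"
PATTERN = re.compile(
    r"^(?P<time>.+?) \| (?P<wiki>\w+) \| "
    r"Could not find local user data for .+@(?P<target>\w+)$"
)


def parse(log):
    """Return (requesting wiki, wiki named in the message) for each row."""
    rows = []
    for line in log.splitlines():
        match = PATTERN.match(line)
        if match:
            rows.append((match["wiki"], match["target"]))
    return rows


rows = parse(LOG)
by_wiki = Counter(wiki for wiki, _ in rows)
print(len(rows), dict(by_wiki))
print(all(wiki != target for wiki, target in rows))
```

In every one of these rows, the wiki that served the request differs from the wiki named in the error message.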

Special:CentralAuth for their account:

  • 21:46 en.wikipedia.org - new account
  • 21:46 login.wikimedia.org - created on login
  • 21:46 meta.wikimedia.org - created on login
  • 21:50 commons.wikimedia.org - created on login
  • 21:50 www.wikidata.org - created on login
  • 21:50 en.wikibooks.org - created on login
  • 21:50 en.wikisource.org - created on login
  • 21:50 en.wikiversity.org - created on login
  • 21:50 en.wikivoyage.org - created on login
  • 21:50 api.wikimedia.org - created on login
  • 21:50 en.wikinews.org - created on login
  • 21:50 en.wiktionary.org - created on login
  • 21:50 foundation.wikimedia.org - created on login
  • 21:50 www.mediawiki.org - created on login
  • 21:50 en.wikiquote.org - created on login
  • 21:50 incubator.wikimedia.org - created on login
  • 21:50 species.wikimedia.org - created on login
  • 21:50 wikimania.wikimedia.org - created on login
  • 21:50 www.wikifunctions.org - created on login

Request trail:

  1. 21:46 mw-api-ext POST enwiki /w/api.php?action=createaccount: Creating user {user} during account creation
  2. 21:46 mw-jobrunner POST loginwiki | autoCreateUser
  3. 21:46 mw-api-ext POST enwiki /w/api.php?action=clientlogin: Login for {user} succeeded from {clientIp}
  4. 21:46 mw-jobrunner POST metawiki: autoCreateUser
  5. 21:50 mw-api-int wikidatawiki /w/api.php?centralauthtoken=…&meta=notifications: autoCreateUser: creating new user
    • User agent WikipediaApp/… (Android 10; …) Google Play (via ForeignWikiRequest MediaWiki/1.46.0-wmf.5)
  6. 21:50 mw-api-int commonswiki /w/api.php?centralauthtoken=…&meta=notifications: autoCreateUser: creating new user
    • User agent WikipediaApp/… (Android 10; …) Google Play (via ForeignWikiRequest MediaWiki/1.46.0-wmf.5)
  7. 21:50 mw-web enwikibooks /wiki/Special:CentralAutoLogin/start?from=enwiki&type=1x1&useformat=mobile&usesul3=1 autoCreateUser: creating new user
    • User agent WikipediaApp/… (Android 10; …) Google Play (same as request 1-4 again)
    • Referer https://en.wikipedia.org/
  8. 21:50 mw-web enwikisource /wiki/Special:CentralAutoLogin/start?from=enwiki&type=1x1&useformat=mobile&usesul3=1: Could not find local user data for <REDACTED>@enwikibooks
  9. etc

Change #1214718 merged by jenkins-bot:

[mediawiki/core@master] Auth: Change debug context for autocreate from $user to $performer

https://gerrit.wikimedia.org/r/1214718

Change #1211635 abandoned by D3r1ck01:

[mediawiki/core@master] auth: Add security log context only for temporary accounts

Reason:

Abandoning this for now. Seems like it won't be needed. Will restore otherwise.

https://gerrit.wikimedia.org/r/1211635

Change #1215307 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] CentralAuthUser: Replace some uses of localUserData()

https://gerrit.wikimedia.org/r/1215307