Page MenuHomePhabricator

Wikimedia\NormalizedException\NormalizedException: User ID: {centralUserId} mismatch with {storedUserId} for user: {username}
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   RuntimeException: User ID mismatch
FrameLocationCall
from/srv/mediawiki/php-1.44.0-wmf.19/extensions/CentralAuth/includes/CentralAuthRedirectingPrimaryAuthenticationProvider.php(193)
#0/srv/mediawiki/php-1.44.0-wmf.19/includes/auth/AuthManager.php(638)MediaWiki\Extension\CentralAuth\CentralAuthRedirectingPrimaryAuthenticationProvider->continuePrimaryAuthentication(array)
#1/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(401)MediaWiki\Auth\AuthManager->continueAuthentication(array)
#2/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(533)MediaWiki\SpecialPage\AuthManagerSpecialPage->performAuthenticationStep(string, array)
#3/srv/mediawiki/php-1.44.0-wmf.19/includes/htmlform/HTMLForm.php(824)MediaWiki\SpecialPage\AuthManagerSpecialPage->handleFormSubmit(array, MediaWiki\HTMLForm\CodexHTMLForm)
#4/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(464)MediaWiki\HTMLForm\HTMLForm->trySubmit()
#5/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/LoginSignupSpecialPage.php(404)MediaWiki\SpecialPage\AuthManagerSpecialPage->trySubmit()
#6/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/SpecialPage.php(729)MediaWiki\SpecialPage\LoginSignupSpecialPage->execute(null)
#7/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/SpecialPageFactory.php(1737)MediaWiki\SpecialPage\SpecialPage->run(null)
#8/srv/mediawiki/php-1.44.0-wmf.19/includes/actions/ActionEntryPoint.php(503)MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, MediaWiki\Context\RequestContext)
#9/srv/mediawiki/php-1.44.0-wmf.19/includes/actions/ActionEntryPoint.php(145)MediaWiki\Actions\ActionEntryPoint->performRequest()
#10/srv/mediawiki/php-1.44.0-wmf.19/includes/MediaWikiEntryPoint.php(202)MediaWiki\Actions\ActionEntryPoint->execute()
#11/srv/mediawiki/php-1.44.0-wmf.19/index.php(58)MediaWiki\MediaWikiEntryPoint->run()
#12/srv/mediawiki/w/index.php(3)require(string)
#13{main}
Impact

~10 people per day. Probably breaks autologin attempt and next autologin will succeed, so minimal.

Notes

This is some variant of T380500: CentralAuthUser returning outdated data after user creation.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Tgr subscribed.

Only a single instance (logstash). Do we seriously not log the user name/ID when this happens? Ugh.
It's very much not supposed to happen, but without any specifics, can't really investigate.

Change #1125389 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/extensions/CentralAuth@master] Add user IDs that mismatch in the runtime exception

https://gerrit.wikimedia.org/r/1125389

Only a single instance (logstash). Do we seriously not log the user name/ID when this happens? Ugh.
It's very much not supposed to happen, but without any specifics, can't really investigate.

Add some logging and more info to the runtime exception is thrown.

Maybe one of the ids is zero and this is some version of T380500: CentralAuthUser returning outdated data after user creation.

Yes, I highly suspect we're hitting that. But let's find try to prove ourselves with some logging :)

Change #1125389 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Log and add user IDs that mismatch in the runtime exception

https://gerrit.wikimedia.org/r/1125389

Only a single instance (logstash).

Error
normalized_message
[{reqId}] {exception_url}   RuntimeException: User ID mismatch
FrameLocationCall
from/srv/mediawiki/php-1.44.0-wmf.19/extensions/CentralAuth/includes/CentralAuthRedirectingPrimaryAuthenticationProvider.php(193)
#0/srv/mediawiki/php-1.44.0-wmf.19/includes/auth/AuthManager.php(638)MediaWiki\Extension\CentralAuth\CentralAuthRedirectingPrimaryAuthenticationProvider->continuePrimaryAuthentication(array)
#1/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(401)MediaWiki\Auth\AuthManager->continueAuthentication(array)
#2/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(533)MediaWiki\SpecialPage\AuthManagerSpecialPage->performAuthenticationStep(string, array)
#3/srv/mediawiki/php-1.44.0-wmf.19/includes/htmlform/HTMLForm.php(824)MediaWiki\SpecialPage\AuthManagerSpecialPage->handleFormSubmit(array, MediaWiki\HTMLForm\CodexHTMLForm)
#4/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(464)MediaWiki\HTMLForm\HTMLForm->trySubmit()
#5/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/LoginSignupSpecialPage.php(404)MediaWiki\SpecialPage\AuthManagerSpecialPage->trySubmit()
#6/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/SpecialPage.php(729)MediaWiki\SpecialPage\LoginSignupSpecialPage->execute(null)
#7/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/SpecialPageFactory.php(1737)MediaWiki\SpecialPage\SpecialPage->run(null)
#8/srv/mediawiki/php-1.44.0-wmf.19/includes/actions/ActionEntryPoint.php(503)MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, MediaWiki\Context\RequestContext)
#9/srv/mediawiki/php-1.44.0-wmf.19/includes/actions/ActionEntryPoint.php(145)MediaWiki\Actions\ActionEntryPoint->performRequest()
#10/srv/mediawiki/php-1.44.0-wmf.19/includes/MediaWikiEntryPoint.php(202)MediaWiki\Actions\ActionEntryPoint->execute()
#11/srv/mediawiki/php-1.44.0-wmf.19/index.php(58)MediaWiki\MediaWikiEntryPoint->run()
#12/srv/mediawiki/w/index.php(3)require(string)
#13{main}
Impact
Notes

Thanks @Reedy, we've added some logging, let's see if we can get more info about this as the patch goes out.

Change #1126051 had a related patch set uploaded (by Gergő Tisza; author: Derick Alangi):

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.19] Log and add user IDs that mismatch in the runtime exception

https://gerrit.wikimedia.org/r/1126051

Change #1126051 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.19] Log and add user IDs that mismatch in the runtime exception

https://gerrit.wikimedia.org/r/1126051

Mentioned in SAL (#wikimedia-operations) [2025-03-10T14:49:53Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1126050|SpecialCentralAutoLogin: Handle nullable wiki ID (T388252)]], [[gerrit:1126049|SUL3: Attach SUL mode to the return URL of local wiki (T388067)]], [[gerrit:1126051|Log and add user IDs that mismatch in the runtime exception (T388177)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-10T14:58:12Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1126050|SpecialCentralAutoLogin: Handle nullable wiki ID (T388252)]], [[gerrit:1126049|SUL3: Attach SUL mode to the return URL of local wiki (T388067)]], [[gerrit:1126051|Log and add user IDs that mismatch in the runtime exception (T388177)]] (duration: 15m 48s)

Both in practice seem to be against user id 0

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\NormalizedException\NormalizedException: User ID: {centralUserId} mismatch with {storedUserId} for user: {username}
FrameLocationCall
from/srv/mediawiki/php-1.44.0-wmf.19/extensions/CentralAuth/includes/CentralAuthRedirectingPrimaryAuthenticationProvider.php(199)
#0/srv/mediawiki/php-1.44.0-wmf.19/includes/auth/AuthManager.php(638)MediaWiki\Extension\CentralAuth\CentralAuthRedirectingPrimaryAuthenticationProvider->continuePrimaryAuthentication(array)
#1/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(401)MediaWiki\Auth\AuthManager->continueAuthentication(array)
#2/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(533)MediaWiki\SpecialPage\AuthManagerSpecialPage->performAuthenticationStep(string, array)
#3/srv/mediawiki/php-1.44.0-wmf.19/includes/htmlform/HTMLForm.php(824)MediaWiki\SpecialPage\AuthManagerSpecialPage->handleFormSubmit(array, MediaWiki\HTMLForm\CodexHTMLForm)
#4/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(464)MediaWiki\HTMLForm\HTMLForm->trySubmit()
#5/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/LoginSignupSpecialPage.php(404)MediaWiki\SpecialPage\AuthManagerSpecialPage->trySubmit()
#6/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/SpecialPage.php(729)MediaWiki\SpecialPage\LoginSignupSpecialPage->execute(null)
#7/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/SpecialPageFactory.php(1737)MediaWiki\SpecialPage\SpecialPage->run(null)
#8/srv/mediawiki/php-1.44.0-wmf.19/includes/actions/ActionEntryPoint.php(503)MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, MediaWiki\Context\RequestContext)
#9/srv/mediawiki/php-1.44.0-wmf.19/includes/actions/ActionEntryPoint.php(145)MediaWiki\Actions\ActionEntryPoint->performRequest()
#10/srv/mediawiki/php-1.44.0-wmf.19/includes/MediaWikiEntryPoint.php(202)MediaWiki\Actions\ActionEntryPoint->execute()
#11/srv/mediawiki/php-1.44.0-wmf.19/index.php(58)MediaWiki\MediaWikiEntryPoint->run()
#12/srv/mediawiki/w/index.php(3)require(string)
#13{main}
Impact
Notes
Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\NormalizedException\NormalizedException: User ID: {centralUserId} mismatch with {storedUserId} for user: {username}
FrameLocationCall
from/srv/mediawiki/php-1.44.0-wmf.19/extensions/CentralAuth/includes/CentralAuthRedirectingPrimaryAuthenticationProvider.php(199)
#0/srv/mediawiki/php-1.44.0-wmf.19/includes/auth/AuthManager.php(638)MediaWiki\Extension\CentralAuth\CentralAuthRedirectingPrimaryAuthenticationProvider->continuePrimaryAuthentication(array)
#1/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(401)MediaWiki\Auth\AuthManager->continueAuthentication(array)
#2/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(533)MediaWiki\SpecialPage\AuthManagerSpecialPage->performAuthenticationStep(string, array)
#3/srv/mediawiki/php-1.44.0-wmf.19/includes/htmlform/HTMLForm.php(824)MediaWiki\SpecialPage\AuthManagerSpecialPage->handleFormSubmit(array, MediaWiki\HTMLForm\CodexHTMLForm)
#4/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/AuthManagerSpecialPage.php(464)MediaWiki\HTMLForm\HTMLForm->trySubmit()
#5/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/LoginSignupSpecialPage.php(404)MediaWiki\SpecialPage\AuthManagerSpecialPage->trySubmit()
#6/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/SpecialPage.php(729)MediaWiki\SpecialPage\LoginSignupSpecialPage->execute(null)
#7/srv/mediawiki/php-1.44.0-wmf.19/includes/specialpage/SpecialPageFactory.php(1737)MediaWiki\SpecialPage\SpecialPage->run(null)
#8/srv/mediawiki/php-1.44.0-wmf.19/includes/actions/ActionEntryPoint.php(503)MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, MediaWiki\Context\RequestContext)
#9/srv/mediawiki/php-1.44.0-wmf.19/includes/actions/ActionEntryPoint.php(145)MediaWiki\Actions\ActionEntryPoint->performRequest()
#10/srv/mediawiki/php-1.44.0-wmf.19/includes/MediaWikiEntryPoint.php(202)MediaWiki\Actions\ActionEntryPoint->execute()
#11/srv/mediawiki/php-1.44.0-wmf.19/index.php(58)MediaWiki\MediaWikiEntryPoint->run()
#12/srv/mediawiki/w/index.php(3)require(string)
#13{main}
Impact
Notes

Sorry, had too many open windows.

30 examples in the last 7 days. The central used ID is always 0 (i.e. this is indeed caused by T380500: CentralAuthUser returning outdated data after user creation).

This always seems to be happening on the local continueAuthentication() call, presumably right after account creation.

Tgr renamed this task from RuntimeException: User ID mismatch to Wikimedia\NormalizedException\NormalizedException: User ID: {centralUserId} mismatch with {storedUserId} for user: {username}.Mar 20 2025, 1:20 PM

Change #1132609 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/extensions/CentralAuth@master] SUL3: Fix user ID mismatch during login (immediately after creation)

https://gerrit.wikimedia.org/r/1132609

This always seems to be happening on the local continueAuthentication() call, presumably right after account creation.

continuePrimaryAuthentication() calls CentralAuthUser::getInstanceByName() which feels slightly off as that reads from a replica (which might be affected even by the slightest lags). I was expecting that since we're just from an account creation and trying to use that in login in, we should really be reading the latest data immediately.

I've made a patch to use ::getPrimaryInstanceByName() instead to see if that will potentially resolve the issue in this case.

[…]

continuePrimaryAuthentication() calls CentralAuthUser::getInstanceByName() which feels slightly off as that reads from a replica (which might be affected even by the slightest lags). […]

In theory that can't happen because of ChronologyProtector, which ensures that the replicas we use have caught up to our own previous writes, given that it will be the same browser that created the account and the same browser getting redirected to read the data.

If this turns out incorrect, we need to file an Rdbms bug and confirm whether there are any missing CP-cookies or CP-query params in this journey.

In theory that can't happen because of ChronologyProtector, which ensures that the replicas we use have caught up to our own previous writes, given that it will be the same browser that created the account and the same browser getting redirected to read the data.

If this turns out incorrect, we need to file an Rdbms bug and confirm whether there are any missing CP-cookies or CP-query params in this journey.

That means something is possibly wrong here. Testing locally (without a replica DB - just primary) and observing what happens with the code execution, tells me a different story.

The continuePrimaryAuthentication() method is called during login and signups once either action succeeds on the shared domain and authentication continues on the local domain.


Scenario #1

When logging in, with a previously created account, CentralAuthUser fetch by instance name (getInstanceByName()) constructs a central auth user object and doesn't set the mFromPrimary member to true which is expected because the user already exists and we're just reading data from replica DB.

Scenario #2

When creating a new user, the central auth user fetch by instance name (getInstanceByName()) constructs a central auth user object and doesn't set the mFromPrimary member to true (as above) which is not expected because we should really be reading from the primary DB here which I think this is where CP should come into play.

I realized that the CP cookies (like cpPosIndex) are set on the shared domain during login but it looks like they don't have any effect when performing the login.

Scenario #3

When creating a new user, and using (with the proposed patch applied) central auth user fetch by primary instance name (getPrimaryInstanceByName()), the mFromPrimary is set to true which will ensure that the data is read immediately from the primary DB. So with the patch, the issue will be resolved (hopefully).


But from your comment above, it looks like there are 2 solutions to this problem: (1) resolving it at the CentralAuth layer or (2) resolving it at the Rdbms layer.

I like (2) as it feels like that is the right place to do it. And if we go by that, then that might potentially make CentralAuthUser::getInstanceByName() and CentralAuthUser::getPrimaryInstanceByName() obsolete as the DB will just know what to do in the correct place because the code looks like it's trying to resolve this exact issue of when/where to read/write based on recency.

If it's not a lot of work to do at the Rdbms layer, we can head start with this patch proposal and then when the Rdbms one finally lands, we can deprecate the code in CA in favor of the Rdbms one. Let me know what you think.

NOTE: (observed) The UseDC CP cookie is never set for me on the local domain (during signups) which looks like the cookie that tells CP which DB instance to read from (primary or replica). It's set only on the shared domain but authentication finishes on the local domain (hence continue authentication). I'm aware the CP cookies lasts only like 10 seconds which I think it's good enough time for the authentication chain to have completed already.

See (logs snippet) of the Special:UserLogin/return&wpLoginToken... request which is what triggers continue authentication on the local wiki - from debug log

Start request GET /w/index.php?title=Special:UserLogin/return&wpLoginToken=...&usesul3=1&centralauthLoginToken=...
IP: <redacted>
HTTP HEADERS:
COOKIE: XDEBUG_SESSION=PHPSTORM; enwikimwclientpreferences=vector-feature-limited-width-clientpref-1; enwikiUserName=CpTestUser2; CentralAuthAnonTopLevel=1; enwiki_session=...
ACCEPT-LANGUAGE: en-US,en;q=0.9
ACCEPT-ENCODING: gzip, deflate, br, zstd
REFERER: http://auth.mediawiki.mwdd.localhost:8082/

Notice that the CP cookies are not sent to the local domain after the redirect.

Reading the class level docblock of the ChronologyProtector, it states the following (extract):

*   If you have legacy extensions deployed that perform queries on the primary database during
*   GET requests, then you will have to identify a way to route any of its relevant URLs to the
*   primary DC as well, or to accept that their reads do not enjoy chronology protection, and
*   that writes may be slower (due to cross-dc latency).

It seems like this is intentional at least from CP's point of view how it works today? (at least if I'm understanding it correctly?) and that may explain why we have the getInstanceByName(), getPrimaryInstanceByName() methods in CentralAuthUser class?

The redirect to the local domain is a GET request that should be querying the primary DC (in this case) but neither the UseDC or cpPosIndex cookies is not sent on redirect from the shared domain to the local domain.

Change #1132609 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] SUL3: Fix user ID mismatch during login (immediately after creation)

https://gerrit.wikimedia.org/r/1132609

Change #1133500 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.22] SUL3: Fix user ID mismatch during login (immediately after creation)

https://gerrit.wikimedia.org/r/1133500

Change #1133502 had a related patch set uploaded (by Bartosz Dziewoński; author: Derick Alangi):

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.23] SUL3: Fix user ID mismatch during login (immediately after creation)

https://gerrit.wikimedia.org/r/1133502

Change #1133500 abandoned by D3r1ck01:

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.22] SUL3: Fix user ID mismatch during login (immediately after creation)

Reason:

Should be on .23 instead.

https://gerrit.wikimedia.org/r/1133500

Change #1133500 restored by D3r1ck01:

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.22] SUL3: Fix user ID mismatch during login (immediately after creation)

https://gerrit.wikimedia.org/r/1133500

Change #1133500 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.22] SUL3: Fix user ID mismatch during login (immediately after creation)

https://gerrit.wikimedia.org/r/1133500

Change #1133502 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.23] SUL3: Fix user ID mismatch during login (immediately after creation)

https://gerrit.wikimedia.org/r/1133502

Mentioned in SAL (#wikimedia-operations) [2025-04-02T20:14:25Z] <reedy@deploy1003> Started scap sync-world: Backport for [[gerrit:1133500|SUL3: Fix user ID mismatch during login (immediately after creation) (T388177)]], [[gerrit:1133502|SUL3: Fix user ID mismatch during login (immediately after creation) (T388177)]], [[gerrit:1133503|Remove redundant WaitConditionLoop from CentralAuthTokenManager]], [[gerrit:1133504|Remove redundant WaitConditionLoop from CentralAuthTokenManager]]

Mentioned in SAL (#wikimedia-operations) [2025-04-02T20:47:51Z] <reedy@deploy1003> Started scap sync-world: Backport for [[gerrit:1133500|SUL3: Fix user ID mismatch during login (immediately after creation) (T388177)]], [[gerrit:1133502|SUL3: Fix user ID mismatch during login (immediately after creation) (T388177)]], [[gerrit:1133503|Remove redundant WaitConditionLoop from CentralAuthTokenManager]], [[gerrit:1133504|Remove redundant WaitConditionLoop from CentralAuthTokenManager]]

Mentioned in SAL (#wikimedia-operations) [2025-04-02T20:52:29Z] <reedy@deploy1003> d3r1ck01, matmarex, reedy: Backport for [[gerrit:1133500|SUL3: Fix user ID mismatch during login (immediately after creation) (T388177)]], [[gerrit:1133502|SUL3: Fix user ID mismatch during login (immediately after creation) (T388177)]], [[gerrit:1133503|Remove redundant WaitConditionLoop from CentralAuthTokenManager]], [[gerrit:1133504|Remove redundant WaitConditionLoop from CentralAuthTokenManager]

Mentioned in SAL (#wikimedia-operations) [2025-04-02T21:07:55Z] <reedy@deploy1003> Finished scap sync-world: Backport for [[gerrit:1133500|SUL3: Fix user ID mismatch during login (immediately after creation) (T388177)]], [[gerrit:1133502|SUL3: Fix user ID mismatch during login (immediately after creation) (T388177)]], [[gerrit:1133503|Remove redundant WaitConditionLoop from CentralAuthTokenManager]], [[gerrit:1133504|Remove redundant WaitConditionLoop from CentralAuthTokenManager]]

We should wait a few days to see if that really resolves the errors: https://logstash.wikimedia.org/goto/69ae850bf878fc526f85273c8547494f

Not seeing it today: https://logstash.wikimedia.org/app/dashboards#/view/AXFV7JE83bOlOASGccsT?_g=h@f7fb346&_a=h@09557da so far, is a good sign as we usually have a few of these every day.

But we few more days is a better cadence for conclusion so let's wait till end of week and if we no longer see them, we can resolve the task. I'll check again tomorrow evening :)

But we few more days is a better cadence for conclusion so let's wait till end of week and if we no longer see them, we can resolve the task. I'll check again tomorrow evening :)

Checking back on this again today (search time: Apr 2, 2025 @ 15:45:00.000 till now) shows no occurrences of this error. I'll do a final confirmation on Monday and then close this issue.

The last time the issue happened per logstash is April 2 @ 15:44:56.