@DAlangi_WMF Thanks for the backport!
Yesterday
Thu, Dec 11
Mon, Dec 8
@HideonRosie, an attempted fix was deployed. Could you check to see if the issue is resolved on your side? Thanks!
Fri, Dec 5
I observed that during testing locally, when the user is logged in locally and logged out centrally, visiting Special:CreateAccount or Special:UserLogin from the local wiki redirects to Special:UserLogin regardless.
Oops, I missed this during code review and local testing of 56126683a56. Fixed now:
Thu, Dec 4
Potentially related: T411799: E-mail doesn't get confirmed and gets unlinked constantly
Another trace coming from SessionBackend:
File T411804: [SpecialConfirmEmail] RuntimeException: CAS update failed on user_touched. The version of the user to be saved is older than the current version. separately. The trace is different and I think it should be easy to fix.
After deploying the backport of I49fcec68427b70f (f674fc54418b9b2de5a) and monitoring the logs, the issue is no longer occurring after 14:35:30 (last occurrence): https://logstash.wikimedia.org/goto/0343934111dcac6d80b17f749494aa1b
In T410652#11431387, @Tgr wrote: Hm. I thought AuthManager makes sure it is a primary user but I must have misread the code because for login it actually doesn't. I suppose there isn't really a strong reason to, most authentication providers wouldn't change user data on login.
Wed, Dec 3
Logs after deployment show:
@matmarex, this one: T411654: MediaWiki periodic job startupregistrystats failed just showed up on our board (not long ago), and I left a comment there.
Per the logs at https://logstash.wikimedia.org/goto/d830f3d25c48f8fa7e38e8bd8ae17895, this appears to be a service-mesh issue.
We've had several past instances of issues like:
- T409935: MediaWiki periodic job startupregistrystats-mediawikiwiki failed
- T390971: MediaWikiCronJobFailed
- T391574: MediaWikiCronJobFailed
- T389182: startupregistrystats-testwiki fails to run on php-1.44.0-wmf.21
- T371354: startupregistrystats-testwiki maintenance job is failing
- T346800: startupregistrystats-testwiki periodic job fails
- T404809: MediaWiki periodic job startupregistrystats failed
- T404730: MediaWiki periodic job startupregistrystats failed
- T409212: MediaWiki periodic job startupregistrystats-testwiki failed
In T410878#11421350, @matmarex wrote: We've just added a cache, so it's still slow the first time, then fast after. I think this is good enough? It should minimize the impact of using this data in any additional logging.
Thanks for landing the final piece @matmarex. This can be resolved now!
Tue, Dec 2
I've monitored Logstash since the deployment, but it hasn't gone down. This lines up with https://logstash.wikimedia.org/goto/71373cec585746d63094e22d911053b4 (CAS update failed on user_touched for user ID '{user_id}' ({db_flag} read)), where the entries are all coming from replica DB reads.
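For context, here is a minimal sketch of how a compare-and-swap save fails when the copy being saved came from a stale read, as a lagged replica would return. It is Python with sqlite3 rather than MediaWiki code. The table `user_demo`, `save_user()` and `CasUpdateFailed` are made-up names for illustration, and the real check in core may differ in detail.

```python
import sqlite3


class CasUpdateFailed(RuntimeError):
    """Raised when the row changed since the caller read it (hypothetical name)."""


def save_user(db, user_id, email, touched_seen, touched_new):
    # Compare-and-swap: only update if user_touched still has the value we read.
    cur = db.execute(
        "UPDATE user_demo SET user_email = ?, user_touched = ? "
        "WHERE user_id = ? AND user_touched = ?",
        (email, touched_new, user_id, touched_seen),
    )
    if cur.rowcount == 0:
        # No row matched, so our copy is older than the current version.
        raise CasUpdateFailed(f"CAS update failed on user_touched for user ID {user_id}")


def demo():
    primary = sqlite3.connect(":memory:")
    primary.execute(
        "CREATE TABLE user_demo ("
        "user_id INTEGER PRIMARY KEY, user_email TEXT, user_touched TEXT)"
    )
    primary.execute(
        "INSERT INTO user_demo VALUES (1, 'old@example.org', '20251202000000')"
    )

    # Value a lagged replica would still hand out after the write below.
    stale_touched = "20251202000000"

    # Another request updates the row on the primary in the meantime.
    save_user(primary, 1, "mid@example.org", "20251202000000", "20251202000005")

    # Saving from the stale copy now matches zero rows and fails.
    try:
        save_user(primary, 1, "new@example.org", stale_touched, "20251202000010")
    except CasUpdateFailed:
        return "failed"
    return "saved"
```

In this sketch, loading the row from the primary before saving would give the second call the current `user_touched` value and let the update match.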
Mon, Dec 1
Affects https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1213109 as well. 2 times in a row.
Fri, Nov 28
Just adding that we've also been working on T408724: Clean up $performer parameter of AuthManager::autoCreateUser() for some time now, which may affect this task as well.
Wed, Nov 26
Looking at the flame graph, I wanted to understand why CentralAuth::localUserData() triggers so many SQB::fetchRow() calls, and quickly realized this was influenced by edge-login (to other wikis). So, for a given request (after a user successfully logs in on the shared domain and the local wiki for that request), edge-login occurs in the same request (for all wikis in LocalDatabases / parent-domains), and I was able to observe this locally as well.
AuthManager::autoCreateUser() invokes the GetSecurityLogContext hook which on this line will trigger a call to wmfGetPrivilegedGroups().
Did some digging and found something probably useful. The increase aligns with the train deployment on Nov 18 (ref. https://www.mediawiki.org/wiki/MediaWiki_1.46/wmf.3). So I'm thinking the culprit is https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1202566 Cc @kostajh
Tue, Nov 25
In T405450#11404260, @Nikerabbit wrote: Seems to work now. Thanks!
Mon, Nov 24
Sat, Nov 22
This seems to be happening a lot recently: https://logstash.wikimedia.org/goto/687a23ae924b1a76ac8da97c479dcec6, although the stack trace (see below) differs from the original one in the task description.
Fri, Nov 21
@brennen, seems like this is also similar to T383050: CAS update failed on gu_cas_token for user ID '{globalId}' (read from {from}); the version of the user to be saved is older than the current version. that was declined? Cc @Tgr
Thanks @Tgr for merging.
Thu, Nov 20
Reflects https://github.com/wikimedia/less.php/issues/134. Thanks for filing, @Reedy.
Wed, Nov 19
Tue, Nov 18
@Nikerabbit, let me know if the issue persists for you. Thanks!
I suspected that. So it seems like there are more consumers of the SessionManager in this way, in addition to PHPSessionHandler::read(). We also have PHPSessionHandler::write() and PHPSessionHandler::destroy(). I'll centralize the logic for getting a new session manager object so that callers can use it. Thanks for posting the trace.
Mon, Nov 17
In T405450#11367581, @Nikerabbit wrote: Reported too early, unfortunately the issue persists.
In T409984#11379109, @Xaosflux wrote: Special:PasswordReset shouldn't require reauthentication, as it is available unauthenticated
In T405231#11230633, @Tgr wrote: I forgot to rename a bunch of getInstanceForUpdate mocks in EmailNotificationSecondaryAuthenticationProviderTest, and it's passing anyway, so maybe it's not actually testing what it's supposed to.
This was added in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/1128044. Also, note that credential changes may require security re-authentication (and PasswordReset could be one of them). So I'm not sure removing it from central special pages would work nicely.
In T406566#11252626, @Tgr wrote:
Nov 5 2025
All patches merged and tests fixed. Thanks, @Krinkle, for working on this.
Nov 4 2025
Ah, apologies about that @Tgr, I was supposed to remove that migration code after the migration. Made a fix, thanks for filing the issue.
Per CodeSearch, luckily there aren't many call sites in WMF-deployed repos: https://codesearch.wmcloud.org/deployed/?q=%3EautoCreateUser%5C%28
Oct 31 2025
Oct 30 2025
Backports were deployed in https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251030T2000, but something interesting happened, and I filed T408868: Log deployment activities on auto-detected changes during deployment.
Found T408540: PHP Deprecated: Asking for a replica from groups except dump/vslow is deprecated: watchlist [Called from Wikimedia\Rdbms\LoadBalancer::getConnectionInternal] and it looks like that has been resolved.
Oct 24 2025
In T408221#11307202, @Dzahn wrote: used disk space was 100% and is now 37%
Oct 23 2025
Oct 22 2025
Oct 21 2025
As suspected, the onSaveUserOptions() (ref) hook handler in BetaFeatures doesn't short-circuit on anonymous and temporary users. Examining new logs with user ID and username, I can already identify requests with user IDs of 0 and usernames of false or an IP address.
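A rough sketch of the kind of early return I have in mind, written in Python for illustration only. `DemoUser`, `is_named()` and `on_save_user_options()` are hypothetical stand-ins and not the actual BetaFeatures code.

```python
from dataclasses import dataclass


@dataclass
class DemoUser:
    # Hypothetical stand-in for a user object; ID 0 means anonymous.
    user_id: int
    name: str
    is_temp: bool = False

    def is_named(self) -> bool:
        # Named means registered and not a temporary account.
        return self.user_id != 0 and not self.is_temp


def on_save_user_options(user: DemoUser, modified_options: dict, processed: list) -> bool:
    """Hypothetical handler: returns True if the options were processed."""
    if not user.is_named():
        # Short-circuit: nothing to do for anonymous or temporary users.
        return False
    processed.append((user.user_id, dict(modified_options)))
    return True
```

With a guard like this, requests with a user ID of 0 or a temporary account would return before touching any feature-related state.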
I've looked at various request IDs on Logstash, and the stack traces all look the same. I was hoping to see a different stack trace, but I haven't spotted a single one that differs from the one below:
Oct 20 2025
Oct 19 2025
Oct 17 2025
Closing this as resolved. Everything looks stable after the deployment, and we haven't had any reports of issues.