Official WMF Phabricator work account. @xSavitar is my volunteer account. Use that for non-WMF-related things.
User Details
- User Since
- Jan 7 2020, 11:30 AM (308 w, 6 d)
- Availability
- Available
- IRC Nick
- xSavitar
- LDAP User
- Unknown
- MediaWiki User
- DAlangi (WMF) [ Global Accounts ]
Today
@HideonRosie, an attempted fix was deployed. Could you check to see if the issue is resolved on your side? Thanks!
Fri, Dec 5
While testing locally, I observed that when the user is logged in locally but logged out centrally, visiting Special:CreateAccount or Special:UserLogin on the local wiki redirects to Special:UserLogin regardless.
Oops, I missed this during code review and local testing of 56126683a56. Fixed now:
Thu, Dec 4
Potentially related: T411799: E-mail doesn't get confirmed and gets unlinked constantly
Another trace coming from SessionBackend:
Filed T411804: [SpecialConfirmEmail] RuntimeException: CAS update failed on user_touched. The version of the user to be saved is older than the current version. separately. The trace is different and I think it should be easy to fix.
After deploying the backport of I49fcec68427b70f (f674fc54418b9b2de5a) and monitoring the logs, the issue has not recurred since 14:35:30 (the last occurrence): https://logstash.wikimedia.org/goto/0343934111dcac6d80b17f749494aa1b
Wed, Dec 3
Logs after deployment show:
@matmarex, this one: T411654: MediaWiki periodic job startupregistrystats failed just showed up on our board (not long ago), and I left a comment there.
Per https://logstash.wikimedia.org/goto/d830f3d25c48f8fa7e38e8bd8ae17895, this appears to be a service-mesh issue, as the logs suggest.
We've had several past instances of issues like:
- T409935: MediaWiki periodic job startupregistrystats-mediawikiwiki failed
- T390971: MediaWikiCronJobFailed
- T391574: MediaWikiCronJobFailed
- T389182: startupregistrystats-testwiki fails to run on php-1.44.0-wmf.21
- T371354: startupregistrystats-testwiki maintenance job is failing
- T346800: startupregistrystats-testwiki periodic job fails
- T404809: MediaWiki periodic job startupregistrystats failed
- T404730: MediaWiki periodic job startupregistrystats failed
- T409212: MediaWiki periodic job startupregistrystats-testwiki failed
Thanks for landing the final piece @matmarex. This can be resolved now!
Tue, Dec 2
I've monitored Logstash since the deployment, but it hasn't gone down. These line up with https://logstash.wikimedia.org/goto/71373cec585746d63094e22d911053b4 (CAS update failed on user_touched for user ID '{user_id}' ({db_flag} read)), which all come from replica DBs.
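For readers unfamiliar with this error class, here is a minimal sketch of a compare-and-swap (CAS) save, and of why a value read from a lagged replica makes it fail. It is not MediaWiki's code: the `demo_user` table, `save_with_cas()`, and `run_demo()` are hypothetical names, and SQLite stands in for the real primary database.

```python
import sqlite3


def save_with_cas(conn, user_id, email, touched_seen, touched_new):
    """Write only if user_touched still equals the value the caller read.

    Returns True when exactly one row was updated, False when the CAS
    check failed because the row changed after it was read.
    """
    cur = conn.execute(
        "UPDATE demo_user SET user_email = ?, user_touched = ? "
        "WHERE user_id = ? AND user_touched = ?",
        (email, touched_new, user_id, touched_seen),
    )
    return cur.rowcount == 1


def run_demo():
    # Hypothetical stand-in for the primary DB.
    primary = sqlite3.connect(":memory:")
    primary.execute(
        "CREATE TABLE demo_user ("
        "user_id INTEGER PRIMARY KEY, user_email TEXT, user_touched TEXT)"
    )
    primary.execute(
        "INSERT INTO demo_user VALUES (1, 'old@example.org', '20251202000000')"
    )

    # Value as seen on a lagged replica (assumed for illustration).
    stale_seen = "20251202000000"

    # Meanwhile another request touches the row on the primary.
    primary.execute(
        "UPDATE demo_user SET user_touched = '20251202000005' WHERE user_id = 1"
    )

    # Saving with the stale value matches no row, so the CAS check fails.
    stale_ok = save_with_cas(
        primary, 1, "new@example.org", stale_seen, "20251202000010"
    )

    # Re-reading from the primary first lets the same save go through.
    fresh_seen = primary.execute(
        "SELECT user_touched FROM demo_user WHERE user_id = 1"
    ).fetchone()[0]
    fresh_ok = save_with_cas(
        primary, 1, "new@example.org", fresh_seen, "20251202000010"
    )
    return stale_ok, fresh_ok
```

In this sketch the failed save corresponds to the "version of the user to be saved is older than the current version" condition; whether the production fix is to re-read from the primary or to retry is not something this example settles.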
Mon, Dec 1
Affects https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1213109 as well. 2 times in a row.
Fri, Nov 28
Just adding that we've also been working on T408724: Clean up $performer parameter of AuthManager::autoCreateUser() for some time now, which may affect this task as well.
Wed, Nov 26
Looking at the flame graph, I wanted to understand why CentralAuth::localUserData() triggers so many SQB::fetchRow() calls, and quickly realized this is influenced by edge-login (to other wikis). After a user successfully logs in on the shared domain and the local wiki, edge-login occurs in the same request for all wikis in LocalDatabases / parent-domains. I was able to observe this locally as well.
AuthManager::autoCreateUser() invokes the GetSecurityLogContext hook, which on this line triggers a call to wmfGetPrivilegedGroups().
Did some digging and found something that is probably useful. The increase aligns with the train deployment on Nov 18 (ref. https://www.mediawiki.org/wiki/MediaWiki_1.46/wmf.3). So I'm thinking the culprit is https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1202566 Cc @kostajh
Tue, Nov 25
Mon, Nov 24
Sat, Nov 22
This seems to be happening a lot recently: https://logstash.wikimedia.org/goto/687a23ae924b1a76ac8da97c479dcec6, although the stack trace (see below) differs from the original one in the task description.
Fri, Nov 21
@brennen, seems like this is also similar to T383050: CAS update failed on gu_cas_token for user ID '{globalId}' (read from {from}); the version of the user to be saved is older than the current version. that was declined? Cc @Tgr
Thanks @Tgr for merging.
Thu, Nov 20
Reflects https://github.com/wikimedia/less.php/issues/134. Thanks for filing, @Reedy.
Wed, Nov 19
Tue, Nov 18
@Nikerabbit, let me know if the issue persists for you. Thanks!
I suspected that. So it seems like there are more consumers of the SessionManager in this way, in addition to PHPSessionHandler::read(). We also have PHPSessionHandler::write() and PHPSessionHandler::destroy(). I'll centralize the logic for getting a new session manager object so that callers can use it. Thanks for posting the trace.
Mon, Nov 17
This was added in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/1128044. Also, note that a credentials change may require security re-authentication (and PasswordReset could be one such case). So I'm not sure removing it from central special pages would work nicely.
Nov 5 2025
All patches merged and tests fixed. Thanks, @Krinkle, for working on this.
Nov 4 2025
Ah, apologies about that @Tgr, I was supposed to remove that migration code after the migration. Made a fix, thanks for filing the issue.
Per CodeSearch, luckily there are not many call sites in WMF-deployed repos: https://codesearch.wmcloud.org/deployed/?q=%3EautoCreateUser%5C%28
Oct 31 2025
Oct 30 2025
Backports were deployed in https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251030T2000, but something interesting happened, and I filed T408868: Log deployment activities on auto-detected changes during deployment.
Found T408540: PHP Deprecated: Asking for a replica from groups except dump/vslow is deprecated: watchlist [Called from Wikimedia\Rdbms\LoadBalancer::getConnectionInternal] and it looks like that has been resolved.
Oct 24 2025
Oct 23 2025
Oct 22 2025
Oct 21 2025
As suspected, the onSaveUserOptions() (ref) hook handler in BetaFeatures doesn't short-circuit on anonymous and temporary users. Examining new logs with user ID and username, I can already identify requests with user IDs of 0 and usernames of false or an IP address.
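The missing short-circuit can be sketched roughly as follows. This is a Python stand-in rather than the BetaFeatures PHP handler: `DemoUser`, `is_named()`, `on_save_user_options()`, and the `log` list are hypothetical names used only to show an early return for anonymous and temporary users.

```python
from dataclasses import dataclass


@dataclass
class DemoUser:
    """Hypothetical user object; user_id 0 models an anonymous (IP) user."""
    user_id: int
    name: str
    is_temp: bool = False

    def is_named(self):
        # Only registered, non-temporary accounts count as named here.
        return self.user_id > 0 and not self.is_temp


def on_save_user_options(user, modified_options, log):
    """Handle an options save, skipping anonymous and temporary users.

    Returns True if the handler did its work, False if it short-circuited.
    """
    if not user.is_named():
        # Bail out early: nothing to do for anonymous or temporary users.
        return False
    # Placeholder for the real work; here we only record what would run.
    log.append((user.user_id, sorted(modified_options)))
    return True
```

With a guard like this, requests with a user ID of 0 or a temporary account would not reach the part of the handler that produces the logged errors, assuming that part is only meaningful for named users.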
I've looked at various request IDs on Logstash, and the stack traces all look the same. I was hoping to see a different stack trace, but I haven't spotted a single one that differs from the one below:
Oct 20 2025
Oct 19 2025
Oct 17 2025
Closing this as resolved. Everything looks stable after the deployment, and we haven't had any reports of issues.