Page MenuHomePhabricator

SpecialConfirmEmail causes "MWException: CAS update failed on user_touched" from User.php
Closed, ResolvedPublicPRODUCTION ERROR

Description

The following exceptionl has been causing an increase of 500 errors in frwiki:

[XQ-aPApAIC0AAKHQOu4AAACG] /wiki/Sp%C3%A9cial:Confirmer_l%27adresse_de_contact/db7426ac65515abc5c57fa5d665e03c7   MWException from line 4114 of /srv/mediawiki/php-1.34.0-wmf.8/includes/user/User.php: CAS update failed on user_touched. The version of the user to be saved is older than the current version.

Event Timeline

Thanks for creating the task :)

Some more details:

exception.file	       	/srv/mediawiki/php-1.34.0-wmf.8/includes/user/User.php:4114
url	       	/wiki/Special:ChangeEmail
exception.trace	       	#0 /srv/mediawiki/php-1.34.0-wmf.8/includes/libs/rdbms/database/Database.php(3831): Closure$User::saveSettings(Wikimedia\Rdbms\DatabaseMysqli, string)
#1 /srv/mediawiki/php-1.34.0-wmf.8/includes/user/User.php(4125): Wikimedia\Rdbms\Database->doAtomicSection(string, Closure$User::saveSettings;359)
#2 /srv/mediawiki/php-1.34.0-wmf.8/includes/specials/SpecialChangeEmail.php(191): User->saveSettings()
#3 /srv/mediawiki/php-1.34.0-wmf.8/includes/specials/SpecialChangeEmail.php(127): SpecialChangeEmail->attemptChange(User, string)
#4 /srv/mediawiki/php-1.34.0-wmf.8/includes/htmlform/HTMLForm.php(675): SpecialChangeEmail->onSubmit(array, OOUIHTMLForm)
#5 /srv/mediawiki/php-1.34.0-wmf.8/includes/htmlform/HTMLForm.php(567): HTMLForm->trySubmit()
#6 /srv/mediawiki/php-1.34.0-wmf.8/includes/htmlform/HTMLForm.php(582): HTMLForm->tryAuthorizedSubmit()
#7 /srv/mediawiki/php-1.34.0-wmf.8/includes/specialpage/FormSpecialPage.php(184): HTMLForm->show()
#8 /srv/mediawiki/php-1.34.0-wmf.8/includes/specials/SpecialChangeEmail.php(61): FormSpecialPage->execute(NULL)
#9 /srv/mediawiki/php-1.34.0-wmf.8/includes/specialpage/SpecialPage.php(570): SpecialChangeEmail->execute(NULL)
#10 /srv/mediawiki/php-1.34.0-wmf.8/includes/specialpage/SpecialPageFactory.php(575): SpecialPage->run(NULL)
#11 /srv/mediawiki/php-1.34.0-wmf.8/includes/MediaWiki.php(288): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#12 /srv/mediawiki/php-1.34.0-wmf.8/includes/MediaWiki.php(865): MediaWiki->performRequest()
#13 /srv/mediawiki/php-1.34.0-wmf.8/includes/MediaWiki.php(515): MediaWiki->main()
#14 /srv/mediawiki/php-1.34.0-wmf.8/index.php(42): MediaWiki->run()
#15 /srv/mediawiki/w/index.php(3): include(string)
#16 {main}

See (non-public) {T226353} for a corresponding activity which triggered this.

zeljkofilipin renamed this task from (frwiki) User.php: CAS update failed on user_touched. The version of the user to be saved is older than the current version to User.php: CAS update failed on user_touched. The version of the user to be saved is older than the current version .Aug 21 2019, 3:06 PM
mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:06 PM
Krinkle renamed this task from User.php: CAS update failed on user_touched. The version of the user to be saved is older than the current version to SpecialChangeEmail causes "MWException: CAS update failed on user_touched" from User.php.Oct 17 2019, 6:20 PM

This instance of "CAS update failed" from SpecialChangeEmail currently means that during train deployments we will not catch new instances of this error from anywhere else in core or extensions because it is currently filtered out on the mediawiki-new-errors Logstash dashboard and there isn't a feasible way to filter it out just for this code path with the limitations of Kibana and the amount of time we can justify maintainers of that dashboard spend on creating such filters.

Woud be nice if this could be addressed soon and/or it doesn't matter to somehow suppress it from the exception/error logs in favour of something dedicated to it that a specific team can look after on their own dashboard instead.

WDoranWMF added a subscriber: WDoranWMF.

@Krinkle I'm moving this to our icebox, I'm concerned we won't get to it soon and want to be honest. We're working through fixing out backlog and will rereview icebox after that though feel free to push back.

Change 553520 had a related patch set uploaded (by Umherirrender; owner: Umherirrender):
[mediawiki/core@master] Use User::getInstanceForUpdate on SpecialChangeEmail

https://gerrit.wikimedia.org/r/553520

Change 553520 merged by jenkins-bot:
[mediawiki/core@master] Use User::getInstanceForUpdate on SpecialChangeEmail

https://gerrit.wikimedia.org/r/553520

Change 553520 merged by jenkins-bot:
[mediawiki/core@master] Use User::getInstanceForUpdate on SpecialChangeEmail

https://gerrit.wikimedia.org/r/553520

Change
+ $userLatest = $user->getInstanceForUpdate();

- $user->saveSettings();
+ $userLatest->saveSettings();

Looks like this might not have worked as intended. The exception is still seen in production (although that could be because of a different race condition). And, more important, it says it was read from a replica. Whereas getInstanceForUpdate is presumably meant to read from a master.

https://logstash.wikimedia.org/goto/3933869bfe9190d78869432ddb62ddd8

reqId XfjT4ApAMFEAAAgbZ2gAAACH
host=mw1246 channel=exception wiki=enwiki
  /wiki/Special:ConfirmEmail/%%%%%%%%%%
  MWException: CAS update failed on user_touched. The version of the user to be saved is older than the current version.


host=mw1246 channel=preferences wiki=enwiki
   CAS update failed on user_touched for user ID '%%%%%%%' (replica read)

Resetting for re-triage. The underlying code might be broken in ways that would have wider implications.

These errors seem to be for Special:ConfirmEmail but the patch was for Special:ChangeEmail.

Krinkle renamed this task from SpecialChangeEmail causes "MWException: CAS update failed on user_touched" from User.php to SpecialConfirmEmail causes "MWException: CAS update failed on user_touched" from User.php.Jan 6 2020, 10:17 PM

Ah my bad, so that patch was actually fixing a different special page. I've updated the task title to correctly reflect the trace reported in this task and re-tagging Clinic Duty to investigate/patch the CAS token acquisition being too early / racing.

Krinkle moved this task from Icebox to Inbox on the Platform Engineering board.

(Move to CPT inbox. My previous action moved it to the Clinic Duty Inbox which I now understand isn't the main inbox.)

Anomie added a subscriber: Anomie.

I don't see any instances of this error from SpecialChangeEmail in the past 60 days, so I'm going to close this. If it starts happening again, someone can reopen or re-file.

brennen added a subscriber: brennen.

Seeing a bunch of these today in 1.36.0-wmf.3:

Screenshot-2020-08-07-11:31:05.png (180×1 px, 13 KB)

Logstash: mediawiki-new-errors filtered for User.php:3429.

Change 619021 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Use User::getInstanceForUpdate to update user on SpecialConfirmEmail

https://gerrit.wikimedia.org/r/619021

Not sure what caused this to start happening again, but the patch above should fix it.

daniel triaged this task as High priority.Sep 9 2020, 3:17 PM

Change 619021 merged by jenkins-bot:
[mediawiki/core@master] Use User::getInstanceForUpdate to update user on SpecialConfirmEmail

https://gerrit.wikimedia.org/r/619021

Please checks log, if this is still an issue or if the merged patch set helps. Thanks.