Page MenuHomePhabricator

MassMessage not working for delivery to Suwikiquote or Mlwikiquote: "CAS update failed on user_touched"
Open, HighPublicPRODUCTION ERROR

Description

An expected delivery to their village pumps was not made. I tried to reproduce at a sandbox, and confirmed.

To reproduce:

  1. {{target | user = Quiddity (WMF)/sandbox | site = su.wikiquote.org }}
  2. {{target | user = Quiddity (WMF)/sandbox | site = ml.wikiquote.org }}
  • Nothing arrives.

Impact

Message delivery is failing due to a system user conflict.

Event Timeline

Hmm.

2017-07-18 22:55:57 [WW6R2wpAME4AAGJWDeEAAABN] mw1299 mlwikiquote 1.30.0-wmf.9 exception ERROR: [WW6R2wpAME4AAGJWDeEAAABN] [no req]   MWException from line 4057 of /srv/mediawiki/php-1.30.0-wmf.9/includes/user/User.php: CAS update failed on user_touched for user ID '2110' (read from replica); the version of the user to be saved is older than the current version. {"exception_id":"WW6R2wpAME4AAGJWDeEAAABN","caught_by":"mwe_handler"} 
[Exception MWException] (/srv/mediawiki/php-1.30.0-wmf.9/includes/user/User.php:4057) CAS update failed on user_touched for user ID '2110' (read from replica); the version of the user to be saved is older than the current version.
  #0 /srv/mediawiki/php-1.30.0-wmf.9/includes/user/User.php(737): User->saveSettings()
  #1 /srv/mediawiki/php-1.30.0-wmf.9/extensions/MassMessage/includes/MassMessage.php(45): User::newSystemUser(string, array)
  #2 /srv/mediawiki/php-1.30.0-wmf.9/extensions/MassMessage/includes/job/MassMessageJob.php(64): MassMessage::getMessengerUser()
  #3 /srv/mediawiki/php-1.30.0-wmf.9/extensions/MassMessage/includes/job/MassMessageJob.php(289): MassMessageJob->getUser()
  #4 /srv/mediawiki/php-1.30.0-wmf.9/extensions/MassMessage/includes/job/MassMessageJob.php(208): MassMessageJob->makeAPIRequest(array)
  #5 /srv/mediawiki/php-1.30.0-wmf.9/extensions/MassMessage/includes/job/MassMessageJob.php(186): MassMessageJob->editPage()
  #6 /srv/mediawiki/php-1.30.0-wmf.9/extensions/MassMessage/includes/job/MassMessageJob.php(41): MassMessageJob->sendMessage()
  #7 /srv/mediawiki/php-1.30.0-wmf.9/includes/jobqueue/JobRunner.php(293): MassMessageJob->run()
  #8 /srv/mediawiki/php-1.30.0-wmf.9/includes/jobqueue/JobRunner.php(193): JobRunner->executeJob(MassMessageJob, Wikimedia\Rdbms\LBFactoryMulti, BufferingStatsdDataFactory, integer)
  #9 /srv/mediawiki/rpc/RunJobs.php(47): JobRunner->run(array)
  #10 {main}
Aklapper renamed this task from MassMessage not working for delivery to Suwikiquote or Mlwikiquote to MassMessage not working for delivery to Suwikiquote or Mlwikiquote: "CAS update failed on user_touched".Jul 19 2017, 9:45 AM

Also see T124414

Nikerabbit added a subscriber: Nikerabbit.

Also happening elsewhere, example from aawiki:

 exception.message	   	CAS update failed on user_touched. The version of the user to be saved is older than the current version.
t exception.trace	   	#0 /srv/mediawiki/php-1.35.0-wmf.37/includes/libs/rdbms/database/Database.php(4264): User->{closure}(Wikimedia\Rdbms\DatabaseMysqli, string)
#1 /srv/mediawiki/php-1.35.0-wmf.37/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->doAtomicSection(string, Closure)
#2 /srv/mediawiki/php-1.35.0-wmf.37/includes/libs/rdbms/database/DBConnRef.php(641): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#3 /srv/mediawiki/php-1.35.0-wmf.37/includes/user/User.php(3523): Wikimedia\Rdbms\DBConnRef->doAtomicSection(string, Closure)
#4 /srv/mediawiki/php-1.35.0-wmf.37/includes/user/User.php(844): User->saveSettings()
#5 /srv/mediawiki/php-1.35.0-wmf.37/extensions/MassMessage/includes/MassMessage.php(44): User::newSystemUser(string, array)
#6 /srv/mediawiki/php-1.35.0-wmf.37/extensions/MassMessage/includes/Job/MassMessageJob.php(93): MediaWiki\MassMessage\MassMessage::getMessengerUser()
#7 /srv/mediawiki/php-1.35.0-wmf.37/extensions/MassMessage/includes/Job/MassMessageJob.php(368): MediaWiki\MassMessage\Job\MassMessageJob->getUser()
#8 /srv/mediawiki/php-1.35.0-wmf.37/extensions/MassMessage/includes/Job/MassMessageJob.php(260): MediaWiki\MassMessage\Job\MassMessageJob->makeAPIRequest(array)
#9 [internal function]: MediaWiki\MassMessage\Job\MassMessageJob->editPage(string)
#10 /srv/mediawiki/php-1.35.0-wmf.37/extensions/MassMessage/includes/Job/MassMessageJob.php(240): call_user_func(array, string)
#11 /srv/mediawiki/php-1.35.0-wmf.37/extensions/MassMessage/includes/Job/MassMessageJob.php(69): MediaWiki\MassMessage\Job\MassMessageJob->sendMessage()
#12 /srv/mediawiki/php-1.35.0-wmf.37/extensions/EventBus/includes/JobExecutor.php(79): MediaWiki\MassMessage\Job\MassMessageJob->run()
#13 /srv/mediawiki/rpc/RunSingleJob.php(76): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#14 {main}

The trace is similar to T124414: MassMessage Fatal error: CAS update failed on user_touched but that has been closed. This is still open, so hijacking it as both traces have MassMessage::getMessengerUser in them.

JJMC89 changed the subtype of this task from "Task" to "Production Error".Jul 23 2020, 3:45 AM

At least on aawiki, MediaWiki message delivery is not a system user: it has a real user token. (But also an empty email and password, which suggests it was a system user at some point... not sure how that came to be.)
That triggers the "steal" branch of User::newSystemUser, which tries to change the token and save the user, which results in a CAS error. Either the job is conflicting with itself, like in T124414 (but that seems unlikely: why would it run multiple instances for aawiki? and more importantly, at least one of the user updates would succeed, and we'd see a valid token) or maybe the logic in User::newSystemUser conflicts with something done in one of the revokeAccessForUser handlers (but that also doesn't seem to be the case: the only things that would run are providerChangeAuthenticationData in LocalPasswordPrimaryAuthenticationProvider and TemporaryPasswordPrimaryAuthenticationProvider and those don't change user_touched).

Running User::newSystemUser manually worked fine on aawiki.

Krinkle assigned this task to Tgr.
Krinkle added a subscriber: Krinkle.
This comment was removed by Krinkle.

Logstash query "CAS update failed on user_touched" AND exception.trace:"MassMessage" over the past 30 days still finds recent ocurrences of this unexplained corruption. The more recent one is from a few hours ago:

  • timestamp: 2020-09-21T21:29:12
  • reqId: b40c328e-4e5e-49b2-9eae-6a307de214cb
  • wiki: mlwikiquote
CAS update failed on user_touched. The version of the user to be saved is older than the current version.
#0 /srv/mediawiki/php-1.36.0-wmf.9/includes/libs/rdbms/database/Database.php(4461): User->{closure}(Wikimedia\Rdbms\DatabaseMysqli, string)
#1 /srv/mediawiki/php-1.36.0-wmf.9/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->doAtomicSection(string, Closure)
#2 /srv/mediawiki/php-1.36.0-wmf.9/includes/libs/rdbms/database/DBConnRef.php(641): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#3 /srv/mediawiki/php-1.36.0-wmf.9/includes/user/User.php(3398): Wikimedia\Rdbms\DBConnRef->doAtomicSection(string, Closure)
#4 /srv/mediawiki/php-1.36.0-wmf.9/includes/user/User.php(850): User->saveSettings()
#5 /srv/mediawiki/php-1.36.0-wmf.9/extensions/MassMessage/includes/MassMessage.php(44): User::newSystemUser(string, array)
#6 /srv/mediawiki/php-1.36.0-wmf.9/extensions/MassMessage/includes/Job/MassMessageJob.php(93): MediaWiki\MassMessage\MassMessage::getMessengerUser()
#7 /srv/mediawiki/php-1.36.0-wmf.9/extensions/MassMessage/includes/Job/MassMessageJob.php(368): MediaWiki\MassMessage\Job\MassMessageJob->getUser()
#8 /srv/mediawiki/php-1.36.0-wmf.9/extensions/MassMessage/includes/Job/MassMessageJob.php(260): MediaWiki\MassMessage\Job\MassMessageJob->makeAPIRequest(array)
#9 [internal function]: MediaWiki\MassMessage\Job\MassMessageJob->editPage(string)
#10 /srv/mediawiki/php-1.36.0-wmf.9/extensions/MassMessage/includes/Job/MassMessageJob.php(240): call_user_func(array, string)
#11 /srv/mediawiki/php-1.36.0-wmf.9/extensions/MassMessage/includes/Job/MassMessageJob.php(69): MediaWiki\MassMessage\Job\MassMessageJob->sendMessage()
#12 /srv/mediawiki/php-1.36.0-wmf.9/extensions/EventBus/includes/JobExecutor.php(79): MediaWiki\MassMessage\Job\MassMessageJob->run()
#13 /srv/mediawiki/rpc/RunSingleJob.php(76): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#14 {main}
Krinkle triaged this task as High priority.Sep 22 2020, 3:18 PM
BPirkle added a subscriber: BPirkle.

Still seeing on wikiquote and wikibooks as of Oct 7, 2020.