Page MenuHomePhabricator

MassMessage Fatal error: CAS update failed on user_touched
Closed, ResolvedPublic

Description

Trying to use MassMessage yields the error above. I've tried with several test messages and target lists.

2016-01-22 13:14:39 mw1257 metawiki exception ERROR: [079ab5da] /wiki/Special:MassMessage   MWException from line 3662 of /srv/mediawiki/php-1.27.0-wmf.11/includes/user/User.php: CAS update failed on user_touched for user ID '3580349' (read from slave); the version of the user to be saved is older than the current version. {"exception_id":"079ab5da"} 
[Exception MWException] (/srv/mediawiki/php-1.27.0-wmf.11/includes/user/User.php:3662) CAS update failed on user_touched for user ID '3580349' (read from slave); the version of the user to be saved is older than the current version.
  #0 /srv/mediawiki/php-1.27.0-wmf.11/includes/session/SessionManager.php(534): User->saveSettings()
  #1 /srv/mediawiki/php-1.27.0-wmf.11/includes/user/User.php(679): MediaWiki\Session\SessionManager->preventSessionsForUser(string)
  #2 /srv/mediawiki/php-1.27.0-wmf.11/extensions/MassMessage/includes/MassMessage.php(44): User::newSystemUser(string, array)
  #3 /srv/mediawiki/php-1.27.0-wmf.11/extensions/MassMessage/includes/SpecialMassMessage.php(291): MassMessage::getMessengerUser()
  #4 /srv/mediawiki/php-1.27.0-wmf.11/extensions/MassMessage/includes/SpecialMassMessage.php(189): SpecialMassMessage->preview(array)
  #5 /srv/mediawiki/php-1.27.0-wmf.11/includes/htmlform/HTMLForm.php(607): SpecialMassMessage->callback(array)
  #6 /srv/mediawiki/php-1.27.0-wmf.11/includes/htmlform/HTMLForm.php(504): HTMLForm->trySubmit()
  #7 /srv/mediawiki/php-1.27.0-wmf.11/extensions/MassMessage/includes/SpecialMassMessage.php(71): HTMLForm->tryAuthorizedSubmit()
  #8 /srv/mediawiki/php-1.27.0-wmf.11/includes/specialpage/SpecialPage.php(384): SpecialMassMessage->execute(NULL)
  #9 /srv/mediawiki/php-1.27.0-wmf.11/includes/specialpage/SpecialPageFactory.php(564): SpecialPage->run(NULL)
  #10 /srv/mediawiki/php-1.27.0-wmf.11/includes/MediaWiki.php(280): SpecialPageFactory::executePath(Title, RequestContext)
  #11 /srv/mediawiki/php-1.27.0-wmf.11/includes/MediaWiki.php(738): MediaWiki->performRequest()
  #12 /srv/mediawiki/php-1.27.0-wmf.11/includes/MediaWiki.php(510): MediaWiki->main()
  #13 /srv/mediawiki/php-1.27.0-wmf.11/index.php(43): MediaWiki->run()
  #14 /srv/mediawiki/w/index.php(3): include(string)
  #15 {main}

Event Timeline

Johan raised the priority of this task from to Needs Triage.
Johan updated the task description. (Show Details)
Johan added a project: MassMessage.
Johan subscribed.

@Trizek-WMF got the same type of error message when he tried.

Reedy set Security to None.
Reedy added a project: MediaWiki-General.
Johan triaged this task as Unbreak Now! priority.Jan 22 2016, 2:03 PM

This issue might likely block Tech News delivery on Monday, according to Johan.

Aklapper renamed this task from MassMessage error: [079ab5da] (timestamp) Fatal exception of type "MWException" to MassMessage Fatal error: CAS update failed on user_touched.Jan 22 2016, 4:29 PM

@Anomie @Tgr Is this related to the new session stuff?

Smells like lag/caching to me. We could try manually calling $user->load( User::READ_LATEST ) at SessionManager.php line 532-ish to see if that fixes it, but that's mostly a guess.

Maybe related? T124441: Not all MassMessage sent (the task number looks confusingly similar but it's different)

Change 265775 had a related patch set uploaded (by Anomie):
User: Fix loading of user_touched

https://gerrit.wikimedia.org/r/265775

Turns out the user_touched timestamp wasn't even being loaded here, due to metawiki's user table having user_email declared as nullable and a bogus test in User::loadFromRow().

Change 265776 had a related patch set uploaded (by Anomie):
User: Fix loading of user_touched

https://gerrit.wikimedia.org/r/265776

Change 265775 merged by jenkins-bot:
User: Fix loading of user_touched

https://gerrit.wikimedia.org/r/265775

Change 265776 merged by jenkins-bot:
User: Fix loading of user_touched

https://gerrit.wikimedia.org/r/265776

Anomie claimed this task.

Fixed and backported.

We are still getting these.

All seem to be on commonswiki now.

Also, all are calls from /rpc/RunJobs.php, none through the web interface.

Looking more closely at all the exceptions being logged at the time, it's really looking like it is a CAS update failure. I see a bunch of "Lock wait timeout exceeded" database exceptions at about the same time that are all trying to update the same row in the user table, so what it seems is happening is that whatever fires /rpc/RunJobs.php is firing multiple in parallel for MassMessageJob and they're all conflicting.

We could try moving the SessionManager::singleton()->preventSessionsForUser( $user->getName() ); call in User::newSystemUser() inside the if just above where it is now, or figure out some other way to avoid having to update the database every time through.

Can't we just have an invalid token value, like the empty string for passwords, and set it to that, and not update the user if it's already set to that?

The trick is that we'd also need to make sure that the invalid token value doesn't pass existing token checks, which are all done using === or the like.

Since 'MediaWiki message delivery' is a reserved username (via the get reserved usernames hook IIRC), can we skip calling SessionManager::singleton()->preventSessionsForUser() entirely? It shouldn't be possible to login using that username, so there should be no active sessions...

A horrible hack would be to set the token to <random string>SYSTEM and ignore such tokens when considering an update.

It shouldn't be possible to login using that username, so there should be no active sessions...

What happens to existing sessions (and OAuth grants etc) when you install the MassMessage extension, though?

Change 267907 had a related patch set uploaded (by Anomie):
Introduce User::INVALID_TOKEN

https://gerrit.wikimedia.org/r/267907

Change 267907 merged by jenkins-bot:
Introduce User::INVALID_TOKEN

https://gerrit.wikimedia.org/r/267907

What about T125214#1983892?

Please delete all failed msg from queue to prevent duplicates.

I don't quite understand what's being asked (delete from the job queue? stuff should be long gone from there I think?), do we have anything further to do about this? @Legoktm I think you did this a previous time?