(I'm not entirely sure whether this is CentralAuth, AuthManager or some other component, or which project tag should be used for it; filing under General/Unknown for now, until we figure it out.)
During today's (2016-08-19) network maintenance in eqiad, a router OS bug caused two network partitioning events that specifically affected row D, i.e. a quarter of our datacenter. The first lasted a few minutes; the second probably lasted only a few milliseconds.
During both, we got reports from users who were seeing exceptions or could not log in and, more importantly, reports of users being silently logged out(!).
Example 1:
15:19 < yannf> https://fr.wikisource.org/w/index.php?title=Livre:Reclus_-_La_Commune_de_Paris_au_jour_le_jour.djvu&action=purge
15:20 < yannf> it works now, but...
15:20 < yannf> it asks for confirmation
15:20 < yannf> which it didn't before
(we ask anonymous users for confirmation on action=purge, but not logged-in users)
Example 2:
15:45 < Revent> [V7WtjQpAADwAAZV-nOoAAAAN] 2016-08-18 12:44:01: Fatal exception of type "MWException" <- when attempting to login to enwiki
which seems to be this:
2016-08-18 12:44:01 [V7WtjQpAADwAAZV-nOoAAAAN] mw1265 enwiki 1.28.0-wmf.14 exception ERROR: [V7WtjQpAADwAAZV-nOoAAAAN] /w/index.php?title=Special:UserLogin&returnto=(censored) MWException from line 4001 of /srv/mediawiki/php-1.28.0-wmf.14/includes/user/User.php: CAS update failed on user_touched for user ID '(censored)' (read from slave); the version of the user to be saved is older than the current version. {"exception_id":"V7WtjQpAADwAAZV-nOoAAAAN"}
[Exception MWException] (/srv/mediawiki/php-1.28.0-wmf.14/includes/user/User.php:4001) CAS update failed on user_touched for user ID '(censored)' (read from slave); the version of the user to be saved is older than the current version.
#0 /srv/mediawiki/php-1.28.0-wmf.14/extensions/VisualEditor/VisualEditor.hooks.php(1012): User->saveSettings()
#1 /srv/mediawiki/php-1.28.0-wmf.14/includes/Hooks.php(195): VisualEditorHooks::onUserLoggedIn(User)
#2 /srv/mediawiki/php-1.28.0-wmf.14/includes/auth/AuthManager.php(2351): Hooks::run(string, array)
#3 /srv/mediawiki/php-1.28.0-wmf.14/includes/auth/AuthManager.php(655): MediaWiki\Auth\AuthManager->setSessionDataForUser(User, boolean)
#4 /srv/mediawiki/php-1.28.0-wmf.14/includes/auth/AuthManager.php(349): MediaWiki\Auth\AuthManager->continueAuthentication(array)
#5 /srv/mediawiki/php-1.28.0-wmf.14/includes/specialpage/AuthManagerSpecialPage.php(354): MediaWiki\Auth\AuthManager->beginAuthentication(array, string)
#6 /srv/mediawiki/php-1.28.0-wmf.14/includes/specialpage/AuthManagerSpecialPage.php(483): AuthManagerSpecialPage->performAuthenticationStep(string, array)
#7 /srv/mediawiki/php-1.28.0-wmf.14/includes/htmlform/HTMLForm.php(635): AuthManagerSpecialPage->handleFormSubmit(array, VFormHTMLForm)
#8 /srv/mediawiki/php-1.28.0-wmf.14/includes/specialpage/AuthManagerSpecialPage.php(417): HTMLForm->trySubmit()
#9 /srv/mediawiki/php-1.28.0-wmf.14/includes/specialpage/LoginSignupSpecialPage.php(292): AuthManagerSpecialPage->trySubmit()
#10 /srv/mediawiki/php-1.28.0-wmf.14/includes/specialpage/SpecialPage.php(505): LoginSignupSpecialPage->execute(NULL)
#11 /srv/mediawiki/php-1.28.0-wmf.14/includes/specialpage/SpecialPageFactory.php(598): SpecialPage->run(NULL)
#12 /srv/mediawiki/php-1.28.0-wmf.14/includes/MediaWiki.php(283): SpecialPageFactory::executePath(Title, RequestContext)
#13 /srv/mediawiki/php-1.28.0-wmf.14/includes/MediaWiki.php(749): MediaWiki->performRequest()
#14 /srv/mediawiki/php-1.28.0-wmf.14/includes/MediaWiki.php(521): MediaWiki->main()
#15 /srv/mediawiki/php-1.28.0-wmf.14/index.php(43): MediaWiki->run()
#16 /srv/mediawiki/w/index.php(3): include(string)
#17 {main}
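For anyone not familiar with the "CAS update failed on user_touched" message: as far as I can tell it comes from a compare-and-swap style write, where the UPDATE only goes through if user_touched still has the value that was read when the user object was loaded. If that value came from a lagging replica ("read from slave"), the condition can never match. Below is a rough Python/sqlite3 sketch of that pattern, not MediaWiki's actual code; the mw_user table, save_settings(), make_db() and StaleUserError are all hypothetical names made up for illustration.

```python
import sqlite3


class StaleUserError(Exception):
    """The row changed since it was loaded, so the conditional UPDATE matched nothing."""


def make_db():
    # Hypothetical, heavily simplified stand-in for the real user table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mw_user ("
        "user_id INTEGER PRIMARY KEY, user_email TEXT, user_touched TEXT)"
    )
    conn.execute(
        "INSERT INTO mw_user VALUES (1, 'old@example.org', '20160818124350')"
    )
    conn.commit()
    return conn


def save_settings(conn, user_id, touched_when_loaded, new_touched, new_email):
    # Compare-and-swap: write only if user_touched still has the value we loaded.
    cur = conn.execute(
        "UPDATE mw_user SET user_email = ?, user_touched = ? "
        "WHERE user_id = ? AND user_touched = ?",
        (new_email, new_touched, user_id, touched_when_loaded),
    )
    conn.commit()
    if cur.rowcount == 0:
        # No row matched: our copy of the user was older than the stored one.
        raise StaleUserError(
            f"CAS update failed on user_touched for user ID {user_id}"
        )


if __name__ == "__main__":
    conn = make_db()
    try:
        # Timestamp as a lagging replica might have reported it (older than the master's).
        save_settings(conn, 1, "20160818124300", "20160818124401", "new@example.org")
    except StaleUserError as exc:
        print("rejected:", exc)
```

In this sketch the stale write is rejected and the stored row is left untouched, which would match an exception on login but would not by itself explain users being logged out.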
Getting exceptions and/or other errors when these events happen is more or less expected, if not unavoidable. Silently logging out users is not, and as far as I know this odd behavior is new. From the backtrace above, it looks like it may be AuthManager-related, but someone would have to dig a little deeper into all that and test the resiliency of MW's authentication code to such availability events.