Users unable to remain logged in, associated with attempts to upgrade the password hash on every login
Closed, Resolved, Public

Description

We've had two recent reports of users being able to log in, but then finding themselves logged out either immediately on being redirected back from the login form or on the next page view after.

Checking the logs, I see each successful login reports "Set global password for '${USER}'", which most likely indicates that the PasswordFactory::needsUpdate() check returned true.

In the case of Amanda bee, when I looked at the database and the data in the WAN cache, however, the password seemed to be up to date. I asked the user to attempt another login, and it stuck. In the case of the other user, the database and the WAN cache still have an outdated password hash at this time despite 11 "Set global password" messages logged on June 21.

Speculation: I wonder whether @aaron's recent changes to make various CentralAuth things happen post-send and checks to happen on non-master instances are related. In particular, the call to CentralAuthUser::setPassword() is going to reset the auth token, which would explain the users finding themselves logged out when visiting a new page (since it's post-send, it can't send the user new cookies for the new token like it should; perhaps we should pass false for setPassword's $resetAuthToken parameter there?).

Apparently the attempt to actually update the password hash is somehow failing or being overwritten by some other deferred thing. In Amanda bee's case, I suspect the attempt to log in on csbwiktionary (where no local account existed) managed to bypass whatever was preventing the password hash update from remaining. That attempt still got its session wiped out, but the following attempt succeeded since no hash upgrade was required anymore.
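
To make the speculation above concrete, here is a toy Python model of the suspected sequence. It is only a sketch under stated assumptions, not CentralAuth code: `ToyAccount`, `login`, `session_valid`, and the two keyword flags are hypothetical names invented for illustration, and the integer token merely stands in for the real auth token.

```python
from dataclasses import dataclass


@dataclass
class ToyAccount:
    """Hypothetical stand-in for a global account; not the real schema."""
    hash_is_current: bool = False  # False means a needsUpdate()-style check would fire
    auth_token: int = 0            # stand-in for the auth token


def login(account, *, reset_token_on_upgrade=True, upgrade_persists=False):
    """Simulate one successful login and return the token placed in the cookie."""
    # The response (and its cookies) goes out first, carrying the current token.
    cookie_token = account.auth_token

    # Post-send work: it is too late to hand the browser a new cookie.
    if not account.hash_is_current:
        if reset_token_on_upgrade:
            account.auth_token += 1  # the cookie just sent is now stale
        # Assumption: the upgraded hash is either kept or lost again.
        account.hash_is_current = upgrade_persists
    return cookie_token


def session_valid(account, cookie_token):
    """The next page view is only logged in if the cookie token still matches."""
    return cookie_token == account.auth_token


if __name__ == "__main__":
    stuck = ToyAccount()
    print(session_valid(stuck, login(stuck)))  # logged out right after login
    print(session_valid(stuck, login(stuck)))  # and again on the next attempt
```

In this model, a login where the upgrade finally persists (`upgrade_persists=True`) still loses its own session, but the following login sticks, which matches what was observed for Amanda bee. Passing `reset_token_on_upgrade=False` keeps the session valid even while the upgrade keeps failing.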

The original report is below. I haven't named the other user here because I don't know the privacy policy with respect to OTRS reports.


"User:Amanda bee" can't login to any Wikimedia project. They reported the issue on #wikimedia-tech. They can login to a newly registered account "User:Amanda test".

Summary of the problem: https://en.wikipedia.org/wiki/User:Amanda_test#Here.27s_the_deal

IRC conversation so far: https://wm-bot.wmflabs.org/logs/%23wikimedia-tech/20170629.txt

Restricted Application added a subscriber: Aklapper. Jun 29 2017, 7:10 PM
Anomie added a subscriber: aaron. Jun 29 2017, 8:52 PM

"can't log in" doesn't seem to be accurate, it seems to be more "can't stay logged in".

I see several successful attempted logins in the logs. I just had the user attempt another login at 20:26, and they report that they are now remaining logged in.

Looking in the logs, I note that every successful login except the last reports "Set global password for 'Amanda bee'", which most likely indicates that the PasswordFactory::needsUpdate() check returned true. When I looked at the database and the data in the WAN cache, however, the password seemed to be up to date. That's when I decided to have the user attempt another login to see if it would stick this time, which it did.

Speculation: I wonder whether @aaron's recent changes to make various CentralAuth things happen post-send and checks to happen on non-master instances are related. In particular, the call to CentralAuthUser::setPassword() is going to reset the auth token, which would explain the user finding themself logged out when visiting a new page (since it's post-send, it can't send the user new cookies for the new token like it should; perhaps we should pass false for setPassword's $resetAuthToken parameter there?). If it also manages to avoid clearing the WAN cache somehow or if something somehow repopulates it with stale data after setPassword clears it, that could lead to the reset happening on every login. When MatmaRex had the user attempt to log in on csbwiktionary, something about the login to a wiki where auto-creation was needed may have managed to finally purge that bad WAN cache entry.

Just confirming -- I can now stay logged in.

Bonus: my password was very old and a dictionary word that I used to reuse a great deal. If there was any kind of password checking going on I'm sure it would have snagged on mine. I've changed it.

Thx.

Catalan triaged this task as High priority. Jun 29 2017, 9:23 PM
Catalan closed this task as Resolved.
Catalan claimed this task.
Catalan added a subscriber: Catalan.

Resolved per above. If you're wondering why I'm doing this, I've subscribed to be emailed for all tasks related to MediaWiki-Authentication-and-authorization - including this one.

Flagging as high priority since this was something that was important to resolve.

Catalan removed Catalan as the assignee of this task. Jun 29 2017, 9:23 PM

I note that, while the issue was resolved for this user, we don't know what caused it or whether it's happening to others.

T168858 might be two other instances from this week.

Neither of those users appears to have this issue, unless I'm completely wrong about the cause. On the other hand, I received an email from an OTRS volunteer (@Platonides) with a report about a user who does appear to have the same issue.

The issue here seems to be characterized by MediaWiki attempting (but somehow failing) to upgrade the user's password hash on every login, causing the session to be invalidated just after it has been served to the user. T168858 does not have this hallmark. The users there should try clearing their cookies or using a different browser to see if the issue persists.

Anomie renamed this task from '"User:Amanda bee" can't login' to 'Users unable to remain logged in, associated with attempts to upgrade the password hash on every login'. Jun 30 2017, 2:15 PM
Anomie reopened this task as Open.
Anomie updated the task description.
aaron added a comment. Jul 1 2017, 12:49 AM

I wonder if the notices in https://logstash.wikimedia.org/goto/9ab53faf39cd04d44e9b69d82e426f9e are of any use. They use user ID instead of name, which is somewhat annoying.

Anomie added a comment. Edited Jul 1 2017, 12:04 PM

Maybe, although when I did a few spot checks using the unique_id or reqId no non-auth log messages turned up. (edit: oops, I misread the logstash query)

I also note that CentralAuthUser::setPassword() itself sets the password in the database, bypassing the CAS check, so even if the saveSettings() fails the password hash still should have been updated.

When I checked that logstash query for gu_id 5631073 (Amanda bee's gu_id), nothing came up.

JJMC89 added a subscriber: JJMC89. Jul 1 2017, 10:05 PM

Also see this case for User:BB-PB, which was resolved by logging in at csb.wiktionary.

Just had another one reporting on enwiki, User:Thankyoubaby - had them resolve it by browsing directly to https://login.wikimedia.org

Tgr added a comment. Jul 3 2017, 9:43 AM

API logins might also be affected:

[Graphs: login errors (last 30 days); logouts (last 30 days)]

(Of course, it could always be caused by an unrelated bug in a single high-volume client. The timing matches though.)

Change 363015 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/CentralAuth@master] Fix handling of password hash upgrade on login

https://gerrit.wikimedia.org/r/363015

I had the same problem and logging into a wiki I never logged in before fixed it.

Just had another user with this issue, Nineko (https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=789384548#Problem_with_logging_in). They only got in after finding a wiki they had never logged in to with central auth (zh.wikivoyage in this case).

Given the number of reports and importance of the feature, this really should be higher priority…

@aaron Can you review the patch for this today?

Legoktm raised the priority of this task from High to Unbreak Now!. Jul 7 2017, 8:31 PM
Restricted Application added subscribers: Jay8g, TerraCodes. Jul 7 2017, 8:31 PM
Paladox added a subscriber: Paladox. Jul 7 2017, 8:32 PM

Change 363015 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Fix handling of password hash upgrade on login

https://gerrit.wikimedia.org/r/363015

Change 363891 had a related patch set uploaded (by Legoktm; owner: Anomie):
[mediawiki/extensions/CentralAuth@wmf/1.30.0-wmf.7] Fix handling of password hash upgrade on login

https://gerrit.wikimedia.org/r/363891

Change 363891 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@wmf/1.30.0-wmf.7] Fix handling of password hash upgrade on login

https://gerrit.wikimedia.org/r/363891

Mentioned in SAL (#wikimedia-operations) [2017-07-07T21:54:44Z] <legoktm@tin> Synchronized php-1.30.0-wmf.7/extensions/CentralAuth/: Fix handling of password hash upgrade on login - T169261 (duration: 00m 45s)

jfsamper was having this issue yesterday, and we presumed that T141482 was the cause but this sounds more plausible.

Confirming that jfsamper was able to recover by logging into the Hungarian and Norwegian wikis in succession.

Bawolff added a subscriber: Bawolff. Jul 9 2017, 5:53 PM

Mentioned in SAL (#wikimedia-operations) [2017-07-10T10:13:44Z] <addshore> reverting https://gerrit.wikimedia.org/r/#/c/363891 as it is sitting on tin undeployed T169261

Mentioned in SAL (#wikimedia-operations) [2017-07-10T10:21:07Z] <addshore@tin> Synchronized php-1.30.0-wmf.7/extensions/CentralAuth: CentralAuth (undeployed patches) [[gerrit:363892]], [[gerrit:363893]], [[gerrit:363891]] & revert [[gerrit:364182]] T169261 (duration: 00m 47s)

Samtar added a subscriber: Samtar. Jul 11 2017, 7:04 PM

fwiw I've had three OTRS emails from users complaining of this issue today, so it's affecting a number of users. I've been asking them to try the above solution (logging into a project they've never visited before)

greg added a subscriber: greg. Edited Jul 11 2017, 9:30 PM

@Samtar: just to be clear: those users were experiencing this problem today, and not just reporting it late or you seeing the reports late, right? The proposed fix was deployed on Jul 10th at 10:21 UTC.

@greg Just trying to have a look at the ticket open dates (finding them after closing them as successful is more difficult than it should be!) though it's likely these were reported prior to the 10th. I'll keep checking and try to confirm

greg assigned this task to Anomie. Jul 13 2017, 10:14 PM

Assigning to Brad based on patch authorship (apologies if that's not right, Brad).

The patch is in wmf.7 and wmf.9 (we skipped wmf.8 for the short week of July 4th), so it should be serving to all users.

Without any more indication of new reports, is there anything else to follow-up? @Anomie ? @matmarex ?

Assuming there are no reports of this occurring after the fix was deployed, we're good.

greg added a comment. Jul 17 2017, 6:30 PM

Who wants to make the status->resolved call? :)

Anomie closed this task as Resolved. Jul 17 2017, 6:32 PM

Ok, I'll do it. If there are reports of this occurring after the fix was deployed, someone can reopen.

Both patches have been merged.

A similar problem has been reported on IRC #wikipedia-pl about the account "WebmajstrBot" on pl.wp just now. The user had to leave before I could respond, but you can reach them at https://pl.wikipedia.org/wiki/Dyskusja_wikipedysty:Webmajstr.

I note that https://pl.wikipedia.org/wiki/Specjalna:Wersja currently reports 1.30.0-wmf.7, and the deployed branch does not appear to currently contain the fix (presumably thanks to rECAU3ab7cd365c42: Revert "Fix handling of password hash upgrade on login")

I guess that explains it. wmf.9 had some deployment problems (alas https://www.mediawiki.org/wiki/MediaWiki_1.30/Roadmap is not updated, but see T167893), and the patch was mysteriously not deployed to wmf.7. I'm pretty unhappy that this (an UBN task) was still broken in production 10 days after a patch was available. :(

Well, hopefully it's fixed now that we're on wmf.9 and moving on to wmf.10.

Authentication bugs are notorious for having similar symptoms with different causes, since the symptom is usually "I can't log in" or "I get logged out unexpectedly" and there are many things that can cause that, including actual MediaWiki bugs like this one, browser bugs like T151770, mysteriously-corrupted cookies that never get enough information to be investigated, and user error such as blocking first- and/or third-party cookies from our sites. And then people tend to assume their problem is the same as someone else's.

Which is why we usually wind up asking people with authentication problems to try several things:

  • Report exactly what happens, since "login submits successfully but I'm not logged in", "I'm logged in for the first page-view but logged out when I click any link", "'a loss of session data' when trying to log in", and so on all point to different causes. The last can even point to different things depending on when exactly in the process it occurs.
  • Clear all cookies and try again, to see if it's screwed-up cookies.
  • Log in using a different browser/device, to see if it's something screwed up in the browser.
  • Log in to a different account, perhaps even a newly-created account, to see if it's something wrong with the original account.
  • Capture the HTTP requests and responses so we can look for subtle signs of how exactly things are failing.

As for the users mentioned there,

  • Drewndia and Catalan do not seem to be instances of this bug, as mentioned in T169261#3395129.
  • Wikizeboux and Serrod (that seems to be the identity of the "unidentified user") both do seem like instances of this bug, and so will probably be helped. The workaround of trying to log in on a wiki where no local account exists should also work for these users.
  • For Cantons-de-l'Est's various accounts, Estrie and Talabot may be instances of this bug.
  • Assuming "Manuela bertola" is the user name in T168858#3451751, that does look like an instance of this bug too.

If you have access to Kibana, you can look for "Set global password for '{$USER}'" log entries on every login. Or if logging in to a wiki where the user didn't have a local account before fixes it for subsequent login attempts, that's pretty indicative of this bug too.
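
For anyone working from exported log messages rather than the Kibana UI, the check described above can be roughly sketched in Python. This assumes the messages are already available as plain strings; `repeated_hash_upgrades` and its `threshold` parameter are hypothetical names, and the same message may be logged for other reasons (such as a real password change), so a hit is only a hint to look closer.

```python
import re
from collections import Counter

# Matches the log message quoted in this task. The greedy ".*" runs to the
# last apostrophe, so names containing apostrophes are captured whole.
SET_PASSWORD_RE = re.compile(r"Set global password for '(?P<user>.*)'")


def repeated_hash_upgrades(messages, threshold=2):
    """Return {user: count} for users with at least `threshold` matching entries."""
    counts = Counter()
    for message in messages:
        match = SET_PASSWORD_RE.search(message)
        if match:
            counts[match.group("user")] += 1
    return {user: n for user, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    sample = [
        "Set global password for 'ExampleUser'",
        "Set global password for 'ExampleUser'",
        "Set global password for 'OneTimeUpgrade'",
    ]
    print(repeated_hash_upgrades(sample))
```

A healthy account should need the hash upgrade at most once, so repeated entries for the same name over a short period are the pattern to look for.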

This comment was removed by Zoranzoki21.

@Zoranzoki21: Your problem is unrelated to T169261. Your problem is being handled in T177284 instead.

Ok