Page MenuHomePhabricator

User with unattached accounts unable to login (504) after 1.24wmf22 deployment
Closed, ResolvedPublic

Description

User:NickK reported in IRC that they're getting 504 timeout errors when trying to login to their unattached accounts on 1.24wmf22. On Wikipedias (still on 1.24wmf21), they're able to login fine.

My initial hunch is that I3a219d86a8837ed9c0c3788604dc3229ee1245ee is to blame.


Version: unspecified
Severity: critical

Details

Reference
bz71223

Event Timeline

bzimport raised the priority of this task from to Unbreak Now!.Nov 22 2014, 3:52 AM
bzimport set Reference to bz71223.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to Kunal Mehta (Legoktm) from comment #0)

My initial hunch is that I3a219d86a8837ed9c0c3788604dc3229ee1245ee is to
blame.

I was wondering yesterday, how does it work when it's not deployed on all wikis at once? Nick has accounts on abut 200 wikis IIRC.
(If that was the issue, you'd need to cherry pick the new change to 1.24wmf11.)

And it's still incorrectly not listed in https://www.mediawiki.org/wiki/MediaWiki_1.24/wmf22#CentralAuth

gerritadmin wrote:

Change 162549 had a related patch set uploaded by Legoktm:
Revert "Auto-migrate matching accounts where no global account exists"

https://gerrit.wikimedia.org/r/162549

gerritadmin wrote:

Change 162550 had a related patch set uploaded by Legoktm:
Revert "Auto-migrate matching accounts where no global account exists"

https://gerrit.wikimedia.org/r/162550

(In reply to Nemo from comment #1)

(In reply to Kunal Mehta (Legoktm) from comment #0)

My initial hunch is that I3a219d86a8837ed9c0c3788604dc3229ee1245ee is to
blame.

I was wondering yesterday, how does it work when it's not deployed on all
wikis at once? Nick has accounts on abut 200 wikis IIRC.
(If that was the issue, you'd need to cherry pick the new change to
1.24wmf11.)

The code only needs to be enabled on the wiki the user is logging in from. Which is why logins from Wikipedias work, because that code path doesn't exist yet.


I think this code in general is just too slow. I didn't test it with the number of unattached accounts that NickK has :/

The main weird thing I see though is around a hundred log lines like:

Set global password for 'NickK'

That's from CentralAuthUser::setPassword, which is called in matchHash. We need to make matchHash aware whether it is meant to be a check only, or actual authentication so it should try updating the password. Otherwise we're just making hundreds of useless write queries.

gerritadmin wrote:

Change 162549 merged by jenkins-bot:
Revert "Auto-migrate matching accounts where no global account exists"

https://gerrit.wikimedia.org/r/162549

(In reply to Kunal Mehta (Legoktm) from comment #4)

We
need to make matchHash aware whether it is meant to be a check only, or
actual authentication so it should try updating the password. Otherwise
we're just making hundreds of useless write queries.

Interesting. This would bite when running migration scripts as well, so in a way it's good to discover and fix it earlier than that...

Of course login can't afford failing, but allow me to point out that NickK is about the worst case available. I knew of it and I take blame for not mentioning it in code review. That said, the "winner" of such a big active account should have been merged at Special:MergeAccount or equivalent several years ago.

gerritadmin wrote:

Change 162550 merged by jenkins-bot:
Revert "Auto-migrate matching accounts where no global account exists"

https://gerrit.wikimedia.org/r/162550