Unattached accounts created on registration
Closed, Resolved (Public)

Description

Here: https://meta.wikimedia.org/wiki/Special:CentralAuth/%E6%97%A5%E6%9C%AC%E8%84%B1%E7%A8%8E%E5%85%9A
is another newly created account with one unattached account.

Event Timeline

Tgr triaged this task as High priority. (Edited Jul 22 2016, 10:17 PM)
mysql:wikiadmin@db1094 [centralauth]> select * from localuser where lu_name = '日本脱税党';
+---------------+-----------------+-----------------------+--------------------+
| lu_wiki       | lu_name         | lu_attached_timestamp | lu_attached_method |
+---------------+-----------------+-----------------------+--------------------+
| jawikiquote   | 日本脱税党        | 20160722191303        | login              |
| mediawikiwiki | 日本脱税党        | 20160722190604        | login              |
| metawiki      | 日本脱税党        | 20160722190603        | new                |
+---------------+-----------------+-----------------------+--------------------+
3 rows in set (0.00 sec)

mysql:wikiadmin@db1094 [centralauth]> select * from localnames where ln_name = '日本脱税党';
+---------------+-----------------+
| ln_wiki       | ln_name         |
+---------------+-----------------+
| jawikiquote   | 日本脱税党        |
| loginwiki     | 日本脱税党        |
| mediawikiwiki | 日本脱税党        |
| metawiki      | 日本脱税党        |
+---------------+-----------------+
4 rows in set (0.00 sec)

and the loginwiki account does indeed exist.
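
The comparison above can be sketched as a set difference: any wiki listed in localnames but missing from localuser holds an unattached account. This is only an illustration using the rows from the two results above; the helper name `unattached_wikis` is made up for this sketch and is not part of CentralAuth.

```python
# Wikis copied from the two query results above.
localuser_wikis = {"jawikiquote", "mediawikiwiki", "metawiki"}
localnames_wikis = {"jawikiquote", "loginwiki", "mediawikiwiki", "metawiki"}


def unattached_wikis(localnames, localuser):
    """Return wikis that have a local name but no attachment row.

    Hypothetical helper for illustration only.
    """
    return sorted(set(localnames) - set(localuser))


unattached = unattached_wikis(localnames_wikis, localuser_wikis)
print(unattached)
```

For this account the only wiki left over is loginwiki, which matches the observation above.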

This is extra bad due to T137551.

Created by CentralAuthCreateLocalAccountJob, which says registration failed for global account '日本脱税党'.

That's logged in CentralAuthUser::register, which as far as I can see can only be called from CentralAuthHooks::onLocalUserCreated after passing a !$centralUser->exists() check. So the job is probably scheduled and executed so fast that it runs before the registration transaction is committed.

The odd thing about that error is that it comes from CentralAuthUser::register(), which shouldn't have been called from CentralAuthCreateLocalAccountJob at all.

To make this happen, first CentralAuthUser::getInstance( $user )->exists() would have to be true to avoid erroring out in CentralAuthCreateLocalAccountJob, but then the LocalUserCreated hook would have to find that CentralAuthUser::getMasterInstance( $user )->exists() returns false. So maybe the CA cache was updated while the transaction was not yet committed, and the job got picked up and run before the commit happened.
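
A toy model of the suspected race, to make the ordering concrete. This is a sketch of the hypothesis in this comment, not CentralAuth code: `CentralStore`, `exists_cached`, `exists_on_master` and `run_create_local_account_job` are hypothetical stand-ins, and it assumes the cache really is written before the commit.

```python
class CentralStore:
    """Toy stand-in for the central DB plus its cache (hypothetical)."""

    def __init__(self):
        self.committed = set()  # rows visible to other connections
        self.pending = set()    # rows written in a still-open transaction
        self.cache = set()      # cache entries, written before the commit

    def begin_registration(self, name):
        self.pending.add(name)
        self.cache.add(name)    # assumption: cache is updated pre-commit

    def commit(self):
        self.committed |= self.pending
        self.pending.clear()

    def exists_cached(self, name):
        # Models the getInstance()->exists() check in the job.
        return name in self.cache

    def exists_on_master(self, name):
        # Models the getMasterInstance()->exists() check in the hook.
        return name in self.committed


def run_create_local_account_job(store, name):
    """Return 'skipped', 'attached' or 'unattached' for one job run."""
    if not store.exists_cached(name):
        return "skipped"        # job errors out: no global account seen
    # The local account is created at this point; the hook then asks
    # the master whether the global account exists.
    if store.exists_on_master(name):
        return "attached"
    return "unattached"         # hook tries to register again and fails


store = CentralStore()
store.begin_registration("ExampleUser")
early = run_create_local_account_job(store, "ExampleUser")  # before commit
store.commit()
late = run_create_local_account_job(store, "ExampleUser")   # after commit
print(early, late)
```

In this model the job that runs before the commit leaves an unattached account, while the same job run after the commit attaches normally.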

mysql:wikiadmin@db1086 [centralauth]> SELECT COUNT(*) FROM localnames LEFT OUTER JOIN localuser ON ln_wiki = lu_wiki AND ln_name = lu_name WHERE lu_name IS NULL;
+----------+
| COUNT(*) |
+----------+
|     1567 |
+----------+
1 row in set (5 min 43.16 sec)

Of course some of that could be due to T119736 or something else entirely and does not necessarily represent real unattached accounts.
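
For readers unfamiliar with the anti-join used in that count, here is a minimal sketch against an in-memory SQLite database. The table and column names follow the query above; the rows are invented for illustration and the real tables have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE localnames (ln_wiki TEXT, ln_name TEXT);
    CREATE TABLE localuser  (lu_wiki TEXT, lu_name TEXT);
    -- Invented sample rows: user A has a loginwiki account with no
    -- matching localuser row, so it should be reported.
    INSERT INTO localnames VALUES
        ('metawiki', 'A'), ('loginwiki', 'A'), ('metawiki', 'B');
    INSERT INTO localuser VALUES
        ('metawiki', 'A'), ('metawiki', 'B');
""")

# LEFT OUTER JOIN keeps every localnames row; rows without a matching
# localuser row get NULLs, which the WHERE clause then selects.
rows = conn.execute("""
    SELECT ln_wiki, ln_name
    FROM localnames
    LEFT OUTER JOIN localuser
        ON ln_wiki = lu_wiki AND ln_name = lu_name
    WHERE lu_name IS NULL
""").fetchall()
print(rows)
```

Only the loginwiki row for user A comes back, since it is the only local name without an attachment row.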

Change 300679 had a related patch set uploaded (by Gergő Tisza):
Do not schedule local account creation jobs until central transaction is committed

https://gerrit.wikimedia.org/r/300679
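
The general idea in the patch title (hold back job scheduling until the central transaction has committed) can be sketched as an on-commit callback. This is not the code from the Gerrit change; the `Transaction` class, `register_global_account` and `job_queue` are hypothetical names used only to show the pattern.

```python
class Transaction:
    """Toy transaction that runs callbacks once it commits (hypothetical)."""

    def __init__(self):
        self._on_commit = []
        self.committed = False

    def on_commit(self, callback):
        if self.committed:
            callback()          # already committed: run immediately
        else:
            self._on_commit.append(callback)

    def commit(self):
        self.committed = True
        callbacks, self._on_commit = self._on_commit, []
        for callback in callbacks:
            callback()


job_queue = []


def register_global_account(txn, name, wikis):
    # The central rows would be written inside txn here.
    for wiki in wikis:
        # Defer the push so no job can run before the rows are visible.
        txn.on_commit(lambda w=wiki: job_queue.append((w, name)))


txn = Transaction()
register_global_account(txn, "ExampleUser", ["loginwiki", "metawiki"])
queued_before_commit = list(job_queue)
txn.commit()
queued_after_commit = list(job_queue)
print(queued_before_commit, queued_after_commit)
```

With this ordering a job runner cannot pick up the job while the global account is still invisible to the master check, which is the window described earlier in this task.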

Tgr raised the priority of this task from High to Unbreak Now!. (Edited Jul 23 2016, 12:02 AM)

SELECT gu_name, ln_wiki, gu_registration FROM localnames LEFT OUTER JOIN localuser ON ln_wiki = lu_wiki AND ln_name = lu_name JOIN globaluser ON ln_name = gu_name WHERE lu_name IS NULL ORDER BY gu_registration DESC


A new unattached account is created every couple of minutes. The first account is from 20160719202514.

Change 300679 merged by jenkins-bot:
Do not schedule local account creation jobs until central transaction is committed

https://gerrit.wikimedia.org/r/300679

Mentioned in SAL [2016-07-23T00:37:41Z] <tgr> doing an emergency deploy of https://gerrit.wikimedia.org/r/#/c/300679 for T141160, creates dozens of new users per hour to be unattached on loginwiki which probably has weird consequences

Change 300690 had a related patch set uploaded (by Gergő Tisza):
Do not schedule local account creation jobs until central transaction is committed

https://gerrit.wikimedia.org/r/300690

Change 300690 merged by jenkins-bot:
Do not schedule local account creation jobs until central transaction is committed

https://gerrit.wikimedia.org/r/300690

Mentioned in SAL [2016-07-23T01:00:46Z] <tgr@tin> Synchronized php-1.28.0-wmf.11/extensions/CentralAuth/includes/CentralAuthPrimaryAuthenticationProvider.php: T141160 (duration: 00m 28s)

Mentioned in SAL [2016-07-23T01:01:25Z] <tgr@tin> Synchronized php-1.28.0-wmf.11/extensions/CentralAuth/includes/CentralAuthHooks.php: T141160 (duration: 00m 27s)

Mentioned in SAL [2016-07-23T01:02:06Z] <tgr@tin> Synchronized php-1.28.0-wmf.11/extensions/CentralAuth/includes/CentralAuthPlugin.php: T141160 (duration: 00m 29s)

No more cases since the patch was deployed, so it seems like it worked. The full list of affected users is

(921 users).

Tgr claimed this task.

Ran Bryan's script from T141020; all accounts should be fixed now.

Tgr added a subscriber: MaxSem.

Per @MaxSem and @Anomie, this is still happening quite a bit.

(Or rather, happening again; but it started more than 30 days ago.)

Are any unattached accounts actually being created? Or are we just seeing transient errors when the CentralAuthCreateLocalAccountJob for loginwiki races with the redirect to Special:CentralAuth/start on loginwiki?

You are right; running the query from T141160#2489124 did not find any, and spot-checked users look okay (the loginwiki account creation timestamp matches the others). It would still be nice to prevent the errors; filed T149356.