Page MenuHomePhabricator

Run deleteLocalPasswords.php in WMF prod (Central Auth wikis only!) after 1.32.0-wmf.16 is everywhere
Closed, ResolvedPublic

Description

We should run deleteLocalPasswords.php across all wikis when the script and dependancies are everywhere. For T57420

Event Timeline

Reedy renamed this task from Run deleteLocalPasswords.php in WMF prod after 1.32.0-wmf.16 is everywhere to Run deleteLocalPasswords.php in WMF prod (Central Auth wikis only!) after 1.32.0-wmf.16 is everywhere.Aug 2 2018, 1:54 PM
Reedy updated the task description. (Show Details)

According to http://tools.wmflabs.org/versions/ wmf.16 is already everywhere, and we'll be soon on wmf.18 (no wmf.17?).

Ladsgroup subscribed.

I will run this tomorrow.

It’s technically a WMF holiday tomorrow. Dunno which Europeans will be observing it. Considering it has a destructive action it may be best waiting

I didn't know tomorrow is a holiday, will run it the day after tomorrow.

The other question is against which dblists

CentralAuth wikis only. You could build a temporary centralauth.dblist computing:

%% all.dblist - private.dblist - fishbowl.dblist - nonglobal.dblist

(maybe even later expanding it; I've heard folks don't like computed dblists) and run the script on them.

Editted per T201009#4552203

Change 457459 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Add centralauth.dblist

https://gerrit.wikimedia.org/r/457459

Mentioned in SAL (#wikimedia-operations) [2018-09-07T10:03:18Z] <Amir1> ladsgroup@mwmaint1001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --user Ladsgroup --prefix (T201009)

Change 458766 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/CentralAuth@master] Fix typo

https://gerrit.wikimedia.org/r/458766

Mentioned in SAL (#wikimedia-operations) [2018-09-07T10:03:18Z] <Amir1> ladsgroup@mwmaint1001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --user Ladsgroup --prefix (T201009)

ladsgroup@mwmaint1001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --user Ladsgroup --prefix
[Fri Sep  7 10:03:20 2018] [hphp] [21732:7f77dcc653c0:0:000001] [] 
Fatal error: require_once(/srv/mediawiki/php-1.32.0-wmf.20/maintenance/includes/deleteLocalPasswords.php): File not found in /srv/mediawiki/php-1.32.0-wmf.20/extensions/CentralAuth/maintenance/deleteLocalPasswords.php on line 7

Fix in the patch I just put up.

Change 458770 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Add $wgPasswordConfig['null']

https://gerrit.wikimedia.org/r/458770

Change 458766 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Fix typo

https://gerrit.wikimedia.org/r/458766

Change 459504 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/CentralAuth@wmf/1.32.0-wmf.20] Fix typo

https://gerrit.wikimedia.org/r/459504

Change 458770 merged by jenkins-bot:
[operations/mediawiki-config@master] Add $wgPasswordConfig['null']

https://gerrit.wikimedia.org/r/458770

Mentioned in SAL (#wikimedia-operations) [2018-09-10T11:06:26Z] <ladsgroup@deploy1001> Synchronized wmf-config/CommonSettings.php: [[gerrit:458770|Add ['null'] (T201009)]] (duration: 00m 50s)

Change 459504 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@wmf/1.32.0-wmf.20] Fix typo

https://gerrit.wikimedia.org/r/459504

Mentioned in SAL (#wikimedia-operations) [2018-09-10T11:16:03Z] <ladsgroup@deploy1001> Synchronized php-1.32.0-wmf.20/extensions/CentralAuth/maintenance/deleteLocalPasswords.php: [[gerrit:459504|SWAT: Fix typo (T201009)]] (duration: 00m 50s)

The second try:

ladsgroup@mwmaint1001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --user Ladsgroup --prefix
Processing users for fawiki ...
	 ... querying 'Ladsgroup'
[Mon Sep 10 11:16:16 2018] [hphp] [26622:7f7f5f18b3c0:0:000001] [] 
Catchable fatal error: Argument 1 passed to DeleteLocalPasswords::processUsers() must be an instance of array, string given in /srv/mediawiki/php-1.32.0-wmf.20/maintenance/includes/DeleteLocalPasswords.php on line 99

Change 460002 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/core@master] Fix --user option in DeleteLocalPasswords

https://gerrit.wikimedia.org/r/460002

Change 460002 merged by jenkins-bot:
[mediawiki/core@master] Fix --user option in DeleteLocalPasswords

https://gerrit.wikimedia.org/r/460002

Mentioned in SAL (#wikimedia-releng) [2018-09-28T15:31:17Z] <Amir1> ladsgroup@deployment-deploy01:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --prefix (T201009)

Tested it on beta cluster with both --prefix and --delete options. Going to do some small parts in prod

Mentioned in SAL (#wikimedia-operations) [2018-09-28T16:01:15Z] <Amir1> ladsgroup@mwmaint2001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --prefix (T201009)

Local passwords in mediawiki.org are deleted and in Persian Wikipedia got prefixed. I sent a note to the village pump of Persian Wikipedia asking them to let me know if they see any problem. I will delete everything in Monday if nothing arises.

I don't see that deleteLocalPasswords checks if there is an account on CentralAuth. Do all users have an account there with the password hash copied ? Or would an ancient user that never got migrated and suddenly decide to return not be able to login after the password deletion?

I don't see that deleteLocalPasswords checks if there is an account on CentralAuth. Do all users have an account there with the password hash copied ? Or would an ancient user that never got migrated and suddenly decide to return not be able to login after the password deletion?

After SUL, all users are now connected to CentralAuth (in CentralAuth wikis only of course)

Mentioned in SAL (#wikimedia-operations) [2018-10-01T10:10:38Z] <Amir1> mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --delete (T201009)

Mentioned in SAL (#wikimedia-operations) [2018-10-01T10:15:26Z] <Amir1> ladsgroup@mwmaint2001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --prefix on all CentralAuth wikis (T201009)

Change 463748 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/core@master] Add waitForReplication in DeleteLocalPasswords

https://gerrit.wikimedia.org/r/463748

Change 463748 merged by jenkins-bot:
[mediawiki/core@master] Add waitForReplication in DeleteLocalPasswords

https://gerrit.wikimedia.org/r/463748

Change 463784 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/core@wmf/1.32.0-wmf.23] Add waitForReplication in DeleteLocalPasswords

https://gerrit.wikimedia.org/r/463784

Change 463784 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.23] Add waitForReplication in DeleteLocalPasswords

https://gerrit.wikimedia.org/r/463784

Mentioned in SAL (#wikimedia-operations) [2018-10-01T18:23:56Z] <catrope@deploy1001> Synchronized php-1.32.0-wmf.23/maintenance/includes/DeleteLocalPasswords.php: T201009 (duration: 00m 56s)

Mentioned in SAL (#wikimedia-operations) [2018-10-01T18:28:27Z] <Amir1> ladsgroup@mwmaint2001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=enwiki --prefix (T201009)

With the new waitForReplication now it has around 3 seconds lag on s1:

image.png (236×1 px, 13 KB)

I stopped it though. @jcrespo What do you think?

All wikis now have it prefixed for a while now and no issues have been raised. I give it until after the switchover and then I start deleting them.

Looking at enwiki:

mysql:research@analytics-store.eqiad.wmnet [enwiki]> select type, count(*) from (select CASE WHEN user_password = '' THEN 'null' WHEN user_password LIKE ':null:%' THEN 'nullable' ELSE 'real' END type from user) x group by type;
+----------+----------+
| type     | count(*) |
+----------+----------+
| null     | 10947477 |
| nullable | 23721608 |
| real     |       30 |
+----------+----------+
3 rows in set (2 min 29.85 sec)

which seems a bit suspicious. Nullable means attached user who does not need a local password (and it will be deleted by the final pass of the script) - I would expect there to be way more of those. Null means those users already have no password - either they registered after loginOnly was enabled in January, or they changes their password since then, or the account was autocreated (I think? not quite sure how that would be handled), or they are system users. 30% of all users seems a bit too high for that. (Real means non-attached users with a password; those are the results of bugs and I would have expected one or two magnitudes more of them.)
So I think it's worth manually double-checking that the script does the right thing, before making irreversible changes-

Looking at a yearly split:

MariaDB [enwiki]> select type, year, count(*) from (select substr(user_registration, 1, 4) year, CASE WHEN user_password = '' THEN 'null' WHEN user_password LIKE ':null:%' THEN 'nullable' ELSE 'real' END type from user) x group by type, year order by type, year;
+----------+------+----------+
| type     | year | count(*) |
+----------+------+----------+
| null     | NULL |      875 |
| null     | 2001 |       11 |
| null     | 2002 |       40 |
| null     | 2003 |      167 |
| null     | 2004 |      707 |
| null     | 2005 |     1831 |
| null     | 2006 |     6441 |
| null     | 2007 |    12808 |
| null     | 2008 |   163013 |
| null     | 2009 |   467386 |
| null     | 2010 |   468152 |
| null     | 2011 |   431266 |
| null     | 2012 |   555270 |
| null     | 2013 |   560576 |
| null     | 2014 |   751923 |
| null     | 2015 |   959804 |
| null     | 2016 |  1833713 |
| null     | 2017 |  2667320 |
| null     | 2018 |  2084388 |
| nullable | NULL |   489890 |
| nullable | 2001 |      194 |
| nullable | 2002 |     1859 |
| nullable | 2003 |     8523 |
| nullable | 2004 |    40387 |
| nullable | 2005 |   181961 |
| nullable | 2006 |  2402061 |
| nullable | 2007 |  2942642 |
| nullable | 2008 |  2364086 |
| nullable | 2009 |  2243067 |
| nullable | 2010 |  1902052 |
| nullable | 2011 |  1867750 |
| nullable | 2012 |  1609565 |
| nullable | 2013 |  1721041 |
| nullable | 2014 |  2415410 |
| nullable | 2015 |  2569495 |
| nullable | 2016 |   920114 |
| nullable | 2017 |    36776 |
| nullable | 2018 |     4269 |
| real     | NULL |        1 |
| real     | 2004 |        1 |
| real     | 2005 |        2 |
| real     | 2006 |        2 |
| real     | 2007 |        1 |
| real     | 2010 |        1 |
| real     | 2015 |       11 |
| real     | 2016 |        7 |
| real     | 2017 |        4 |
+----------+------+----------+
47 rows in set (3 min 16.67 sec)

it seems in the old times some 25% of accounts did not have passwords (autocreations, maybe?), after the introduction of AuthManager that went down to 1% (no password by default but password changes restored it?) and to zero after enabling loginOnly mode in March 2018. So I guess that does not sound that insane, although it does not match my understanding of how things should work (autocreation should create a localpassword without loginOnly enabled).

Last year:

MariaDB [enwiki]> select type, month, count(*) from (select substr(user_registration, 5, 2) month, CASE WHEN user_password = '' THEN 'null' WHEN user_password LIKE ':null:%' THEN 'nullable' ELSE 'real' END type from user where user_registration like '2018%') x group by type, month order by type, month;
+----------+-------+----------+
| type     | month | count(*) |
+----------+-------+----------+
| null     | 01    |   229575 |
| null     | 02    |   209061 |
| null     | 03    |   228393 |
| null     | 04    |   222129 |
| null     | 05    |   236213 |
| null     | 06    |   208321 |
| null     | 07    |   213515 |
| null     | 08    |   220190 |
| null     | 09    |   236030 |
| null     | 10    |    86839 |
| nullable | 01    |     2567 |
| nullable | 02    |     1439 |
| nullable | 03    |      263 |
+----------+-------+----------+
13 rows in set (3 min 30.78 sec)

so loginOnly worked reliably, that's good to know.

ariaDB [enwiki]> select type, centralized, attached, count(*) from (select CASE WHEN user_password = '' THEN 'null' WHEN user_password LIKE ':null:%' THEN 'nullable' ELSE 'real' END type, gu_id IS NOT NULL centralized, lu_name IS NOT NULL attached from user left join centralauth.globaluser on gu_name = user_name left join centralauth.localuser on lu_name = gu_name and lu_wiki = 'enwiki') x group by type, centralized, attached order by type, centralized, attached;
+----------+-------------+----------+----------+
| type     | centralized | attached | count(*) |
+----------+-------------+----------+----------+
| null     |           0 |        0 |     2599 |
| null     |           1 |        0 |       26 |
| null     |           1 |        1 | 10977004 |
| nullable |           0 |        0 |        7 |
| nullable |           1 |        1 | 23720799 |
| real     |           0 |        0 |       19 |
| real     |           1 |        0 |       11 |
+----------+-------------+----------+----------+
7 rows in set (1 hour 4 min 33.52 sec)

So that looks sane after all (I messed up some former queries not recorded here and got confused). A few thousand unattached accounts with no password, probably system users. Lots of attached accounts with no password or marked for password removeal, that's OK. Seven unattached accounts marked for password removal - that should not happen but the number is small enough not to care (also, all of them are barely used).

So this should be good to go, sorry for the holdup.

Change 457459 abandoned by Ladsgroup:
Add centralauth.dblist

Reason:
Not needed anymore

https://gerrit.wikimedia.org/r/457459

Ladsgroup moved this task from Backlog / Other to Other WMF team on the acl*security board.

The script has been ran everywhere by yours truly.