[IP Masking] Maintenance script to rename users matching configured TempUser pattern
Closed, Resolved (Public)

Description

We need a maintenance script to rename all users matching $wgAutoCreateTempUser['matchPattern'], globally including in CentralAuth.

For the benefit of Wikimedia's non-CentralAuth wikis, and for third-party users, it should be possible to perform this operation without CentralAuth integration.

Note that there is also T300265 for the notification component of the migration.

Event Timeline

@tstarling Hi! Is this task something your team will be interested in working on? If so, do you have a plan for informing the community and a timeline in mind?

> @tstarling Hi! Is this task something your team will be interested in working on? If so, do you have a plan for informing the community and a timeline in mind?

I plan to write the scripts this week. I'm not volunteering to manage the migration process at this stage.

Plan:

  • Split reserved pattern and match pattern concepts. Allow a name pattern to be reserved prior to full IP masking deployment. User creation will be disallowed but login allowed.
  • Write a core script specifically for IP masking preparation, which renames users matching either pattern. Skip CentralAuth attached users.
    • Maybe do T27482 so that existing code can be used. Currently CentralAuth depends on Renameuser. The dependency is not declared in extension.json but an exception is thrown from LocalRenameUserJob if Renameuser is not installed. This is an inelegant situation. Core's uppercaseTitlesForUnicodeTransition.php needed user renames for PHP version migration but was forced to just dump the names to a file due to the lack of user renaming support in core.
  • Write a CentralAuth script which renames users matching the pattern. It would be similar to WikimediaMaintenance renameInvalidUsernames.php.
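As a rough illustration of the plan above, here is a minimal Python sketch of the intended behaviour, not the MediaWiki implementation. It assumes patterns use a single `$1` placeholder between a literal prefix and suffix. The function names and the example patterns and usernames are hypothetical.

```python
from typing import Iterable, List, Set


def matches(pattern: str, name: str) -> bool:
    """True if name has the literal prefix and suffix surrounding $1."""
    prefix, _, suffix = pattern.partition("$1")
    return (
        len(name) >= len(prefix) + len(suffix)
        and name.startswith(prefix)
        and name.endswith(suffix)
    )


def is_reserved(name: str, patterns: List[str]) -> bool:
    """A name is reserved if it matches the match pattern or the reserved pattern."""
    return any(matches(p, name) for p in patterns)


def may_create(name: str, patterns: List[str]) -> bool:
    """Creation is refused for reserved names; login of existing users is not affected."""
    return not is_reserved(name, patterns)


def local_rename_candidates(
    users: Iterable[str], patterns: List[str], centrally_attached: Set[str]
) -> List[str]:
    """Users the core script would rename: reserved names not attached in CentralAuth."""
    return [
        u for u in users
        if is_reserved(u, patterns) and u not in centrally_attached
    ]


# Hypothetical patterns and accounts, for illustration only.
patterns = ["*$1", "~2$1"]
users = ["Alice", "*Bob", "~2023-1", "*Carol"]
attached = {"*Carol"}

print(may_create("*Dave", patterns))  # False
print(local_rename_candidates(users, patterns, attached))  # ['*Bob', '~2023-1']
```

In this sketch the attached account `*Carol` is left for the CentralAuth script, mirroring the split between the second and third bullets.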

Change 894749 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Add renameUsersMatchingPattern.php

https://gerrit.wikimedia.org/r/894749

Change 896223 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/CentralAuth@master] CentralAuth: Add renameUsersMatchingPattern.php

https://gerrit.wikimedia.org/r/896223

Change 894749 merged by jenkins-bot:

[mediawiki/core@master] Add renameUsersMatchingPattern.php

https://gerrit.wikimedia.org/r/894749

Change 896223 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] CentralAuth: Add renameUsersMatchingPattern.php

https://gerrit.wikimedia.org/r/896223

Change 898443 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Allow a temp username pattern to be reserved without activating the system

https://gerrit.wikimedia.org/r/898443

I found a few edge cases in local testing that may be of interest.

My first instinct was to use --from '*$1' --to '$1', because picking a different symbol or a word like "Star" seemed like the outcome least likely to be desired by the account owners. They can request their own rename before or after the script runs, of course, but stripping the prefix seems like a fairly neutral default if it avoids conflicts. However, this can still produce accounts that match the pattern, for example when **Foo is renamed to *Foo.
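The **Foo case can be reproduced with a small Python model of the --from/--to substitution. This is only a sketch under the assumption that `$1` captures whatever sits between a literal prefix and suffix. The helper names are hypothetical and this is not the script's actual code.

```python
from typing import Optional, Tuple


def split_pattern(pattern: str) -> Tuple[str, str]:
    """Split a '$1' pattern into its literal prefix and suffix."""
    prefix, sep, suffix = pattern.partition("$1")
    if not sep:
        raise ValueError(f"pattern {pattern!r} has no $1 placeholder")
    return prefix, suffix


def match(pattern: str, name: str) -> Optional[str]:
    """Return the text captured by $1, or None if the name does not match."""
    prefix, suffix = split_pattern(pattern)
    if len(name) < len(prefix) + len(suffix):
        return None
    if not (name.startswith(prefix) and name.endswith(suffix)):
        return None
    return name[len(prefix):len(name) - len(suffix)]


def rename(name: str, from_pattern: str, to_pattern: str) -> Optional[str]:
    """Substitute the captured text into the target pattern."""
    captured = match(from_pattern, name)
    if captured is None:
        return None
    prefix, suffix = split_pattern(to_pattern)
    return prefix + captured + suffix


new_name = rename("**Foo", "*$1", "$1")
print(new_name)                             # *Foo
print(match("*$1", new_name) is not None)   # True: the result still matches
```

One way to handle this without changing the script, under the same assumptions, would be to run it again until no names match.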

Speaking of conflicts, I noticed that with --from '*B$1r' --to '$1dmin' it renamed *Bar to admin and didn't seem to mind that Admin already exists.

Lastly, the log entries and pages created by the rename used a non-canonical lowercased user name. I don't know if that causes problems by itself, but it at least looked off in the logs to see "admin" mentioned and linked as a valid username in Special:Log.

Change 898443 merged by jenkins-bot:

[mediawiki/core@master] Allow a temp username pattern to be reserved without activating the system

https://gerrit.wikimedia.org/r/898443

Change 924173 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] renameUsersMatchingPattern.php: canonicalize and check for existence of target

https://gerrit.wikimedia.org/r/924173

Change 924175 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/CentralAuth@master] Run the GlobalRenameUserValidator when renaming users with the maintenance script

https://gerrit.wikimedia.org/r/924175

> My first instinct was to use --from '*$1' --to '$1', because picking a different symbol or a word like "Star" seemed like the outcome least likely to be desired by the account owners. They can request their own rename before or after the script runs, of course, but stripping the prefix seems like a fairly neutral default if it avoids conflicts. However, this can still produce accounts that match the pattern, for example when **Foo is renamed to *Foo.

That is not a bug. There are a few ways you could deal with that without changing the script.

> Speaking of conflicts, I noticed that with --from '*B$1r' --to '$1dmin' it renamed *Bar to admin and didn't seem to mind that Admin already exists.

> Lastly, the log entries and pages created by the rename used a non-canonical lowercased user name. I don't know if that causes problems by itself, but it at least looked off in the logs to see "admin" mentioned and linked as a valid username in Special:Log.

I fixed these two issues in the patches above.

The CentralAuth script already had canonicalization, but did not check if the target already existed.
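To illustrate the two checks, here is a minimal Python sketch, not the code from the patches. It assumes canonicalization only uppercases the first character, whereas MediaWiki's real username canonicalization does more. The function names and the set of existing users are hypothetical.

```python
from typing import Optional, Set


def canonicalize(name: str) -> str:
    """Simplified stand-in for canonicalization: uppercase the first character."""
    return name[:1].upper() + name[1:]


def checked_target(raw_target: str, existing: Set[str]) -> Optional[str]:
    """Return the canonical target name, or None if that account already exists."""
    target = canonicalize(raw_target)
    if target in existing:
        return None
    return target


# Hypothetical wiki state: *Bar is the account to rename, Admin already exists.
existing_users = {"Admin", "*Bar"}

print(checked_target("admin", existing_users))  # None: would collide with Admin
print(checked_target("foo", existing_users))    # Foo
```

Canonicalizing before the existence check matters here: comparing the raw "admin" against the existing names would not find "Admin".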

Change 924175 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Run the GlobalRenameUserValidator when renaming users with the maintenance script

https://gerrit.wikimedia.org/r/924175

Change 924173 merged by jenkins-bot:

[mediawiki/core@master] renameUsersMatchingPattern.php: canonicalize and check for existence of target

https://gerrit.wikimedia.org/r/924173