Page MenuHomePhabricator

[IP Masking] Create temporary account on first edit
Open, Needs TriagePublic

Description

User Story

As an anonymous editor,
I want to be notified a temporary account will be created on my behalf
So that I do not have to reveal my personal information to edit a Wiki

Acceptance Criteria

  • Show an updated notice when the editor starts editing

image.png (1×2 px, 531 KB)
*See Figma for the latest update mockup. Mocks are WIP, collecting feedback

  • Notice displayed for both Wikitext & Visual Editor

image.png (1×1 px, 975 KB)
*See Figma for the latest update mockup. Mocks are WIP, collecting feedback

  • After the edit is saved, anonymous user should view notice for temporary account username
  • Temporary accounts should have “Preferences” disabled
  • Temporary accounts should be cross-wiki compatible
  • Temporary accounts should enable cross-wiki notifications

image.png (1×1 px, 576 KB)
*See Figma for the latest update mockup. Mocks are WIP, collecting feedback

Some answered questions

  1. Are we able to prevent anyone from signing up with the temporary account format of "~ ... ~"? Yes
  2. How many usernames currently contain "~" or "~...~"? 673 accounts that begin and end with a tilde. (Begins with tilde 1078).
  3. What will happen to usernames with this format? We can change them on their behalf. Captured here > T300265: [IP Masking] Inform username accounts pre/post-fixed with "*"

Proposed internal interface

// Skin
$tum = MediaWikiServices::getInstance()->getTempUserManager();
if ( $tum->isAutoCreateAction( 'edit' ) ) {
    // Show edit tab
} else {
    // Show view source tab
}

// Changes lists
if ( $tum->isTempUser( $user ) ) {
    $classes[] = 'temp-user';
}

// Permissions
public function getUserImplicitGroups( ... ) {
    ...
    if ( !$tum->isTempUser( $user ) ) {
        $groups[] = 'permanent'; //?
        $groups[] = 'nontemp'; //?
        $groups[] = 'onymous'; //?
    }
}

// Page save
// Before edit constraints
if ( $tum->isAutoCreateAllowed( 'edit', $anonUser ) ) {
    // notify constraints that creation will be attempted
} elseif ( !$anonUser->isAllowed( 'edit' ) ) {
    // not allowed
    return Status::newFatal( ... );
}

// Before PageUpdater
$status = $tum->createUser();
$user = $status->getUser();

// After save, CentralAuth will need to redirect to login.wikimedia.org for cookie-setting

Config

// Defaults
$wgAutoCreateTempUser = [
    'enabled' => false,
    'actions' => [ 'edit' ]
    'genPattern' => '*Unregistered $1*'
    'matchPattern' => '*$1',
    'serialProvider' => [ 'type' => 'local' ],
    'serialMap' => [ 'type' => 'plain-numeric' ],
];
$wgSharedTables[] = 'user_autocreate_serial';

// Typical wiki config
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['edit'] = true;
$wgGroupPermissions['*']['createaccount'] = true; // otherwise no actions are enabled

// Production
$wgAutoCreateTempUser['enabled'] = true;
$wgAutoCreateTempUser['serialProvider'] = 'centralauth';

Database

CREATE TABLE user_autocreate_serial (
   uas_shard INT PRIMARY KEY NOT NULL,
   uas_value INT NOT NULL
);

-- SQL flavoured pseudocode -- in reality this would be done in PHP
BEGIN;
SET n = 8;
SET r = FLOOR(RAND() * @n)
UPDATE user_autocreate_serial SET uas_value=uas_value+1 WHERE uas_shard=@r;
SELECT uas_value * @n + @r AS value FROM user_autocreate_serial WHERE uas_shard=@r;
COMMIT;

With n=1, r=0, IDs grow monotonically. With n>1 the IDs are not allocated in order, but a global lock is avoided.

Probably no need for altering the user table if we reserve a username prefix. We can just use the prefix for permissions and display.

Event Timeline

Task description edit: UserAutoCreator -> TempUserManager and some consequent terminology changes. The service has both read and write, it doesn't just create users.

I'm thinking about integration with PermissionManager and UserGroupManager. I think we should be fairly sure a save will complete before the user is created, and ideally before serial number acquisition. So if a user is blocked by IP, they will not be able to create accounts or allocate serial numbers. That means creating the account after the edit constraints pass. In the current interface sketch, I assumed that PermissionManager will be bypassed to show the edit UI, but PermissionManager does have a lot of useful logic that we don't want to have to duplicate.

I'm proposing that temp account creation will work with $wgGroupPermissions['*']['edit'] = false. The point of this is for the config to indicate that editing as a traditional IP-named anon is disallowed. It's not a hard requirement but it seems correct. This means that anonymous users with an intent to automatically create an account effectively have extra rights.

The problem is that with this change, UserIdentity does not have enough information to determine the rights of a user.

One possibility is to add a create-intent flag to the User class. Then widen the interface accepted by PermissionManager to allow the flag to be accessed.

interface UserIdentityForPermissions extends UserIdentity {
    public function getCreateIntent();
}

This would allow PermissionManager's hooks (UserCan etc.) to support create intention without a signature change.

Alternatives:

  • Put the flag in UserIdentity instead of introducing a new interface.
  • Make the flag be a parameter to the relevant PermissionManager methods and change the hook signatures as required. I haven't reviewed the hook handlers to decide which ones really need it. But Authority implementations would need to store the flag to pass through to PermissionManager, which implies that User is going to need it anyway since it implements Authority. For example ChangeTags::canAddTagsAccompanyingChange() is run from an edit constraint and needs to know whether an Authority has the applychangetags right, which depends on create intent.
  • Make a fake named user. This would be a pre-create user with a fake name (before serial allocation) with getId() returning some fake non-zero value to fool PermissionManager into giving it post-create permissions.
  • Follow the original idea, hack it up at the UI level. Have TempUserManager duplicate PermissionManager as required.
  • Treat the initial edit of an autocreated user as anonymous for permissions purposes. Require $wgGroupPermissions['*']['edit'] = true. Maybe a bit awkward if we want to extend the system to upload, which is traditionally denied to anons for DB schema reasons.

I think we should be fairly sure a save will complete before the user is created, and ideally before serial number acquisition.

What exactly does it mean for a user to be "created"? Specifically, does it mean an entry in the actor table, or an entry in the user table? Does it imply the ability to set preferences? Log in?

Also, I'd recommend against using a serial number. The use of serial numbers from an auto-increment field for Q-IDs on Wikidata has proven a never ending source of trouble. I'd propose a random string (could be a UUID, or a combination of four words picked from a long word list, etc).

I'm proposing that temp account creation will work with $wgGroupPermissions['*']['edit'] = false. The point of this is for the config to indicate that editing as a traditional IP-named anon is disallowed. It's not a hard requirement but it seems correct. This means that anonymous users with an intent to automatically create an account effectively have extra rights.

That seems confusing to me... if "IP masking" was enabled on a write-protected wiki, would that make the wiki no longer write-protected?

The problem is that with this change, UserIdentity does not have enough information to determine the rights of a user.

We already have this problem with "remote" users. I have been thinking that UserIdentity should expose a user "type". By itself, it can only distinguish between registered and non-registered, based on whether the user ID is 0. But more specific types could be passed to the constructor, like remote (imported), or, for your use case, "preliminary".

The user "type" would act as a generalization of the isRegistered() method, and could be used to imply group membership (as isRegistered implied the "user" group).

Ideally, this type field would also be present in the actor table. But we can also follow the pattern we are already using and rely on the name's syntax to determine its type. The distinction will have to be made by the code that constructs the UserIdentity though, it cannot be baked into the value object itself.

  • Make the flag be a parameter to the relevant PermissionManager methods and change the hook signatures as required. I haven't reviewed the hook handlers to decide which ones really need it. But Authority implementations would need to store the flag to pass through to PermissionManager, which implies that User is going to need it anyway since it implements Authority. For example ChangeTags::canAddTagsAccompanyingChange() is run from an edit constraint and needs to know whether an Authority has the applychangetags right, which depends on create intent.

Authority could encapsulate the distinction, but since most code that should in theory live in authority is still in PermissionManager, this flag would hawe to be passed around a lot. We could use this opportunity to refactor this logic and move it out of PermissionManager. But if UserIdentity exposes a user type, this may not be needed.

  • Make a fake named user. This would be a pre-create user with a fake name (before serial allocation) with getId() returning some fake non-zero value to fool PermissionManager into giving it post-create permissions.

That seems like asking for trouble. I'd much prefer a getType() method on UserIdentity.

  • Follow the original idea, hack it up at the UI level. Have TempUserManager duplicate PermissionManager as required.

Might be an option if most logic is pulled out of PermissionManager to reduce duplication. It's already structured as a collection of callbacks, these could be moved to separate classes.

But I'd still prefer if we could avoid needing a TempUserManager.

  • Treat the initial edit of an autocreated user as anonymous for permissions purposes. Require $wgGroupPermissions['*']['edit'] = true. Maybe a bit awkward if we want to extend the system to upload, which is traditionally denied to anons for DB schema reasons.

In my mind, "aspirational" users are just anons. Why should they have more rights? Am I missing some use case?
In any case, I'd guess we probably don't want to allow anons to upload.

What exactly does it mean for a user to be "created"? Specifically, does it mean an entry in the actor table, or an entry in the user table?

Both. Also globaluser, per the task description requirement for accounts to be "cross-wiki compatible".

Does it imply the ability to set preferences? Log in?

No you can't set preferences, per the task description. That's a product requirement, not a technical requirement. You can log in and create an account after temp account creation, per the mockup in the task description.

Also, I'd recommend against using a serial number. The use of serial numbers from an auto-increment field for Q-IDs on Wikidata has proven a never ending source of trouble. I'd propose a random string (could be a UUID, or a combination of four words picked from a long word list, etc).

My proposal in the task description will provide IDs that are of minimal length with scalable allocation performance. Amir thought it looked fine. My WIP patch (not uploaded yet) has pluggable serial providers, so you can plug in random UUIDs if you like, and it has pluggable integer to string maps, so you can plug in word lists if you like. The current proposal for production, by Prateek and Niharika, is to use short hexadecimal numbers.

if "IP masking" was enabled on a write-protected wiki, would that make the wiki no longer write-protected?

Per the task description, temp account creation will be disabled unless createaccount is granted to all. The terminology "IP masking" is not used in the proposed config. In the WIP patch, the system is enabled with $wgAutoCreateTempUser['enabled'] = true.

But I'd still prefer if we could avoid needing a TempUserManager.

TempUserManager interprets the config and is responsible for creating temporary users. I think it's still needed even if UserIdentity has a user type.

  • Treat the initial edit of an autocreated user as anonymous for permissions purposes. Require $wgGroupPermissions['*']['edit'] = true. Maybe a bit awkward if we want to extend the system to upload, which is traditionally denied to anons for DB schema reasons.

In my mind, "aspirational" users are just anons. Why should they have more rights? Am I missing some use case?
In any case, I'd guess we probably don't want to allow anons to upload.

I don't think you are missing any use case. As a matter of internal terminology, "anon" means a user with user_id=0 and named after their IP address. When an anon submits the edit form, halfway through the save process, they stop being an anon. Thus they get more rights. This is just the conceptual structure of the permissions system, it's not a product requirement.

For skin edit tabs and the like, it would be convenient if anon rights and temp user rights were approximately the same. But write actions are not authorized unless account creation is successful, so technically, anons do not have the same rights as temp users. Note that temporary account creation is action-dependent, and is currently only proposed for editing, so rights granted to temp users for other actions should not automatically be applied to anons.

Task description edit: use $wgAutoCreateTempUser['enabled'] = true instead of $wgAutoCreateTempUserEnabled = true.

TempUserManager interprets the config and is responsible for creating temporary users. I think it's still needed even if UserIdentity has a user type.

TempUserManager is being split up due to dependency loops so there might not be much left of it by the end of the day.

Change 767617 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Automatic creation of temporary user accounts

https://gerrit.wikimedia.org/r/767617

My patch has UserIdentity::getType() return UserIdentity::TYPE_TEMP if the user is a temporary user. That's convenient for callers and eliminates the dependency on TempUserConfig in a lot of places. But UserIdentityValue is a real problem. A lot of callers do things like:

$groups = $permissionManager->getUserGroups( new UserIdentityValue( (int)$row->user_id, $row->user_name ) )

The result should be different depending on whether $row->user_name starts with an asterisk. UserIdentityValue has a stable constructor, there is no way to inject the config into it.

The options are:

  • Remove UserIdentity::TYPE_TEMP, go back to injecting TempUserConfig into everything that needs to figure out whether a UserIdentity is a temp user.
  • Break the UserIdentityValue constructor, require a type parameter. Along the lines of Daniel's actor_type field idea.
  • Use global state.
  • Make the asterisk prefix be fixed and unconfigurable, like the ">" separator for imported usernames. UserIdentityValue::__construct() could then have:
if ( ( $name[0] ?? '' ) === '*' ) {
    $this->type = UserIdentity::TYPE_TEMP;
} elseif ( strpos( $name, '>' ) ) {
    $this->type = UserIdentity::TYPE_IMPORTED;
} elseif ( !$id ) {
    $this->type = UserIdentity::TYPE_UNREGISTERED;
} else {
    $this->type = UserIdentity::TYPE_NORMAL;
}

Make the asterisk prefix be fixed and unconfigurable

This doesn't actually help. You still need at least need $wgAutoCreateTempUser['enabled'] in the UserIdentityValue constructor in order to prevent the patch from breaking all users starting with "*" when it is merged.

PS2 will have option 1, remove TYPE_TEMP. User::isTemp() is using global state, but that's allowed, whereas in UserIdentityValue it would be beyond the pale.

Change 767917 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] In wfArrayInsertAfter(), skip insert on missing key

https://gerrit.wikimedia.org/r/767917

Change 767918 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/skins/Vector@master] Updates for core temp user autocreation feature

https://gerrit.wikimedia.org/r/767918

PS2 will have option 1, remove TYPE_TEMP. User::isTemp() is using global state, but that's allowed, whereas in UserIdentityValue it would be beyond the pale.

Since UserIdentityValue is a value object, it would be up to the code that constructs the object to determine the type. The type would be passed to the constructor. The only complication is keeping the constructor's signature backwards compatible, since UserIdentityValue is newable.

Since UserIdentityValue is a value object, it would be up to the code that constructs the object to determine the type. The type would be passed to the constructor. The only complication is keeping the constructor's signature backwards compatible, since UserIdentityValue is newable.

It's not practical for every caller to duplicate determination of the type, there would need to be a factory. And if you have a factory, there's no point in making it newable. A backwards compatible constructor is just not possible without global state access in the constructor.

It's not practical for every caller to duplicate determination of the type, there would need to be a factory. And if you have a factory, there's no point in making it newable. A backwards compatible constructor is just not possible without global state access in the constructor.

If I understand correctly, the alternative is requiring the use of User, which we want to phase out. And relying on global state inside User, we we also want to avoid.
We could have a service that takes a UserIdentity and determines the type, perhaps with internal caching. That would be clean, but rather inconvenient.

Relying on the User class will work for now, since we are still using it in most places where a UserIdentity is required. But the intent is to change that. And by relying on global state to determine the value returned by getType() it seems we are painting ourselves into a corner. Also, if we make getType() "smart" on user and "dumb" on UserIdentityValue, the method becomes unreliable. We depend on the code that constructs the object to make the correct choice. Which amounts to the same as a constructor parameter.

I think we can change the constructor of UserIdentityValue. Looking at code that constructs UserIdentityValue, it seems to me that it is newable mostly for the benefit of tests (hundreds of callers). Outside tests, If find 8 calls in extensions, and about 20 in core. Half of these have user ID 0 hard coded.

The canonical source for UserIdentity instances already is ActorStore (and, for now, User itself). Code that directly constructs UserIdentityValues with non-zero IDs could be changed to using ActorStore as a factory (though with an eye on performance).

I think the following might work:

  • add an optional type parameter to the constructor of UserIdentityValue. Define type constants in UserIdentity.
  • The default for the type is UNREGISTERED if the user ID is 0, and REGISTERED otherwise.
  • add a hasType( $type ) method to UserIdentity. Could be a simple equality check, but maybe we want to allow subtypes based on a bit mask.
  • Other types already used in core would include SYSTEM and REMOTE. You would be adding TEMPORARY or PRELIMINARY.

We'd have to take a careful look at potential issued caused by a UserIdentity defaulting to REGISTERED in a place where it should be PRELIMINARY. There should perhaps also be a warning on the constructor about calling it directly outside tests. But it seems manageable.

We could alternatively use a subclass or an alternative implementation of the UserIdentity interface, but it all amounts to the same thing: the caller of the constructor has to know the type, so the consumer of getType() can be sure to get the correct type.

I can have a closer look and experiment with this next week.

Change 767917 merged by jenkins-bot:

[mediawiki/core@master] In wfArrayInsertAfter(), skip insert on missing key

https://gerrit.wikimedia.org/r/767917

@tstarling one thing that this task doesn't explicitly mention but I wanted to call it out is that we do want to maintain a distinction between the temporary user accounts and permanent accounts in order to allow features like RecentChanges filters that rely on that distinction and AbuseFilters can continue running.

Change 774568 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/skins/Vector@master] Prepare Vector for temporary user accounts

https://gerrit.wikimedia.org/r/774568

Change 774568 abandoned by Jdlrobson:

[mediawiki/skins/Vector@master] Prepare Vector for temporary user accounts

Reason:

https://gerrit.wikimedia.org/r/774568

Change 774568 restored by Jdlrobson:

[mediawiki/skins/Vector@master] Prepare Vector for temporary user accounts

https://gerrit.wikimedia.org/r/774568

Change 774568 merged by jenkins-bot:

[mediawiki/skins/Vector@master] Prepare Vector for temporary user accounts

https://gerrit.wikimedia.org/r/774568

Change 767617 merged by jenkins-bot:

[mediawiki/core@master] TempUser infrastructure and services

https://gerrit.wikimedia.org/r/767617

Change 767918 merged by jenkins-bot:

[mediawiki/skins/Vector@master] Updates for core temp user autocreation feature

https://gerrit.wikimedia.org/r/767918

One thing to consider: if temporary accounts does not connect with CentralAuth, nane conflicts should be prevented if cross-wiki features are uses (e.g. Special:Import, FileImporter, Wikidata RC/Watchlist integration).

One thing to consider: if temporary accounts does not connect with CentralAuth, nane conflicts should be prevented if cross-wiki features are uses (e.g. Special:Import, FileImporter, Wikidata RC/Watchlist integration).

Temporary accounts are attached to CentralAuth.