Page MenuHomePhabricator

Ensure that an IP address cannot be saved permanently if IP Masking is enabled
Closed, ResolvedPublic

Description

Background

One goal of the IP Masking project is that wikis with temporary account autocreation enabled should not store a user's IP address anywhere other than (temporarily) in the CheckUser tables.

Currently IP addresses are stored in the actor table's actor_name column whenever an anonymous actor is created. New actors are created in various places, for example when saving a recent change or making a log entry: https://codesearch.wmcloud.org/deployed/?q=acquireActorId&files=&excludeFiles=&repos=

Eventually temporary accounts will be autocreated for all actions that cause an IP actor to be created. However, there are a lot of existing code paths to be updated, and we should ensure that any new features that attempt to save an IP actor fail in their attempt. We can ensure that IP address names do not pass validation checks for insertion into the actor table.

To do

ActorStore::validateActorForInsertion should fail if the user passed to that method has an IP address for a username.

Related Objects

Event Timeline

Tchanders renamed this task from Ensure that an IP address cannot be saved to the actor table if AutoCreateTempUser['enabled'] is true to Ensure that an IP address cannot be saved permanently if IP Masking is enabled.Sep 4 2023, 3:43 PM
Tchanders updated the task description. (Show Details)

Note that ManualLogEntry::insert stores an actor ID in logging.log_actor (and calls ActorStore::acquireActorId in order to get the ID).

The good news: making this change should mean that a log entry can't be saved with an IP user as the performer, so a user's IP address should no longer be displayed in a log entry for some action that they performed.

Possible pitfall: if an action is performed, then logged, and only when logging is the actor created (and the action is erroneously assigned to an IP user instead of a temporary user), then throwing this error could cause an action to be performed but not logged.

We talked about this and the answer is yes! The sooner we get this change in, the better. We can track down places where the error pops up on beta wikis so we can start ensuring the code handles it gracefully, where possible.

Change 965773 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/core@master] Ensure an IP actor cannot be created if temporary accounts are enabled

https://gerrit.wikimedia.org/r/965773

Note that ManualLogEntry::insert stores an actor ID in logging.log_actor (and calls ActorStore::acquireActorId in order to get the ID).

The good news: making this change should mean that a log entry can't be saved with an IP user as the performer, so a user's IP address should no longer be displayed in a log entry for some action that they performed.

Possible pitfall: if an action is performed, then logged, and only when logging is the actor created (and the action is erroneously assigned to an IP user instead of a temporary user), then throwing this error could cause an action to be performed but not logged.

Or, more generally, if some action can be performed before the actor performing that action is created, then we would have a problem: the action would have taken place, but wouldn't be assigned to anyone.

I can't think of any way this could be possible. Actions performed by actors update database tables that record actor IDs, so if an actor couldn't be created, the process would error before the action could be completed...

@matmarex expressed concern that this may not work for older extensions that haven't been updated to create actors, like Flow. I had a look into this (indeed it doesn't work).

What does Flow do?

Replying to a comment on a Flow board (as a first-time IP editor), posts a request along the lines of:

action: flow
format: json
uselang: en-gb
assert: anon
submodule: reply
page: Topic:xsdctfu4pfv1cscm
repreplyTo: xsdctfu4pfv1cscm
repcontent: Hey there
repformat: wikitext
token: +\

What happens if we throw this error?

No actor is created. However, the Flow comment is saved successfully, the IP address is saved in flow_revision.rev_user_ip, and displays in the UI:

image.png (102×917 px, 9 KB)

How is the actor created?

Logging the call stack from Database::insert:

public function insert( $table, $rows, $fname = __METHOD__, $options = [] ) {
+       if ( $table === 'actor' ) {
+               $data = debug_backtrace();
+               foreach ( $data as $datum ) {
+                       wfDebugLog( 'flow', $datum['class'] . $datum['type'] . $datum['function'] );
+               }
+       }
...

...shows that the actor is created via core, unknown to Flow:

Wikimedia\Rdbms\Database->insert
Wikimedia\Rdbms\DBConnRef->__call
Wikimedia\Rdbms\DBConnRef->insert
Wikimedia\Rdbms\InsertQueryBuilder->execute
MediaWiki\User\ActorStore->acquireActorId
RecentChange->save
Flow\Data\Listener\RecentChangesListener->onAfterInsert
Flow\Data\Listener\DeferredInsertLifecycleHandler::Flow\Data\Listener\{closure}
MWCallableUpdate->doUpdate
DeferredUpdates::attemptUpdate
DeferredUpdates::run
DeferredUpdates::{closure}
DeferredUpdatesScope->processStageQueue
DeferredUpdatesScope->processUpdates
DeferredUpdates::doUpdates
MediaWiki->restInPeace
MediaWiki->doPostOutputShutdown
wfApiMain

What should we do?

About Flow:

About other old features:

There are plenty of actions which don't capture the actor in tables other than logging, e.g. user account creation, page delete, user rights changes, abuse filter changes... maybe just create a special kind of actor (such as a "system error" actor) that doesn't expose any information?

Change 965773 merged by jenkins-bot:

[mediawiki/core@master] Ensure an IP actor cannot be created if temporary accounts are enabled

https://gerrit.wikimedia.org/r/965773

dom_walden subscribed.

I have tried to perform some actions as a logged out user on wikis with temporary accounts enabled, to see if I can see the exception.

I only saw the exception for actions like blocking and page deletes, and only if I changed the user rights to allow IP users to perform such actions, which isn't an advisable configuration.

For actions like edit and account creation, I can only see the temporary user being stored in the actor table, not the IP. The logging, recentchanges and various CheckUser tables record the actor ID of the newly created temporary users.

I notice RenameuserSQL writes to the actor table directly. I tried via Special:RenameUser to rename a username to an IP, but I was blocked with The username "<ip>" is invalid.

To see if there have been any attempts to save an IP to the actor table on dewiki beta, I searched beta's logstash for the string "Creating an IP actor with temporary accounts enabled" but could not find any for the last 6 months. Either it has not happened or beta's logstash is not working properly.

One scenario I didn't test was doing an actor migration on a wiki with temporary accounts enabled. I don't know how much of a risk this is in the wild.

Test environments:

@Tchanders is there a way to override this and not throw an exception if this is the intended behaviour?

I ask this because in CheckUser failed login attempts are assigned as being performed by the IP address that made the login attempt. This won't work because an actor ID will be assigned and this will fail.

Changing the actor to be a temporary account will not work as we don't want to create a temporary account for a user on a failed login attempt to a registered account.

@Dreamy_Jazz We're working on this for T350155, to allow IP actor creation when importing, so the revision can be assigned to an IP actor.

The solution in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/979991 introduces a way to get an actor store that allows IP actors (though the naming currently implies that it's only needed for import).

Previously we considered accepting a flag in ActorStore::acquireActorId, but passing a flag everywhere was clunky and involved updating a lot of places that shouldn't care about it.

Would it be possible to make a task for the CheckUser issue? I'd like to try not to create IP actors where possible, and legally we presumably shouldn't be storing the IP address permanently. I'm wondering if we could assign some kind of non-IP name (maybe like suggested in T345578#9293633) and let CheckUser expose the IP address the same way it does for logged in users.

@Dreamy_Jazz We're working on this for T350155, to allow IP actor creation when importing, so the revision can be assigned to an IP actor.

Thanks!

Would it be possible to make a task for the CheckUser issue? I'd like to try not to create IP actors where possible, and legally we presumably shouldn't be storing the IP address permanently. I'm wondering if we could assign some kind of non-IP name (maybe like suggested in T345578#9293633) and let CheckUser expose the IP address the same way it does for logged in users.

I've filed T353953: Don't use actor IDs for private CheckUser events when these actions are performed by IP addresses.