Page MenuHomePhabricator

The CheckUser extension should provide IP addresses of temporary account users, for IP Masking
Closed, ResolvedPublic8 Estimated Story Points

Description

Background

Following T300263: [IP Masking] Create temporary account on first edit, IP addresses for temporary accounts will only be stored in the CheckUser tables. This is by design, to protect user privacy.

However, IP addresses form an important part of anti-vandalism workflows, and need to be made visible to trusted users for these purposes.

IP addresses used by temporary account users will be visible to some trusted users who are not CheckUsers (to be defined, but see T300289: [IP Masking] IPViewer roles views temporary account users for example).

What needs to be done

We need CheckUser to be able to return, for users with a permission other than checkuser (name of new permission tbc):

This should be done carefully and might be a new endpoint, to avoid the possibility of revealing sensitive information to users without the checkuser permission.

Notes

The cu_changes table doesn't guarantee that a particular combination of username/revId or username/timestamp will be unique, so there could in theory be multiple IP addresses associated with these. In that case, it won't be clear which one should be shown in a history line, log line, etc. For now we will just pick one, perhaps the most recent.

This work will be affected by T324907: Create separate tables for log events in CheckUser. After that task:

  • we can look up a log entry by log ID in the cu_log_event table
  • when looking for all IPs, we'll need to look in all the relevant tables
Testing notes
  • To use this API, you'll need the checkuser-temporary-account right that's introduced in this patch
  • The URL is rest.php/checkuser/v0/temporaryaccount/{name}, where {name} is the user name you want to look up (see RestRoutes in extension.json)
  • You can add the optional params in TemporaryAccountHandler::getParamSettings via the querystring (e.g. ?revision=123, etc)
  • In case you have been switching $wgAutoCreateTempUser['enabled'] on and off locally, it needs to be true to avoid a nonexistent account error

Related Objects

Event Timeline

Using the checkuser table for this purpose may require a database change. Using a new table for this purpose could be a solution which would help to seperate data, but once an account is converted that data would need to be copied between tables which is not ideal.

Would the checkuser ''right'' explicitly grant access to this information or must a checkuser have this new right? Example of the former is that a user with only the checkuser right can still get IPs for a temporary user.

A new page will almost certainly be needed as the existing special pages rely a lot on having the checkuser right. For example, Investigate with only being able to access IPs for temporary accounts would need design changes. Special:CheckUser's get IPs allows this to work, but still a lot of change would be needed.

IMO Special:CheckUser and Special:Investigate should display the information (depending on rights) about temporary accounts as things like the User agent string and XFF headers may not be given out in this new interface. Plus it makes linking logged out users and accounts harder. Having the checkuser right not give the information about the temporary accounts seems strange (as this removes access to information currently given to CUs) and would also make this change complicated.

Looking at the mock ups in some of the linked tasks this will likely require either some interesting use of the CheckUser API or a different API designed to return the data in a more efficent manner. For example, to return the IP used in the mockup in T300289 it seems that an API call would be made to request the IP address for that specific edit. Doing that is possible using the checkuser API, but it's not how it's designed to be used. Requesting one IP would require limiting the request to one row and careful selection of the limit. You couldn't use 'get edits' as that would expose the user agent string and XFF, and so would have to use 'get ip addresses'.

Furthermore, it seems that the access log for this is supposed to be accessible to more than just checkusers depending on wiki configuaration. As such, using the checkuser API directly would cause log entries to Special:CheckUserLog. Writing code to determine where a log entry needs to go will likely be difficult if using the same API endpoint.

As such, I'm thinking a new API endpoint and Special page that could return an IP associated with a edit or all IPs associated with a temporary user would be better. This would avoid mixing code that has requirements for different rights. The data could still be stored in the checkuser cu_changes table, and I think based on some reading this would require little changes to existing code.

The API could be a list named "ipviewer" that can take a temporary account username for all IPs used or specific oldid for the IP used for that one edit. The special page could be named similar and would have similar options to the API (username for all IPs and oldid for one IP).

Finally, is this something that Anti-Harassment will be working on?

Finally, is this something that Anti-Harassment will be working on?

Yes - we're working on IP Masking now. Specifically, we're working on adding IP address information back in to the UI (for some trusted users) in some places where IP addresses have been replaced by temporary account names.

There are no plans currently for a dedicated SpecialPage for this information; instead, it will be shown inline in a few places (including history pages, log pages, recent changes, etc) and via IPInfo on Special:Contributions. (Special:CheckUser and Special:Investigate should work as before, with temporary accounts working the same as registered accounts. I've tested this a bit locally, and it seems to work out of the box so far.)

As such, I'm thinking a new API endpoint [...] that could return an IP associated with a edit or all IPs associated with a temporary user would be better. This would avoid mixing code that has requirements for different rights. The data could still be stored in the checkuser cu_changes table, and I think based on some reading this would require little changes to existing code.

This seems like a likely solution to me too. It seems safer to have a separate endpoint for data that is (1) available by a right that is granted more widely, (2) only available for temporary accounts and (3) only reveals the IP address (i.e. what would have previously been revealed via the user name).

Tchanders set the point value for this task to 8.Dec 8 2022, 12:31 AM

Thanks for the clarifications. If you need me to help review any code, I should be added to changes made to CheckUser and I can provide input.

Is there a plan for how CheckUser will interact with the code that shows IPs? If not I'm thinking that using a method that other extensions could make use of would be best for compatability purposes. For example, a hook that CheckUser would hook on to would allow other extensions to also provide this if CheckUser is not installed or the version of CheckUser is too old. However, that method could allow a situation where no data is provided which would add complexity to the API response.

Thanks for the clarifications. If you need me to help review any code, I should be added to changes made to CheckUser and I can provide input.

Thank you @Dreamy_Jazz !

Is there a plan for how CheckUser will interact with the code that shows IPs?

T325238: [Epic] IP Address Reveal for Privileged Users captures many of the examples that have been scoped out so far, of where we currently display IP addresses, but will in future display temporary account names, with an option for privileged users to reveal the IP address. (There will be a lot more that aren't captured here yet.)

IP addresses will be revealed when a user clicks a button, and stay revealed for 24 hours. For legal reasons, we won't display IP addresses unless the user has performed an action indicating they need to see them. (Access will also be logged.)

In general I expect that places where IPs can be revealed will make an API call to CheckUser's new endpoint (added in this task), prompted by a user interaction. So in general, I might expect:

  • A hook handler in CheckUser adds a "reveal IPs" button or something
  • On clicking that button, an API request is made to CheckUser for the IP addresses, given the temporary account name

...exact details depending on the product specs.

IPInfo works like this, using existing core hooks.

If not I'm thinking that using a method that other extensions could make use of would be best for compatability purposes. For example, a hook that CheckUser would hook on to would allow other extensions to also provide this if CheckUser is not installed or the version of CheckUser is too old. However, that method could allow a situation where no data is provided which would add complexity to the API response.

Are you thinking of third-party cases, where another extension other than CheckUser might be used for storing IP addresses?

If not I'm thinking that using a method that other extensions could make use of would be best for compatability purposes. For example, a hook that CheckUser would hook on to would allow other extensions to also provide this if CheckUser is not installed or the version of CheckUser is too old. However, that method could allow a situation where no data is provided which would add complexity to the API response.

Are you thinking of third-party cases, where another extension other than CheckUser might be used for storing IP addresses?

Yes. I guess that's up to you and the Anti-Harassment team as to whether that use case is worth supporting. My original assumption based on how these phabricator tasks are tagged with projects was that the code that adds the data to the output, implements an API and requests the info from CheckUser would be part of core or another extension. However, if that is not the case, then supporting other extensions providing this info will not be an issue as that extension would just use the same core hooks.

Change 874901 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/CheckUser@master] Add API endpoint for finding all IP addresses used by a temporary account

https://gerrit.wikimedia.org/r/874901

@Niharika You can see some of the errors that can be returned from the API, in this function: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/874901/1/src/Api/Rest/Handler/TemporaryAccountHandler.php#50

Do you have a particular preference for what the error messages should say, or should we put something similar to the IPInfo ones?

For edits cuc_this_oldid should store the associated oldid number. That could be used to look up the IP in the History page / diff page.

Hi, what is the best way to test this patch? Are temporary accounts available for us to create?

Thanks

Should be. I've found adding $wgAutoCreateTempUser['enabled'] = true; to LocalSettings.php enables and allows creating temporary accounts.

Should be. I've found adding $wgAutoCreateTempUser['enabled'] = true; to LocalSettings.php enables and allows creating temporary accounts.

+1. Then if you edit as a logged-out user, your edits will be assigned to a new temporary account, rather than your IP address.

@AGueyte I've added some more testing notes to the task description too.

Should be. I've found adding $wgAutoCreateTempUser['enabled'] = true; to LocalSettings.php enables and allows creating temporary accounts.

+1. Then if you edit as a logged-out user, your edits will be assigned to a new temporary account, rather than your IP address.

@AGueyte I've added some more testing notes to the task description too.

See also T326651: Add option to enable temp user creation on Patch Demo which would add temp user creation as an option to Patch Demo.

Change 874901 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Add API endpoint for finding IP addresses used by a temporary account

https://gerrit.wikimedia.org/r/874901

@Tchanders If my user does not have permission to see any temporary user's edits or if the temporary user is suppressed, should they be allowed to see their IP addresses?

@Tchanders If my user does not have permission to see any temporary user's edits or if the temporary user is suppressed, should they be allowed to see their IP addresses?

I'd imagine it's OK:

  • Presumably if the edit was suppressed, then surfacing the IP address would be no bad thing
  • I wouldn't expect temporary usernames themselves to become suppressed, since they are auto-generated so unlikely to be offensive

@Niharika what do you think?

@Tchanders a couple of other observations:

Non-sequential indices

The indices in the JSON response are not always sequential. E.g.:

{
  "ips": {
    "0": "192:168:1:1:0:0:0:0",
    "2": "192:168:1:0:0:0:0:0",
    "4": "192:168:0:1:0:0:0:0"
    ...
  }
}

I think because there are multiple entries in cu_changes with the same IP. In the above case, each edit also triggered an AbuseFilter rule.

Is this going to be awkward for frontend developers using the API?

Number of IPs returned sometimes less than limit parameter

Multiple cu_changes rows for an IP can also mean that the limit parameter behaves unexpectedly. With the same user above, limit=2 returns:

{
  "ips": [
    "192:168:1:1:0:0:0:0"
  ]
}

and limit=3 or 4 returns:

{
  "ips": {
    "0": "192:168:1:1:0:0:0:0",
    "2": "192:168:1:0:0:0:0:0"
  }
}
Negative limits

limit=-1 (or any negative number) allows a user to bypass $wgCheckUserMaximumRowCount in SQLite and leads to an exception in MySQL (You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '-1' at line 1). I guess because negative numbers are valid integers and always less than $wgCheckUserMaximumRowCount, so they pass our validation check in TemporaryAccountHandler.

Sorting of IPs

We sort by cuc_ip. This leads to IPv4 and IPv6 addresses being mixed together and an address like A10:33C0:0:0:0:0:0:0 coming before an address like A093:157C:0:0:0:0:0:0. Should we instead sort by cuc_ip_hex like we do with Special:Investigate?

I am wondering, for example, what will make it easier for an admin to be able to recognise IPs which have a common range (although I think sorting by cuc_ip will still allow them to do that).

Or, are we expecting anything calling the API to handle normalising and sorting the IPs themselves?

@Tchanders a couple of other observations:

Non-sequential indices

The indices in the JSON response are not always sequential. E.g.:

{
  "ips": {
    "0": "192:168:1:1:0:0:0:0",
    "2": "192:168:1:0:0:0:0:0",
    "4": "192:168:0:1:0:0:0:0"
    ...
  }
}

What was the API query that got these results? The numerical indices should only be present to represent revision IDs, not order, so we wouldn't expect them to be sequential. When fetching a list of IPs without the revision IDs, they should be in an (un-keyed) array, like your second example.

Number of IPs returned sometimes less than limit parameter

Multiple cu_changes rows for an IP can also mean that the limit parameter behaves unexpectedly. With the same user above, limit=2 returns:

{
  "ips": [
    "192:168:1:1:0:0:0:0"
  ]
}

and limit=3 or 4 returns:

{
  "ips": {
    "0": "192:168:1:1:0:0:0:0",
    "2": "192:168:1:0:0:0:0:0"
  }
}

I think this was broken by an update to make this query work with Postgres: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/883851

I'll push a patch to fix this.

Negative limits

limit=-1 (or any negative number) allows a user to bypass $wgCheckUserMaximumRowCount in SQLite and leads to an exception in MySQL (You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '-1' at line 1). I guess because negative numbers are valid integers and always less than $wgCheckUserMaximumRowCount, so they pass our validation check in TemporaryAccountHandler.

Good find - I'll push a patch for this too.

Sorting of IPs

We sort by cuc_ip. This leads to IPv4 and IPv6 addresses being mixed together and an address like A10:33C0:0:0:0:0:0:0 coming before an address like A093:157C:0:0:0:0:0:0. Should we instead sort by cuc_ip_hex like we do with Special:Investigate?

I am wondering, for example, what will make it easier for an admin to be able to recognise IPs which have a common range (although I think sorting by cuc_ip will still allow them to do that).

Or, are we expecting anything calling the API to handle normalising and sorting the IPs themselves?

Thinking more about this, we should really be sorting by timestamp, since we want the most recent IP addresses wherever we hit a limit. If the actor has very many rows, this could lead to a slow query (there's no index on cuc_actor, cuc_timestamp, so a sort will probably have to be performed). I suppose we can hope that in 90 days an actor won't accumulate that many rows, and we can also monitor the performance of this query. I'll upload a patch to fix this too!

@Tchanders a couple of other observations:

Non-sequential indices

The indices in the JSON response are not always sequential. E.g.:

{
  "ips": {
    "0": "192:168:1:1:0:0:0:0",
    "2": "192:168:1:0:0:0:0:0",
    "4": "192:168:0:1:0:0:0:0"
    ...
  }
}

What was the API query that got these results? The numerical indices should only be present to represent revision IDs, not order, so we wouldn't expect them to be sequential. When fetching a list of IPs without the revision IDs, they should be in an (un-keyed) array, like your second example.

It was http://localhost:8080/w/rest.php/checkuser/v0/temporaryaccount/*Unregistered%2017. My second example only seems to happen when only one IP is returned.

I've made separate tasks for the issues above, to stop this task getting any larger:

@Tchanders a couple of other observations:

Non-sequential indices

Filed as T328896: Fix non-sequential indices in TemporaryAccountHandler API results

Number of IPs returned sometimes less than limit parameter

I'll push a patch to fix this.

Filed instead as: T328894: Fix limit handling in TemporaryAccountHandler query

Negative limits

Good find - I'll push a patch for this too.

Also filed instead as: T328894: Fix limit handling in TemporaryAccountHandler query

Sorting of IPs

Thinking more about this, we should really be sorting by timestamp [...]

Filed as T328893: Sort results from TemporaryAccountHandler by timestamp

[...] and we can also monitor the performance of this query. I'll upload a patch to fix this too!

Filed as T328892: Monitor the performance of TemporaryAccountHandler APIs

Completeness of data

I checked that all IPs for a user were being returned from the cu_changes table (modulo limits).

Permissions and validation

I tried various combinations of variables relating to permissions (e.g. rights, blocks, revision visibility) and validation (e.g. non-existent username, type of user).

I could only see IP information if my user had the checkuser-temporary-account, had enabled the preference (T326736), was not blocked and was looking up a temporary user.

The status of the temporary user (i.e. suppressed) and their revisions did not seem to make a difference. See T324603#8569956 and my final comment below.

See full list of conditions tested.

Information disclosure

I considered possible information disclosure scenarios. But, I don't think any are very compelling.

Usability

I considered how easy this API would be for a frontend developer to use. I raised issues in T324603#8577985.

I've made separate tasks for the issues above, to stop this task getting any larger:

Thank you. Further testing will be done as part of those tasks, including testing in different SQL engines (including postgres if I can work out how).

@Tchanders If my user does not have permission to see any temporary user's edits or if the temporary user is suppressed, should they be allowed to see their IP addresses?

I'd imagine it's OK:

  • Presumably if the edit was suppressed, then surfacing the IP address would be no bad thing
  • I wouldn't expect temporary usernames themselves to become suppressed, since they are auto-generated so unlikely to be offensive

I guess we should make sure the decision we make here is consistent with whatever the permissions are for T327437 (I am not familiar with this).

Closing, since we are discussing temporary account suppression as part of T326759: Investigate: Which WMF deployed code might be affected by IP Masking?