
Log access to IP addresses of temporary accounts
Closed, ResolvedPublic

Description

Background

With IP Masking enabled, only privileged users will be able to see IP addresses (see T325238: [Epic] IP Address Reveal for Privileged Users for more information).

Whenever this information is accessed, it should be logged. This is in order to provide a pathway for peer oversight and keep a check on misuse of IP address reveal.

What should we log

Log 1: Activation/deactivation of access

  • Who activated/deactivated access to IP addresses
  • When they received/revoked the access (timestamp)

Log 2: Log of actions taken

  • Temp username that was revealed
  • Performer for the reveal
  • Timestamp of this action
  • Page (see T325658#8805061)
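
As a rough illustration of the two logs listed above, the entries could be modelled as the following records. This is only a sketch: the class and field names (`AccessChange`, `RevealEntry`, `enabled`, `page`) and the example usernames are hypothetical and do not reflect the CheckUser extension's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AccessChange:
    """Log 1: a user activating or deactivating access to IP addresses."""
    performer: str       # who activated/deactivated access
    enabled: bool        # True = activated, False = deactivated
    timestamp: datetime  # when the change happened


@dataclass(frozen=True)
class RevealEntry:
    """Log 2: one reveal of a temporary account's IP addresses."""
    performer: str               # who performed the reveal
    temp_username: str           # temporary account that was revealed
    timestamp: datetime          # when the reveal happened
    page: Optional[str] = None   # optional; whether to log the page is an open question


# Hypothetical example values, for illustration only.
now = datetime(2023, 3, 20, 21, 45, tzinfo=timezone.utc)
change = AccessChange(performer="ExampleUser", enabled=True, timestamp=now)
reveal = RevealEntry(performer="ExampleUser", temp_username="~2023-1", timestamp=now)
print(change)
print(reveal)
```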
Retention

Indefinite retention of these logs

Who can access the logs

  • This log is visible to staff (T&S), stewards, checkusers and ombuds

Notes

  • We don't want to include any filters on the log just yet. This may change if we hear a demonstrated need for them.
  • Some of this may change when we roll out, based on the feedback we get
  • The log is debounced for 24 hours, i.e. the same action is not logged more than once in 24 hours
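
The 24-hour debounce in the last note can be sketched as below. The task does not define what counts as "the same action"; this sketch assumes it means the same performer revealing the same temporary account, and the class and method names (`DebouncedRevealLog`, `record`) are hypothetical.

```python
from datetime import datetime, timedelta

DEBOUNCE_WINDOW = timedelta(hours=24)


class DebouncedRevealLog:
    """In-memory log that skips repeats of the same action within 24 hours."""

    def __init__(self):
        self.entries = []       # (performer, temp_username, timestamp) tuples
        self._last_logged = {}  # (performer, temp_username) -> last logged time

    def record(self, performer, temp_username, now):
        """Log the reveal unless an identical one was logged in the window.

        Returns True if an entry was written, False if it was debounced.
        """
        key = (performer, temp_username)
        last = self._last_logged.get(key)
        if last is not None and now - last < DEBOUNCE_WINDOW:
            return False
        self._last_logged[key] = now
        self.entries.append((performer, temp_username, now))
        return True


log = DebouncedRevealLog()
start = datetime(2023, 4, 1, 12, 0)
log.record("X", "Y", start)                       # logged
log.record("X", "Y", start + timedelta(hours=1))  # debounced, not logged
log.record("X", "Y", start + timedelta(hours=24)) # logged again
print(len(log.entries))
```

Note that the window is measured from the last logged entry, so repeated reveals inside the window do not extend it.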

Event Timeline

IMO Special:CheckUserLog should only show items related to Special:CheckUser, Special:Investigate and the CheckUser API. Integrating IP access logs for temporary accounts in Special:CheckUserLog has the following potential issues:

  • If this data could be provided by extensions other than the CheckUser extension (as suggested by me in T324603#8482306) then it would make logging to cu_log difficult as it may not be installed.
  • If checkusers should not have access to the log, this adds complexity to a page that previously only required a single right to view.
  • Using userips would not work here as CheckUsers (who have checkuser-log) should continue to see log entries for checks on Special:CheckUser with 'Get IPs' which uses this checktype. A new checktype would be needed if CheckUsers should not have access.
  • cu_log allows storing data on ranges and searching for this data, which is not applicable for accessing IP addresses of temporary accounts. This makes the table more polymorphic as some columns would not be applicable.

IMO, even if CheckUser is the only extension that could provide the info, it would be much better to use the logging table in a similar manner to how suppression works (which limits viewing the log item to users with certain rights) over cu_log. One plan is that cu_log be merged into the logging table (T309999), so adding IP access would make the conversion harder if it ever happens. Plus, if accesses are deleted after 90 days, cu_log does not handle deletion of log entries yet. I'm not sure about the logging table, but if not then a new table is likely best.

Rough estimate based on early analysis. We don't expect it to be very accurate.

  • Special page (if needed) for logs - 13
  • New table (if needed) - 8
  • Expire logs after X days - 8
  • Add log lines - 8

Why do the log entries need to expire? No other logs have an expiration.

Also, why is PII being logged? I see no reason that the revealed IP needs to be in the log. Using the current CU log format should be sufficient.

Tchanders changed the task status from Open to Stalled.Feb 28 2023, 6:26 PM

Stalled while we wait for answers to the questions in task descriptions and comments (e.g. awaiting outside legal counsel).

Why do the log entries need to expire? No other logs have an expiration.
Also, why is PII being logged? I see no reason that the revealed IP needs to be in the log. Using the current CU log format should be sufficient.

Hi @JJMC89 - I want to give some context on why we want to do this logging before I answer your questions. I finally have answers after getting clarity from the Legal team about this. The purpose of this log is to enable oversight to ensure IP address reveal is not being misused. The oversight path may require one to look at the IP addresses that were looked at or may just require the name of the temporary user who was checked -- ultimately depends on the ones doing the oversight. Our hunch is that either or both of those data points may be useful in the oversight process. This is similar to why IP addresses are preserved in the checkuser log. Hopefully this answers why we want to keep the IPs in this log.

To answer your question about expiration - We need these logs to expire because they contain the IP addresses. We could scrub the IPs and keep the log entry but will that be helpful on its own?

To answer your question about expiration - We need these logs to expire because they contain the IP addresses. We could scrub the IPs and keep the log entry but will that be helpful on its own?

The CU logs contain IP addresses and they do not expire. Why should this log be any different?

To answer your question about expiration - We need these logs to expire because they contain the IP addresses. We could scrub the IPs and keep the log entry but will that be helpful on its own?

The CU logs contain IP addresses and they do not expire. Why should this log be any different?

I would say that the reasons to see a temporary account's IP fall into a larger group than the reasons for running CheckUser. Having an IP associated with a username in the CheckUser log probably falls under keeping it to deal with abuse, but I'm not so sure about this for all temporary account IP reveals (such as blocking an IP directly for simple vandalism).

The button makes it very easy to reveal the IP of a temporary user, feasibly making it possible for so-called "fat fingers" on mobile to reveal an IP and create a log entry unintentionally. Associating an IP with an account in the CheckUser log is more of a multi-step process and is unlikely to be unintentional.

However, I'm also not so sure about expiring logs. Perhaps this could just be limited to the IPs being removed. Since there is no reason field on the reveal, manually removing IPs that don't fall under the "keep to prevent abuse" category would be difficult.

To answer your question about expiration - We need these logs to expire because they contain the IP addresses. We could scrub the IPs and keep the log entry but will that be helpful on its own?

The CU logs contain IP addresses and they do not expire. Why should this log be any different?

Few reasons:

  • This log will be a lot bigger than the CU log. Storing it perpetually will probably cause the log table to grow quite large quite quickly.
  • As temporary users and IP addresses change over time, older logs will likely be less and less useful as time goes by.
  • Running a CU check is more severe than revealing an IP address. Having longer term evidence for peer oversight in that case is way more important.
  • Last but not the least, we want to minimize how long we store IP addresses for as one of the goals of IP Masking. Storing IPs in the reveal log perpetually goes counter to that.
Niharika changed the task status from Stalled to Open.Mar 20 2023, 9:45 PM
Niharika updated the task description. (Show Details)

To answer your question about expiration - We need these logs to expire because they contain the IP addresses. We could scrub the IPs and keep the log entry but will that be helpful on its own?

Few reasons:

  • This log will be a lot bigger than the CU log. Storing it perpetually will probably cause the log table to grow quite large quite quickly.

Unless the log entries are deleted, my understanding is that they could go into the logging table. Because this is so large already, I'm not sure that this would be a big impact.

  • As temporary users and IP addresses change over time, older logs will likely be less and less useful as time goes by.

In a large number of cases, yes. However, knowing the range an abusive temporary account user was on could help in linking future temporary accounts without the need for a checkuser to run a check.

As a checkuser on the English Wikipedia I have referenced log entries for IPs that are 2 to 3 years old. Last time I did this was yesterday. While not necessarily conclusive, it was fairly useful in making a link between users.

  • Last but not the least, we want to minimize how long we store IP addresses for as one of the goals of IP Masking. Storing IPs in the reveal log perpetually goes counter to that.

This is a fair point and something I don't disagree with. However, this log is not public, and my understanding is that only a very few people would be able to see it (which is better than the status quo of public IPs).

One thought I did have was that certain log entries could be "saved" from deletion if there is a reasonable need to keep them around for more than 90 days. This could be a valid mid-point between no deletion and deletion after 90 days, as entries that are needed to stop abuse in the future could be kept for future reference.

However, for WMF use this could just be the checkuser wiki.
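
The "saved from deletion" idea above could look roughly like the following purge routine. This is a sketch under assumptions: the `keep` flag, the `Entry` record and the `purge_expired` function are hypothetical, and the 90-day figure is the retention period discussed in this thread, not a settled value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # retention period discussed in this task


@dataclass
class Entry:
    performer: str
    temp_username: str
    timestamp: datetime
    keep: bool = False  # hypothetical flag: entry was "saved" from deletion


def purge_expired(entries, now):
    """Return the entries that survive a purge run at time `now`.

    An entry survives if it is younger than the retention period
    or has been explicitly flagged to be kept.
    """
    cutoff = now - RETENTION
    return [e for e in entries if e.keep or e.timestamp >= cutoff]


now = datetime(2023, 6, 30)
entries = [
    Entry("X", "Y", datetime(2023, 1, 1)),             # old, purged
    Entry("X", "Z", datetime(2023, 1, 1), keep=True),  # old but saved
    Entry("A", "B", datetime(2023, 6, 1)),             # recent, kept
]
remaining = purge_expired(entries, now)
print([e.temp_username for e in remaining])
```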

  • This log will be a lot bigger than the CU log. Storing it perpetually will probably cause the log table to grow quite large quite quickly.

This is the only one I think holds any weight and only if DBA say it is an issue.

  • As temporary users and IP addresses change over time, older logs will likely be less and less useful as time goes by.

Changing IP addresses is an issue for all users, not just temporary ones.

  • Running a CU check is more severe than revealing an IP address. Having longer term evidence for peer oversight in that case is way more important.

While running a CU check does give more information, in most cases IP addresses are the more identifying part.
Speaking as someone who does oversight of CUs, cutting off the logs at 90 days will likely be problematic for oversight of revealing IP addresses.

  • Last but not the least, we want to minimize how long we store IP addresses for as one of the goals of IP Masking. Storing IPs in the reveal log perpetually goes counter to that.

The same can be said for the CU log.

Changing IP addresses is an issue for all users, not just temporary ones.

True, but with editors like us, the CU has a fixed point, namely our permanent username. In this log, both the temporary username (probably will expire after 12 months) and the IP address (which changes every few months in most countries) will be changing. A log that says:

User:12345 at 123.45.67.89, two years ago

won't be useful for making a connection to

User:54321 at 98.76.54.32, today.

Even if you had today's IP address, given the size of the log, I'm not sure that you could realistically find any uses of that IP in previous years. The English Wikipedia alone currently gets edits from a quarter million unique IP addresses each month. When you combine all of the wikis, temporary users could be using something like a million IP addresses during 90 days. How many millions of revealed IP addresses do you think it would be practical to search through?

@JJMC89 and @Dreamy_Jazz these are valid points. Thank you. I am not too sure that we can persist this log "forever" since we are trying to trim down how much PII we store and for how long generally but we could potentially make it last longer than 90 days. I have brought this to the attention of my Legal partner and will report back on what I hear next week.

Thanks. To be clear, I would not oppose expiring logs if that's what legal says needs to be done. For WMF use, my concerns regarding keeping the entries for longer would be largely mitigated if the checkuser wiki could have the entry text copied over from the log if there was a valid need to keep them.


We don't want to include any filters on the log just yet. This may change if we hear a demonstrated need for them.

With respect to the log page, I think it would be important to allow filtering by the performer of the reveal.

My reasoning is that not having this would make it hard for those checking usage to find a pattern of misuse. One reveal that is determined to be against policy (such as if they were in a dispute with the temporary user) may be seen as a mistake, but a pattern of bad reveals could be enough to remove the right from the user.

For example:

  • User X has the right to see temporary account IPs
  • Temporary account Y makes an edit that User X disagrees with
  • They proceed to revert each other a number of times
  • X reveals the IP of Y
  • User Z reviews the log, and finds this reveal questionable due to the "edit war" between X and Y.
  • Z wants to find previous reveals by X to see if any other reveals are also questionable.
  • Without filters (and no API endpoint to get log entries via a script), Z has to scroll through a large log to review X's other reveals.

Second example:

  • User X has temporary account IP rights
  • X is blocked for a reason that necessitates checking X's reveals (such as sockpuppetry or doxxing)
  • User Z who can see the log is tasked with checking the reveals by X
  • Z cannot easily do this without the ability to filter by the performer of the reveal, leading to a need to scroll through the full log or finding someone with database access to do this for them.
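
Both examples come down to the same query: all reveals by one performer. A minimal sketch of that filter, assuming log entries are available as simple records (the `Reveal` type and `reveals_by` function are hypothetical, not an existing API):

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Reveal(NamedTuple):
    performer: str
    temp_username: str
    timestamp: datetime


def reveals_by(entries: Iterable[Reveal], performer: str) -> List[Reveal]:
    """Return one performer's reveals, newest first."""
    matches = [e for e in entries if e.performer == performer]
    return sorted(matches, key=lambda e: e.timestamp, reverse=True)


# Hypothetical log contents: Z wants to review everything X has revealed.
log = [
    Reveal("X", "Y", datetime(2023, 4, 1, 12, 0)),
    Reveal("A", "B", datetime(2023, 4, 2, 9, 30)),
    Reveal("X", "C", datetime(2023, 4, 3, 8, 15)),
]
for entry in reveals_by(log, "X"):
    print(entry.timestamp.isoformat(), entry.temp_username)
```

Without a filter like this on the log page or an API endpoint, the reviewer has to do the equivalent by hand across the whole log.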

The difficult thing here is logging which IPs were unmasked:

  • Means we'd need the 90-day retention
  • For the revealed IPs to be sortable/filterable, we'd need a custom log table

That in itself is a fair amount of upfront/maintenance work, and we're not sure yet whether users would expect the revealed IPs in the log, or want them more than keeping the logs for longer.

After discussing with @Niharika, we'll start by implementing logging without that requirement, and using the logging table. Depending on user feedback, we can design something more custom if needed.

Change 910091 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/CheckUser@master] Add logging infrastructure for logging temporary account IP address access

https://gerrit.wikimedia.org/r/910091

After discussion with @Niharika, we won't log the page for now, unless @MMoss_WMF requires it.

All subtasks resolved. Closing Epic. ✅