
RFC: Create temporary accounts for anonymous editors
Closed, ResolvedPublic

Assigned To: None
Authored By: Tgr, Apr 23 2016, 3:00 PM

Description

  • Affected components: MediaWiki core, CheckUser extension.
  • Engineer for initial implementation: TBD.
  • Code steward: TBD.

Motivation

The existing method of attributing edits by anonymous users to their current IP address seems inadequate, because:

  1. Exposure of a user's IP address to the public is a privacy problem (e.g. prosecution by a repressive regime, public embarrassment, stalking and harassment, revealing real-world identity and location; see also Exposure of user IP addresses).
  2. Edits from the same anonymous session cannot reliably be found by other users, because the IP address varies. This makes it difficult to review content and to deal with on-wiki abuse (via "User contributions" and user blocking).
  3. The user cannot easily find their own edits. ("My contributions").
  4. The user cannot reliably communicate with others, be contacted by them, or receive notifications ("the talk page problem").

IP addresses change regularly for various reasons:

  • IPv6 users regularly change IP addresses due to SLAAC (even when their location does not change).
  • Mobile users regularly change IP addresses when moving closer to another cell tower.
  • Users regularly change IP addresses when switching between networks (e.g. between home WiFi, cellular data, train WiFi, and office WiFi).

Also, when an IP editor is asked to register and does so, they get detached from their former contributions.

Requirements


  • Edits by unregistered users are attributed to an identifier that is not based on personal information (such as IP address or geolocation).
  • Edits by unregistered users are attributed to an identifier that remains consistent within a browser session.

Exploration

Proposal

Attribute edits by unregistered users to a session ID instead of the current IP address.

Open questions:

  1. What will the session ID be based on?
    • The first IP address used during that session. (Rejected for privacy reasons.)
    • Auto-increment? UUID? Random? Random human-readable (e.g. diceware)?
  2. To what extent should these sessions act like real accounts?
  3. Should these be convertible to real accounts? If so, under what circumstances do we allow that, and how would that work?
  4. How can anti-abuse tools and workflows be adapted?
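As a rough illustration of open question 1, and not a decision of this RFC, the sketch below generates each candidate kind of identifier with the Python standard library. None of them is derived from the IP address or location, which matches the first requirement. The function names and the eight-word list are hypothetical.

```python
import itertools
import secrets
import uuid

# Hypothetical, deliberately tiny word list; real diceware lists have 7,776 words.
WORDS = ["apple", "birch", "cobalt", "delta", "ember", "fjord", "garnet", "harbor"]

# Process-local counter standing in for a database auto-increment column.
_counter = itertools.count(1)


def auto_increment_id() -> str:
    """Sequential ID: compact, but reveals ordering and how many sessions exist."""
    return str(next(_counter))


def uuid_id() -> str:
    """Random version-4 UUID: unguessable, but long and hard to remember."""
    return str(uuid.uuid4())


def random_token_id(nbytes: int = 9) -> str:
    """URL-safe random token from the OS CSPRNG (9 bytes -> 12 characters)."""
    return secrets.token_urlsafe(nbytes)


def diceware_id(n_words: int = 3) -> str:
    """Random human-readable ID such as 'ember-fjord-apple'."""
    return "-".join(secrets.choice(WORDS) for _ in range(n_words))


if __name__ == "__main__":
    for make in (auto_increment_id, uuid_id, random_token_id, diceware_id):
        print(make.__name__, make())
```

The strength of the human-readable option comes entirely from the list size. With eight words, three words give only 512 combinations and would collide quickly.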

Benefits:

Prior art:

Related:


Original task description at T172477 by @tstarling

In T171382 it was asserted that some IPv6 users regularly change IP addresses within a /64 block, due to SLAAC (RFC 4862). As such, the existing method of attributing edits to anonymous users seems inadequate.

I did some queries on recent anonymous IPv6 edits in the enwiki recentchanges table. My impression is that this does indeed happen, but the problem is worse than described: some IPv6 users use a mobile connection, and in fact routinely move around a block much larger than /64.
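The /64 observation can be shown with a small sketch. Under SLAAC the router advertises the network prefix, and the host picks the low 64 bits (the interface identifier). Hosts that use temporary addresses change those bits periodically. A /64 check catches that case but misses the mobile case described above. The helper `same_prefix` is hypothetical and is not a MediaWiki function. The addresses are made up and come from the 2001:db8::/32 documentation prefix.

```python
import ipaddress


def same_prefix(a: str, b: str, prefix_len: int = 64) -> bool:
    """Return True if address b lies in the /prefix_len network containing address a."""
    network = ipaddress.ip_network(f"{a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(b) in network


# Made-up addresses from the 2001:db8::/32 documentation prefix.
home_1 = "2001:db8:1:2:a1b2:c3ff:fed4:e5f6"  # before a temporary-address change
home_2 = "2001:db8:1:2:9f3e:11aa:42bc:7d01"  # after it: same /64, new interface ID
mobile = "2001:db8:1:7c::42"                 # same user later, on a different /64

print(same_prefix(home_1, home_2))                 # True
print(same_prefix(home_1, mobile))                 # False
print(same_prefix(home_1, mobile, prefix_len=48))  # True
```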

I've long dreamed of attributing anonymous edits to a session ID instead of an IP address, since this would fix T20981: Allow anonymising of unregistered users ("IP editors") and T12957: Allow logged in user to reclaim previous anon edits, but due to abuse control considerations, it seems unlikely that this will win community support. This proposal is a compromise, fixing only one of those two bugs, by attributing edits to a session ID which is publicly associated with the first IP address used during that session.

I use the term "session" loosely; this might be an ID associated with a long-lived cookie.

The proposal in detail:

  • On page save, if there is no existing session:
    • Create the session, and store the current IP address in the session
    • Search the actor table (T167246) for this IP address, and add a suffix to the IP address so as to make a unique username.
    • Create the actor row. actor_text would be the suffixed IP address and actor_user would be NULL.
  • On account creation, attributing the existing edits in the same session to the newly created account could be as simple as updating actor_user and actor_text in the existing actor row.
  • Blocks would be applied to the session via its public identifier (the suffixed IP), solving T152462: Add cookie when blocking anonymous users.
  • When an anonymous session is blocked, an autoblock would be applied to the last IP address actually used by the anonymous user in question, exactly analogous to the way logged-in users are blocked.
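The proposal can be sketched with an in-memory model. It covers creating a suffixed actor on first save, reusing that row on account creation, and blocking by public name with an autoblock on the last IP. The "~" suffix separator, the dict-based session, and all function names are hypothetical, because the proposal does not specify a suffix format. A real implementation would run these steps against the actor table, with appropriate locking.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Actor:
    actor_id: int
    actor_text: str            # public name, e.g. "192.0.2.7~2"
    actor_user: Optional[int]  # None while the actor is an anonymous session


@dataclass
class ActorTable:
    """In-memory stand-in for the actor table; a real wiki would query the database."""
    rows: Dict[int, Actor] = field(default_factory=dict)
    next_id: int = 1

    def create_anon_actor(self, ip: str) -> Actor:
        # Add the smallest numeric suffix that makes the name unique.
        taken = {a.actor_text for a in self.rows.values()}
        n = 1
        while f"{ip}~{n}" in taken:
            n += 1
        actor = Actor(self.next_id, f"{ip}~{n}", None)
        self.rows[actor.actor_id] = actor
        self.next_id += 1
        return actor

    def attach_to_account(self, actor_id: int, user_id: int, username: str) -> Actor:
        # Account creation: reuse the session's actor row so earlier edits follow it.
        actor = self.rows[actor_id]
        actor.actor_user = user_id
        actor.actor_text = username
        return actor


def on_page_save(session: dict, current_ip: str, table: ActorTable) -> Actor:
    """Create the anonymous actor on the first save; afterwards only track the last IP."""
    if "actor_id" not in session:
        session["first_ip"] = current_ip
        session["actor_id"] = table.create_anon_actor(current_ip).actor_id
    session["last_ip"] = current_ip
    return table.rows[session["actor_id"]]


def block_session(session: dict, table: ActorTable,
                  blocked_names: Set[str], autoblocked_ips: Set[str]) -> None:
    """Block by public (suffixed) name, then autoblock the last IP actually used."""
    blocked_names.add(table.rows[session["actor_id"]].actor_text)
    autoblocked_ips.add(session["last_ip"])
```

For example, two sessions that first save from 192.0.2.7 become `192.0.2.7~1` and `192.0.2.7~2`. If the first session later saves from another address, it keeps its actor, and only `last_ip` changes. That changed address is the one the autoblock targets.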

As an alternative, suffixing of the IP address could be omitted. In that case, to be feasible, I think you would have to have a single actor row per IP address, so you would not be able to solve T152462 or T12957. But at least you could have fewer user talk pages for anons who regularly migrate to a different IP address.

This was discussed on IRC, the log is at https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-08-02-21.05.log.html


Original task description at T133452 by @Tgr

For anonymous edits, MediaWiki makes the IP address available to everyone, forever. This is poor privacy practice and can cause the user various problems, from public embarrassment to outing to prosecution by a repressive regime. In the various discussions about this (see Exposure of user IP addresses for an overview), one option that came up was to automatically create temporary accounts for anonymous users and allow them to be converted to real accounts later. This task is for the discussion of the technical and social feasibility of that option.

See also:


Event Timeline

Krinkle added subscribers: Milimetric, Anomie, daniel and 9 others.

This task is effectively a superset of T172477. I've merged that into here, tagged as RFC, and incorporated part of its task description here (the problem statement).

Moving to "under discussion" on the RFC board, since this is well fleshed out, and has seen some discussion in the past.

I don't have any thoughts or opinion on this, just a question. Does this or T172477 have a dependency or interaction with T167246 ? E.g. anonymous editors would be handled as a separate type of actor, or something. Or is it a completely separate issue? Thanks.

> Does this or T172477 have a dependency or interaction with T167246? E.g. anonymous editors would be handled as a separate type of actor, or something. Or is it a completely separate issue? Thanks.

Probably, yes. It was certainly on our minds when we talked about the actor work back in the day. Not sure how real-world implementation would turn out, though.

I think the more interesting question is when anonymous user accounts should be created. We cannot create them for visits that don't result in a page save attempt or similar, for obvious scaling reasons. If we create them on write (i.e. when an actor ID needs to be inserted somewhere), the user will be detached from their contribution if the user agent does not persist the session (e.g. browsing with cookies disabled), without us being able to detect it beforehand and warn them. If we create them just before write (e.g. whenever a CSRF token is obtained, such as when the user opens the edit form), that means doing stateful work on GET.

Krinkle renamed this task from "Create temporary accounts for anonymous editors" to "RFC: Create temporary accounts for anonymous editors". (Apr 4 2020, 2:33 AM)
Pppery added subscribers: Liuxinyu970226, Pppery.

Sorry, misclicked.

> If we create them on write (i.e. when an actor ID needs to be inserted somewhere), the user will be detached from their contribution if the user agent does not persist the session (e.g. browsing with cookies disabled) without us being able to detect it beforehand and warn them.

I don't imagine that a warning would have much effect anyway. Any user worried about getting detached from their contributions would presumably create an account.

Why does CheckUser need to be a highly restricted right? Anyone who edits logged-out by accident immediately has their IP address exposed to the public, which implies that we don't really value IP address privacy that highly. So why do we need to put up high barriers against giving access?

It gives you access to seasoned editors' IP addresses, too. Of course everyone deserves just as much privacy, but we could lose long-term prolific editors to outing. I certainly wouldn't mind more CUs as it is, but I'm not sure everyone is going to be OK with that. Currently the policy is quite strict, for better or worse.

What if we had a cloak flag which could be set on user accounts, hiding them from normal CheckUser results? It could be granted to users like a group, and given liberally to good-faith users. There could be a public request process on-wiki, and a private process, say in OTRS.

Then you would have a basic CheckUser right (checkuser-uncloaked) which would be given to admins. The traditional checkuser right would allow functionaries to determine the IP address of cloaked users.

The idea and terminology is inspired by Freenode.

Having to go through the CheckUser interface is going to slow down the workflow and make day-to-day counter-vandalism quite a pain. Every time a "temporary user" vandalizes, am I meant to run checks and see if it's an IP or range I should block? The underlying IPs would really need to be exposed automatically, built right into Special:Contributions.

How about if we expose uncloaked IP addresses to people with checkuser-uncloaked via Special:Contributions and RC. Access to cloaked contributions would still require that you go to Special:CheckUser and enter a reason.
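The cloak proposal and this follow-up imply a simple visibility rule. Holders of checkuser-uncloaked see the IPs of uncloaked users directly. The IP of a cloaked user is available only through Special:CheckUser, with the traditional right and a stated reason. The sketch below encodes that rule. The function is hypothetical, and it assumes that full checkusers can also see uncloaked IPs directly.

```python
from typing import FrozenSet


def ip_visible(viewer_rights: FrozenSet[str], target_cloaked: bool,
               via_checkuser_form: bool = False, reason: str = "") -> bool:
    """Decide whether a viewer may see the IP behind a contribution (hypothetical policy)."""
    if not target_cloaked:
        # Uncloaked users: shown directly (e.g. in Special:Contributions or RC)
        # to holders of either right; full checkusers are assumed to see them too.
        return bool({"checkuser-uncloaked", "checkuser"} & viewer_rights)
    # Cloaked users: only the traditional right, only through the logged
    # Special:CheckUser form, and only with a non-empty reason.
    return "checkuser" in viewer_rights and via_checkuser_form and bool(reason.strip())


admin = frozenset({"checkuser-uncloaked"})
functionary = frozenset({"checkuser"})
print(ip_visible(admin, target_cloaked=False))       # True
print(ip_visible(functionary, target_cloaked=True))  # False: must use the form
```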

> Every time a "temporary user" vandalizes, am I meant to run checks and see if it's an IP or range I should block? The underlying IPs would really need to be exposed automatically, built right into Special:Contributions.

I support Tim's idea of a cloak. I think it would make a good transitional phase at the very least, as it allows us to decouple the problem of access to IP information from the problem of having something more stable and anonymous as the main and only representation of an IP editor. The new "temporary user" would solve a lot of problems caused by unstable IPs (e.g. talk pages, continuity of contributions across IP changes, the possibility of upgrading a temporary user into a real user). It would also solve the problem of IP data being public forever. The other problems, regarding counter-vandalism, transparency and so on, could remain as they are today, since at first we would not limit access to IP information very much.

Beyond the transitional phase though, I think we can do better in the long run, and that there really shouldn't be any need for counter-vandalism to involve IP addresses. I hope that in time when that is addressed, these can then be folded back into CheckUser essentially.

@MusikAnimal Would you agree that this need is no different for registered users? What is the difference between a newbie account with username today, and a "temporary user" we assign to an IP user in the possible future? We don't expose their IP to patrollers today, right?

Note, I don't deny the need you describe. I get it. (Also as being maintainer of RTRC, GUC, and CVNBot.) Rather, I think the focus on the IP information is a bit too short-term and is a distraction from the issue that really we just have very bad counter-vandalism tooling as soon as someone signs up. This is a problem today as well. I think it's worth focussing on that and exploring more the space of how we can empower patrollers and admins to do more with less. For example, we have autoblock today, which acts on IPs without admins needing to know the IP they are acting on.

Instead of a cloak flag what if it was just a new user right (e.g. hide-ips) that could be assigned to any user group, like extended confirmed users? That way each wiki could tailor it to their specific anti-vandalism needs and capacities.
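Treating the cloak as an ordinary right means each wiki decides which groups receive it. The sketch below is loosely modelled on MediaWiki's per-group permission configuration (`$wgGroupPermissions`). The "hide-ips" right, the wiki names, and the grants are all hypothetical examples.

```python
from typing import Dict, Iterable, Set

# Hypothetical per-wiki grants: each wiki chooses which groups get "hide-ips".
GROUP_RIGHTS: Dict[str, Dict[str, Set[str]]] = {
    "enwiki": {
        "extendedconfirmed": {"hide-ips"},
        "sysop": {"hide-ips", "block"},
    },
    "smallwiki": {
        "sysop": {"hide-ips", "block"},
    },
}


def effective_rights(wiki: str, groups: Iterable[str]) -> Set[str]:
    """Union of the rights granted by each of the user's groups on this wiki."""
    grants = GROUP_RIGHTS.get(wiki, {})
    rights: Set[str] = set()
    for group in groups:
        rights |= grants.get(group, set())
    return rights


def ips_hidden(wiki: str, groups: Iterable[str]) -> bool:
    """True if any of the user's groups on this wiki grants the hypothetical right."""
    return "hide-ips" in effective_rights(wiki, groups)


print(ips_hidden("enwiki", ["extendedconfirmed"]))     # True
print(ips_hidden("smallwiki", ["extendedconfirmed"]))  # False
```

The same user can be cloaked on one wiki and not on another. That is the per-wiki tailoring the comment asks for.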

> Note, I don't deny the need you describe. I get it. (Also as being maintainer of RTRC, GUC, and CVNBot.) Rather, I think the focus on the IP information is a bit too short-term and is a distraction from the issue that really we just have very bad counter-vandalism tooling as soon as someone signs up. This is a problem today as well. I think it's worth focussing on that and exploring more the space of how we can empower patrollers and admins to do more with less. For example, we have autoblock today, which acts on IPs without admins needing to know the IP they are acting on.

It's true that the value reviewers get from the IP does not really come from staring at a bunch of numbers, and maybe we are placing too much emphasis on that. What they really want is the reverse DNS, whois, proxy checks, etc. They want to know more about the context of a contribution, to help them decide how to respond to it. Is it COI? Is it a child in school? Is it a known troll hopping around a mobile ISP? Informing admins who want to place range blocks is one aspect, but that's not merely a binary decision: the admin needs to set the reason text and the block options, which may well depend on what sort of range it is.

> Rather, I think the focus on the IP information is a bit too short-term and is a distraction from the issue that really we just have very bad counter-vandalism tooling as soon as someone signs up. This is a problem today as well. I think it's worth focussing on that and exploring more the space of how we can empower patrollers and admins to do more with less.

@Krinkle - That's exactly what the Anti-Harassment Tools team is working on. You can read more about it at https://meta.wikimedia.org/wiki/IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation#Tools and https://meta.wikimedia.org/wiki/IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation/Improving_tools.

If we use seamless signup like this, we'd be giving "anons" a user ID. That would mean that all filters that distinguish between anon edits and logged-in edits would no longer function. We'd have to provide filters based on user group (or the absence of a user group) instead, and ensure such filters don't create performance issues for heavy-duty queries (e.g. on recentchanges).
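This sketch shows the filter breakage and two group-based replacements. The in-memory list stands in for recentchanges, and the "temp" group name is hypothetical. In SQL, a group-based filter would become a join against the user_groups table, which is where the performance concern arises.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

TEMP_GROUP = "temp"  # hypothetical group given to automatically created accounts


@dataclass(frozen=True)
class Change:
    title: str
    user_id: int                          # 0 means "anonymous" in today's model
    groups: FrozenSet[str] = frozenset()


def anon_filter(changes: List[Change]) -> List[Change]:
    """Today's style of filter: anonymous means no user ID."""
    return [c for c in changes if c.user_id == 0]


def temp_account_filter(changes: List[Change]) -> List[Change]:
    """Replacement: select by membership in the temporary-account group."""
    return [c for c in changes if TEMP_GROUP in c.groups]


def no_group_filter(changes: List[Change]) -> List[Change]:
    """Alternative: select users who belong to no group at all."""
    return [c for c in changes if not c.groups]


changes = [
    Change("Foo", user_id=101, groups=frozenset({TEMP_GROUP})),  # auto-created account
    Change("Bar", user_id=7, groups=frozenset({"sysop"})),
    Change("Baz", user_id=202),                                  # newly registered user
]
print(len(anon_filter(changes)))                        # 0: old filter matches nothing
print([c.title for c in temp_account_filter(changes)])  # ['Foo']
```

In this model, the absence-of-group filter returns only the newly registered user. It never matches temporary accounts, because they carry the group that marks them. A dedicated group, or an equivalent flag, is what keeps temporary accounts distinguishable.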

The cloak idea sounds like a solid compromise and a step in the right direction. If at least admins can freely see IPs of "unregistered" users, and use the same tools we have today, we'll probably be okay. I think for now the overall focus should be not to get rid of IPs, but to reduce their visibility and the need to see them.

> It's true that the value reviewers get from the IP does not really come from staring at a bunch of numbers, and maybe we are placing too much emphasis on that. What they really want is the reverse DNS, whois, proxy checks, etc. They want to know more about the context of a contribution, to help them decide how to respond to it. Is it COI? Is it a child in school? Is it a known troll hopping around a mobile ISP? Informing admins who want to place range blocks is one aspect, but that's not merely a binary decision: the admin needs to set the reason text and the block options, which may well depend on what sort of range it is.

Yes, precisely. It being an IP address is irrelevant, rather it's the information we get from the IP that we care about. Often it's the only means to establish a pattern of abuse, and blocking the range is the only means to stop it. That's why I kind of liked T227733: Draft: Masking IP addresses for increased privacy, as it seemingly wouldn't break our workflows, instead focusing on obfuscating the IPs.

Proxies are another major point – we really should be blocking those globally. That seems like something we could do fairly easily now (i.e. promote enwiki's proxy blocking bots to the steward level), and would get rid of one of the problems that we currently can only solve by looking at IPs.

Given there's a large active community discussion about this problem on wiki as Kaldari pointed to, it might be a good idea not to fork the discussion.

There's an aspect to the UI design which I'm calling "cowbell", by which I mean the highly visible ways in which we tag the contributions of certain users as needing extra review. The current system has two kinds of cowbell: having a name which is a bunch of numbers, and having a name which is a red link. Having a name which follows a pattern like "Anon 12345" is a kind of cowbell. If the pattern is localised by the content language, global sysops and other small wiki patrollers may have trouble identifying anonymous users. We could have extra CSS or icons to assist in understanding.

If something is useful as a cowbell, it makes sense for it to be usable as a filter in RC and watchlists. As Daniel points out, that has traditionally been the case with anonymous users.

You can't filter by whether the user page exists, which reflects the fact that the red link cowbell is the unloved result of a UI accident. Nobody wants to actually edit the user page of a user whose edits they are reviewing, which is supposedly the purpose of red links. Anonymous user links currently go to the contributions page, which is more useful. In the current proposal, if User::isAnon() is true, links would naturally go to the contributions page, as they do for UseMod imports. If User::isAnon() is false (the automatic user creation variant of the proposal), then user links in changes lists would naturally be red.

My point is that we should reconsider the styling of usernames in change lists as part of this work.
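Localised name patterns could confuse cross-wiki patrollers. One possible mitigation is a single recogniser built from every language's pattern, which a tool or a CSS/icon hook could share. The patterns and function names below are hypothetical. The check is reliable only if ordinary account creation rejects names that match it, which is an assumption, since registered users can otherwise choose any name.

```python
import re
from typing import Dict

# Hypothetical localised patterns for temporary-account names; "{n}" is a serial number.
PATTERNS: Dict[str, str] = {
    "en": "Anon {n}",
    "de": "Anonym {n}",
    "fr": "Anonyme {n}",
}


def make_name(lang: str, n: int) -> str:
    """Build the localised display name for serial number n."""
    return PATTERNS[lang].format(n=n)


# One regex covering every language's pattern: literal text is escaped and
# the "{n}" placeholder becomes \d+.
_ALTERNATIVES = (r"\d+".join(re.escape(part) for part in p.split("{n}"))
                 for p in PATTERNS.values())
TEMP_NAME_RE = re.compile("(?:" + "|".join(_ALTERNATIVES) + ")")


def is_temp_name(name: str) -> bool:
    """True if the whole name matches any language's temporary-account pattern."""
    return TEMP_NAME_RE.fullmatch(name) is not None
```

For example, `is_temp_name("Anonym 7")` accepts a German-wiki name even when the patroller's tool runs with English settings.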

One potential solution we can borrow from Google docs is to assign random names to users. This would be trickier at our scale than on a document shared with a few dozen people, but could be possible. Of course, we allow actual users to have any name they want, so styling still comes into play.
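This sketch shows random "Adjective Animal" display names and the scaling problem mentioned above. Once the combinations run out, names need numeric suffixes. The word lists and function are hypothetical. The code uses the non-cryptographic `random` module, which is acceptable only because display names are not secrets.

```python
import random
from typing import Set

# Hypothetical word lists; a production list would need to be far larger.
ADJECTIVES = ["Amber", "Brisk", "Calm", "Dapper"]
ANIMALS = ["Otter", "Heron", "Lynx", "Gecko"]


def random_display_name(taken: Set[str], rng: random.Random) -> str:
    """Pick 'Adjective Animal'; on collision append the smallest free number."""
    base = f"{rng.choice(ADJECTIVES)} {rng.choice(ANIMALS)}"
    if base not in taken:
        return base
    n = 2
    while f"{base} {n}" in taken:
        n += 1
    return f"{base} {n}"


rng = random.Random(0)  # seeded only to make the demo repeatable
taken: Set[str] = set()
for _ in range(40):
    taken.add(random_display_name(taken, rng))
print(len(taken))  # 40: all unique, but many carry numeric suffixes
```

With 16 base combinations and 40 users, suffixes are unavoidable. At wiki scale, the word lists would need to be very large for names to stay short and distinct.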

Having two levels of CheckUser right, one for users who have created accounts and a lower-level one for those who haven't, seems sensible. This would be pretty similar to having cloaks.

Giving a cloak to a registered user some time after they register seems of little use, since someone could already have recorded an IP associated with that account in the meantime (are we trying to protect editors from state-level deanonymization attempts?). Keeping private records of which CheckUsers saw what certainly helps, but new editors get a lot of honest scrutiny.

We have decided to follow through on the plan outlined in this ticket as part of T262321: IP Masking. @Tgr should we close this ticket?

Yes, thanks for all your work on this!