
RFC: Attribute anonymous contributions to the first IP address used in a session
Closed, Duplicate, Public

Description

Problem

The existing method of attributing edits from anonymous users to their current IP address seems inadequate, because their IP addresses change regularly for various reasons:

  • IPv6 users regularly change IP addresses due to SLAAC (even when their location does not change).
  • Mobile users regularly change IP addresses when moving closer to another cell tower.
  • Users regularly change IP addresses when switching between networks (for example, from cellular to home WiFi, train WiFi, or office WiFi).

Solution

Attribute anonymous edits to a session ID instead of the current IP address. The session ID would be publicly associated with the first IP address used during that session.

Benefits


Original task description by @tstarling

In T171382 it was asserted that some IPv6 users regularly change IP addresses within a /64 block, due to SLAAC (RFC 4862). As such, the existing method of attributing edits to anonymous users seems inadequate.

I did some queries on recent anonymous IPv6 edits in the enwiki recentchanges table. My impression is that this does indeed happen, but the problem is worse than described: some IPv6 users use a mobile connection, and in fact routinely move around a block much larger than /64.
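To make the /64 observation concrete, here is a minimal sketch of how addresses could be grouped by their enclosing block using Python's standard `ipaddress` module. The helper name `enclosing_block` and the sample addresses (from the 2001:db8::/32 documentation range) are illustrative assumptions, not the queries that were actually run against recentchanges.

```python
import ipaddress


def enclosing_block(ip: str, prefix_len: int = 64) -> ipaddress.IPv6Network:
    """Return the IPv6 network of the given prefix length that contains ip."""
    # strict=False masks off the host bits instead of raising ValueError.
    return ipaddress.IPv6Network(f"{ip}/{prefix_len}", strict=False)


# Two SLAAC-style addresses in the same /64 map to the same block.
same_a = enclosing_block("2001:db8:1:2:aaaa::1")
same_b = enclosing_block("2001:db8:1:2:bbbb::2")

# An address in a neighbouring /64 only matches once the prefix is widened.
other = enclosing_block("2001:db8:1:3::1")
wide_a = enclosing_block("2001:db8:1:2:aaaa::1", 48)
wide_other = enclosing_block("2001:db8:1:3::1", 48)
```

Grouping on /64 would merge the first two addresses but not the third, which is the situation described for mobile users who move around a larger block.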

I've long dreamed of attributing anonymous edits to a session ID instead of an IP address, since this would fix T20981: Allow anonymising of unregistered users ("IP editors") and T12957: Allow logged in user to reclaim previous anon edits, but due to abuse control considerations, it seems unlikely that this will win community support. This proposal is a compromise, fixing only one of those two bugs, by attributing edits to a session ID which is publicly associated with the first IP address used during that session.

I mean the term "session" loosely; this might be an ID associated with a long-lived cookie.

The proposal in detail:

  • On page save, if there is no existing session:
    • Create the session, and store the current IP address in the session
    • Search the actor table (T167246) for this IP address, and add a suffix to the IP address so as to make a unique username.
    • Create the actor row. actor_text would be the suffixed IP address and actor_user would be NULL.
  • On account creation, attributing the existing edits in the same session to the newly created account could be as simple as updating actor_user and actor_text in the existing actor row.
  • Blocks would be applied to the session via its public identifier (the suffixed IP), solving T152462: Add cookie when blocking anonymous users.
  • When an anonymous session is blocked, an autoblock would be applied to the last IP address actually used by the anonymous user in question, exactly analogous to the way logged-in users are blocked.
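The steps above could be sketched roughly as follows. This is an in-memory illustration only, not MediaWiki code: the `Wiki` class, its method names, and the `~N` suffix format are assumptions made for the example, and dictionaries stand in for the actor table and the session store.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Actor:
    actor_text: str            # public name: suffixed IP, or account name once claimed
    actor_user: Optional[int]  # None stands in for NULL (anonymous)


class Wiki:
    def __init__(self) -> None:
        self.actors: Dict[str, Actor] = {}    # actor_text -> actor row
        self.sessions: Dict[str, Actor] = {}  # private session ID -> actor row
        self.last_ip: Dict[str, str] = {}     # private session ID -> last IP seen

    def on_page_save(self, session_id: Optional[str], ip: str) -> Tuple[str, Actor]:
        if session_id is None or session_id not in self.sessions:
            # No existing session: create one and remember its first IP
            # in the public name.
            session_id = secrets.token_hex(16)
            # Search for the first unused suffix (hypothetical "~N" format).
            # Simplification: names freed by account creation can be reused here.
            n = 1
            while f"{ip}~{n}" in self.actors:
                n += 1
            actor = Actor(f"{ip}~{n}", None)
            self.actors[actor.actor_text] = actor
            self.sessions[session_id] = actor
        self.last_ip[session_id] = ip
        return session_id, self.sessions[session_id]

    def on_account_creation(self, session_id: str, user_id: int, user_name: str) -> None:
        # Reattribute the session's edits by updating the existing actor row.
        actor = self.sessions[session_id]
        del self.actors[actor.actor_text]
        actor.actor_text, actor.actor_user = user_name, user_id
        self.actors[user_name] = actor

    def autoblock_target(self, actor_text: str) -> Optional[str]:
        # A block names the public identifier; the autoblock goes to the
        # last IP address actually used by that session.
        for sid, actor in self.sessions.items():
            if actor.actor_text == actor_text:
                return self.last_ip[sid]
        return None


wiki = Wiki()
sid, first = wiki.on_page_save(None, "2001:db8::1")
first_name = first.actor_text
_, again = wiki.on_page_save(sid, "2001:db8::99")   # same session, new IP
_, second = wiki.on_page_save(None, "2001:db8::1")  # new session, same IP
block_ip = wiki.autoblock_target(first_name)
wiki.on_account_creation(sid, 42, "Alice")
```

In this sketch the private session ID never appears in the actor row; only the suffixed IP is public, and edits keep pointing at the same row after the account is created.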

As an alternative, suffixing of the IP address could be omitted. In that case, to be feasible, I think you would have to have a single actor row per IP address, so you would not be able to solve T152462 or T12957. But at least you could have fewer user talk pages for anons who regularly migrate to a different IP address.

This was discussed on IRC; the log is at https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-08-02-21.05.log.html

Event Timeline

Is there any benefit in using a prefixed IP as the username, as opposed to using a session ID (possibly something easier to remember, such as a diceware string) and exposing the IP address separately? Then, a wiki could be configured so that "anonymous" edits are truly anonymous (unlikely to be interesting for Wikimedia projects but might be useful for others, e.g. wikis operating in jurisdictions with stronger privacy laws), and it would be possible to apply judgement in edge cases (e.g. hide IP addresses for edits originating from oppressive regimes).

Is there any benefit in using a prefixed IP as the username, as opposed to using a session ID (possibly something easier to remember, such as a diceware string) and exposing the IP address separately?

Mostly development time, I think, since you would have to develop a UI to expose the IP address and make it searchable. Anti-vandalism tools would have to be updated.

Note that you at least have to have a concept of a public identifier, distinct from the session ID. The session ID is traditionally private: knowledge of it is sufficient to act as that user.

My main concern is that the displayed IP no longer matching the user's current IP will be surprising to folks doing admin work, but they'll probably get used to it. :)

Secondarily, as an end user, if I accidentally edit while logged out, it's unclear whether the IP saved with the edit will be my current IP address or one that I previously used. The two might expose different information (home vs. work or school location, for instance), which makes behavior less predictable and makes it harder to clear out an old IP if you want to hide it. (It is still not that difficult for an evil user to work around, by forcing a new session after clearing cookies.)

Overall, though, I'm in favor of a move away from raw IP usage.

Krinkle updated the task description.

The proposal addresses a tangible problem with a relatively straightforward solution. Details remain to be worked out, including the open questions around its impact on community tooling and the privacy concerns that we still need to address.

Before we continue though, it'd be good to first have product/resourcing confirmed, e.g. which team would commit to this being on their roadmap?

On account creation, attributing the existing edits in the same session to the newly created account could be as simple as updating actor_user and actor_text in the existing actor row.

Note that this wouldn't update signatures in wikitext, references in undo or rollback edit summaries, or other places where the IP-name is somehow used directly. That's not necessarily a blocker, but is something to be aware of.

tstarling lowered the priority of this task from Medium to Lowest. Jun 14 2018, 5:18 AM

For privacy reasons, there is substantial interest within the WMF for attributing anonymous edits to a session identifier, not to an IP address. I'm going to write up a separate task for that. This task as described, with attribution to the first IP address of the session, probably won't happen, at least not on WMF wikis.

Shall we remove the RfC tags from this task then?

This can probably just be closed as declined/superseded.

Krinkle renamed this task from Attribute anonymous contributions to the first IP address used in a session to RFC: Attribute anonymous contributions to the first IP address used in a session. Sep 16 2020, 7:20 PM