Problem
The existing method of attributing edits from anonymous users to their current IP address seems inadequate, because their IP addresses change regularly for various reasons:
- IPv6 users regularly change IP addresses due to SLAAC (even when their location does not change).
- Mobile users regularly change IP addresses when moving closer to another cell tower.
- Users regularly change IP addresses when switching between networks, whether between cellular and WiFi or from one WiFi network to another (e.g. home WiFi, cellular, train WiFi, office WiFi).
Solution
Attribute anonymous edits to a session ID instead of the current IP address. The session ID would be publicly associated with the first IP address used during that session.
Benefits
- Solves T152462: Add cookie when blocking anonymous users.
- Reduces talk page fragmentation for IP users.
- Improves other aspects of IP-blocking.
- Makes it easy to solve T12957: Allow logged in user to reclaim previous anon edits.
Original task description by @tstarling
In T171382 it was asserted that some IPv6 users regularly change IP addresses within a /64 block, due to SLAAC (RFC 4862). As such, the existing method of attributing edits to anonymous users seems inadequate.
I did some queries on recent anonymous IPv6 edits in the enwiki recentchanges table. My impression is that this does indeed happen, but the problem is worse than described: some IPv6 users use a mobile connection, and in fact routinely move around a block much larger than /64.
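To make the /64 point concrete, here is a minimal sketch of checking whether two IPv6 addresses fall within the same /64 block, using Python's standard `ipaddress` module. The helper name `same_slash64` and the example addresses (taken from the 2001:db8::/32 documentation range) are illustrative assumptions, not part of any existing tooling or of the queries mentioned above.

```python
import ipaddress


def same_slash64(a: str, b: str) -> bool:
    """Return True if IPv6 addresses a and b share the same /64 prefix.

    Hypothetical helper for illustration only.
    """
    # strict=False lets us build the enclosing /64 from a host address.
    block = ipaddress.ip_network(f"{a}/64", strict=False)
    return ipaddress.ip_address(b) in block


# Same /64, different interface identifiers (the SLAAC case).
print(same_slash64("2001:db8:1:2:aaaa:bbbb:cccc:dddd",
                   "2001:db8:1:2:1111:2222:3333:4444"))

# Different /64 blocks (the mobile case, where the range is much larger).
print(same_slash64("2001:db8:1:2::1", "2001:db8:1:3::1"))
```

A check like this only catches movement within a /64; it does not help with users who move around a larger block, which is part of the motivation for the session-based approach.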
I've long dreamed of attributing anonymous edits to a session ID instead of an IP address, since this would fix T20981: Allow anonymising of unregistered users ("IP editors") and T12957: Allow logged in user to reclaim previous anon edits, but due to abuse control considerations, it seems unlikely that this will win community support. This proposal is a compromise, fixing only one of those two bugs, by attributing edits to a session ID which is publicly associated with the first IP address used during that session.
I use the term "session" loosely; this might be an ID associated with a long-lived cookie.
The proposal in detail:
- On page save, if there is no existing session:
  - Create the session, and store the current IP address in the session.
  - Search the actor table (T167246) for this IP address, and add a suffix to the IP address so as to make a unique username.
  - Create the actor row. actor_text would be the suffixed IP address and actor_user would be NULL.
- On account creation, attributing the existing edits in the same session to the newly created account could be as simple as updating actor_user and actor_text in the existing actor row.
- Blocks would be applied to the session via its public identifier (the suffixed IP), solving T152462: Add cookie when blocking anonymous users.
- When an anonymous session is blocked, an autoblock would be applied to the last IP address actually used by the anonymous user in question, exactly analogous to the way logged-in users are blocked.
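The steps above could be sketched roughly as follows. This is an in-memory illustration only, not MediaWiki code: the `Actor`, `Session` and `ActorTable` classes, the function names, and the `~N` suffix format are all assumptions made for the example. Only the field names actor_text and actor_user come from the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Actor:
    actor_text: str                   # suffixed IP while anonymous
    actor_user: Optional[int] = None  # None stands in for NULL


@dataclass
class Session:
    first_ip: Optional[str] = None    # publicly associated with the session
    last_ip: Optional[str] = None     # would be the autoblock target
    actor: Optional[Actor] = None


@dataclass
class ActorTable:
    rows: List[Actor] = field(default_factory=list)

    def create_anon_actor(self, ip: str) -> Actor:
        # Search existing rows for this IP and pick the first unused suffix.
        # The "~N" format is a hypothetical choice for this sketch.
        taken = {row.actor_text for row in self.rows}
        n = 1
        while f"{ip}~{n}" in taken:
            n += 1
        actor = Actor(actor_text=f"{ip}~{n}")
        self.rows.append(actor)
        return actor


def on_page_save(table: ActorTable, session: Session, current_ip: str) -> Actor:
    if session.actor is None:
        # No existing session: record the first IP and create the actor row.
        session.first_ip = current_ip
        session.actor = table.create_anon_actor(current_ip)
    # Track the IP actually in use, so an autoblock could target it.
    session.last_ip = current_ip
    return session.actor


def on_account_creation(session: Session, user_id: int, user_name: str) -> None:
    # Reattribute the session's edits by updating the existing actor row.
    if session.actor is not None:
        session.actor.actor_user = user_id
        session.actor.actor_text = user_name
```

In this sketch, a second save from a different IP address within the same session reuses the same actor row, so edits stay attributed to one identity, while a new session from the same IP address gets the next suffix.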
As an alternative, suffixing of the IP address could be omitted. For that to be feasible, I think you would have to have a single actor row per IP address, so you would not be able to solve T152462 or T12957. But at least you could have fewer user talk pages for anons who regularly migrate to a different IP address.
This was discussed on IRC; the log is at https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-08-02-21.05.log.html