
Obfuscate old IP addresses in database
Open, Needs Triage · Public

Description

IP addresses of anonymous editors are recorded in the database forever; this is problematic for both legal and moral reasons, given that IP addresses are often personally identifiable information. In previous wikitech-l conversations about hiding IP addresses, it has been argued that hiding them would make the job of recent changes patrollers and other vandal fighters unacceptably difficult. That kind of utility decreases quickly over time, though; the IP addresses of months-old edits are rarely useful.

A MediaWiki operator will probably store all kinds of personally identifiable information (web server logs, etc.), but will discard it after a certain period (90 days is typical). Similarly, MediaWiki could discard IPs (replace them with random tokens) or otherwise obfuscate them (e.g. discard the last byte) after a certain timespan.
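To make the two options concrete, here is a minimal Python sketch of what "discard the last byte" and "replace with a random token" could look like once an edit passes an age cutoff. This is an illustration only, not MediaWiki code: the function names, the `anon-` token prefix, the reuse of 90 days as the cutoff, and the choice to keep only the first 64 bits of an IPv6 address are all assumptions made for the example.

```python
import ipaddress
import secrets
from datetime import datetime, timedelta, timezone

# Assumption: reuse the 90-day figure mentioned for web server logs.
RETENTION = timedelta(days=90)


def truncate_ip(ip: str) -> str:
    """Zero the last byte of an IPv4 address.

    For IPv6 the low 64 bits are zeroed instead (an assumption; the task
    only talks about discarding the "last byte").
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 64
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def random_token() -> str:
    """Return an opaque replacement carrying no information about the IP."""
    return "anon-" + secrets.token_hex(8)


def obfuscate_if_old(ip: str, edit_time: datetime, now: datetime,
                     mode: str = "truncate") -> str:
    """Return the IP unchanged while it is recent, obfuscated once it is old."""
    if now - edit_time < RETENTION:
        return ip
    return truncate_ip(ip) if mode == "truncate" else random_token()


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(obfuscate_if_old("203.0.113.57", now - timedelta(days=10), now))
    print(obfuscate_if_old("203.0.113.57", now - timedelta(days=120), now))
    print(obfuscate_if_old("203.0.113.57", now - timedelta(days=120), now, mode="token"))
```

Note that in this sketch every call to `random_token()` yields a fresh token, so two old edits from the same IP would no longer be linkable; keeping them linkable would need a stored mapping or a keyed hash, which is a separate design decision.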


See also: T20981: Allow anonymising of unregistered users ("IP editors"), T95144: Avoid exposure of user IP addresses (tracking)

Event Timeline

Tgr created this task. Feb 25 2016, 7:03 AM
Restricted Application added subscribers: StudiesWorld, Aklapper. Feb 25 2016, 7:03 AM
Restricted Application added a subscriber: JEumerus. Apr 18 2016, 5:50 PM
ZhouZ moved this task from Backlog to Assigned on the WMF-Legal board. Apr 18 2016, 7:01 PM
ZhouZ added a subscriber: APalmer_WMF.
Tgr updated the task description. Apr 21 2016, 11:01 PM
Nirmos added a subscriber: Nirmos. Sep 9 2017, 7:19 AM
Risker added a subscriber: Risker. Aug 4 2019, 4:37 AM
JFishback_WMF moved this task from Intake to Backlog on the Privacy board.
Alsee added a subscriber: Alsee. Mar 10 2020, 5:32 AM

I think you need to run this by legal. This appears to violate the copyright attribution requirements. Logged-out users submitted their contribution under an attribution license, with a reasonable expectation that their IP would serve as their attribution identity.

It also appears pretty pointless, given that the information would already have been distributed in the database downloads. The only thing actually accomplished by mangling existing histories would be to confuse editors.