
Create a "delete me" maintenance script for special user/data deletion requests
Open, Medium, Public

Description

As discussed within the Security-Team Slack channel, it would be useful to have a comprehensive maintenance script that could delete as much MediaWiki-related data for a given user as possible. The obvious use case for such a script would be certain right-to-be-forgotten (RTBF) requests, but there are likely other, similar use cases beyond this. Good starting points for such a script would likely be existing maintenance scripts such as:

  1. ChangePassword.php
  2. InvalidateUserSessions.php
  3. NukePage.php
  4. ResetUserEmail.php
  5. ResetUserTokens.php
  6. UserOptions.php
  7. Various "Delete" maintenance scripts, such as DeleteArchivedRevisions.php, DeleteLocalPasswords.php, etc.

I would also note that this need not be comprehensive upon the first iteration and will likely need to change over time.
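
To illustrate the incremental approach described above, here is a minimal Python sketch of a pluggable "delete me" runner. It is not MediaWiki code, which is PHP. Everything in it is hypothetical: the `UserRecord` type, the step names, and the in-memory fields only loosely mirror the kind of data that scripts like ChangePassword.php, InvalidateUserSessions.php, ResetUserEmail.php and UserOptions.php deal with. The point is the shape of the design. Each cleanup is a separately registered step, so a first iteration can ship with one or two steps and more can be added over time. A dry-run mode lists what would happen before anything is changed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class UserRecord:
    """Hypothetical in-memory stand-in for one user's wiki data."""
    name: str
    email: str = ""
    password_hash: str = ""
    session_ids: List[str] = field(default_factory=list)
    options: Dict[str, str] = field(default_factory=dict)


# Ordered registry of (step name, function); new steps are appended as they are written.
STEPS: List[Tuple[str, Callable[[UserRecord], None]]] = []


def step(name: str):
    """Decorator that registers a cleanup step under the given name."""
    def register(fn: Callable[[UserRecord], None]) -> Callable[[UserRecord], None]:
        STEPS.append((name, fn))
        return fn
    return register


@step("clear_email")  # hypothetical; loosely analogous to ResetUserEmail.php
def clear_email(user: UserRecord) -> None:
    user.email = ""


@step("invalidate_sessions")  # hypothetical; loosely analogous to InvalidateUserSessions.php
def invalidate_sessions(user: UserRecord) -> None:
    user.session_ids.clear()


@step("delete_local_password")  # hypothetical; loosely analogous to DeleteLocalPasswords.php
def delete_local_password(user: UserRecord) -> None:
    user.password_hash = ""


@step("reset_options")  # hypothetical; loosely analogous to UserOptions.php
def reset_options(user: UserRecord) -> None:
    user.options.clear()


def run_delete_me(user: UserRecord, dry_run: bool = True) -> List[str]:
    """Run every registered step in registration order, or only list them in dry-run mode."""
    report = []
    for name, fn in STEPS:
        if dry_run:
            report.append(f"would run {name} for {user.name}")
        else:
            fn(user)
            report.append(f"ran {name} for {user.name}")
    return report
```

Calling `run_delete_me(user)` with no other arguments only produces a reviewable plan. Passing `dry_run=False` applies it. Making the dry run the default is a deliberate choice for a destructive tool.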

Event Timeline

Note: Miraheze has the RemovePII mediawiki extension for compliance with GDPR. There might be bits you can steal.

> Note: Miraheze has the RemovePII mediawiki extension for compliance with RTBF. There might be bits you can steal.

Interesting - https://github.com/miraheze/RemovePII does indeed look like it could maybe get us part of the way there.

Reedy triaged this task as Medium priority. (Sep 2 2021, 4:19 PM)
Reedy moved this task from Incoming to Back Orders on the Security-Team board.

I agree that Miraheze's extension covers an interesting range of PII to remove, including recentchanges entries, data collected during CheckUser lookups, and various logs. Meanwhile, ResetUserEmail can't currently be used to delete a user's email address since, unless I am wrong, that script does not accept an empty email. So perhaps the first iteration could remove email addresses, and subsequent iterations could gradually target more PII. I'm suggesting we start with email addresses because that request comes up frequently.
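
For an email-only first iteration, the core operation is small. The sketch below uses Python's built-in sqlite3 module against a cut-down, in-memory copy of the MediaWiki `user` table that keeps only the email-related columns (`user_email`, `user_email_authenticated`, `user_email_token`, `user_email_token_expires`). The simplified schema, the sample row and the function names are assumptions made for illustration. Blanking the address alone would leave the confirmation timestamp and any pending confirmation token behind, so all four columns are cleared together. A real maintenance script would presumably go through MediaWiki's PHP user APIs rather than raw SQL.

```python
import sqlite3


def create_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for a trimmed-down MediaWiki `user` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE user (
            user_id INTEGER PRIMARY KEY,
            user_name TEXT NOT NULL UNIQUE,
            user_email TEXT NOT NULL DEFAULT '',
            user_email_authenticated TEXT,
            user_email_token TEXT,
            user_email_token_expires TEXT
        )"""
    )
    # One made-up user with a confirmed address and a pending confirmation token.
    conn.execute(
        "INSERT INTO user (user_name, user_email, user_email_authenticated, "
        "user_email_token, user_email_token_expires) VALUES (?, ?, ?, ?, ?)",
        ("ExampleUser", "someone@example.org", "20210901000000", "abc123", "20210908000000"),
    )
    conn.commit()
    return conn


def clear_user_email(conn: sqlite3.Connection, user_name: str) -> bool:
    """Blank the address and drop its confirmation state; return False if no such user exists."""
    cur = conn.execute(
        "UPDATE user SET user_email = '', user_email_authenticated = NULL, "
        "user_email_token = NULL, user_email_token_expires = NULL "
        "WHERE user_name = ?",
        (user_name,),
    )
    conn.commit()
    return cur.rowcount == 1
```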

Starting with various user email search/removal functionality sounds good to me. We're definitely not going to be able to catalog and qualify every piece of relevant data within a first or second iteration of this maintenance script, but getting an initial version up for code review would be a great first step. Let me know if I can help with that.

I've talked about this with individual SRE and security team members, but let me say it on record as well: this type of script would be immensely helpful for T&S's workflow :) I'm sure that Samuel, as a former T&S member, would concur :)

Change 720819 had a related patch set uploaded (by Samuel (WMF); author: Samuel (WMF)):

[mediawiki/core@master] Add script to remove user email address

https://gerrit.wikimedia.org/r/720819

Change 720819 merged by jenkins-bot:

[mediawiki/core@master] Add script to delete a user email's address

https://gerrit.wikimedia.org/r/720819

Update: A first iteration of this landed in https://gerrit.wikimedia.org/r/720819 (thanks, @sguebo_WMF!) It's a fairly simple maint script for now, but can and should be expanded upon down the road, either via adding additional features or creating additional maint scripts. I propose leaving this task open for now to track that future work.