
Better handling of revision deletion on import
Closed, Resolved, Public

Description

Currently our Import process doesn't handle revision deletion at all, even though the relevant data is exported into the XML format.

This has the following effects for revisions:

| Before export | After import |
| ----- | ----- |
| Username hidden | The revision is skipped completely |
| Edit summary hidden | Blank edit summary |
| Content hidden | Blank revision (no content) |

To me, the bottom two behaviours seem acceptable. We shouldn't skip revisions for which the username is hidden, though. The suppressed username should be replaced by user ID 0 and a user_text such as "Unknown user".
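The proposed behaviour can be illustrated with a minimal sketch. This is not MediaWiki's importer (which is written in PHP); it is a standalone Python illustration that assumes hidden fields are marked in the dump with a `deleted="deleted"` attribute on `<contributor>`, `<comment>` and `<text>`, and it ignores the XML namespace that real dumps carry. The function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder values proposed in this task for a suppressed username.
PLACEHOLDER_USER_ID = 0
PLACEHOLDER_USER_TEXT = "Unknown user"


def is_deleted(elem):
    """True if the element carries the (assumed) deleted="deleted" marker."""
    return elem is not None and elem.get("deleted") == "deleted"


def import_revision(rev):
    """Map one <revision> element to a dict, never skipping the revision."""
    contributor = rev.find("contributor")
    if contributor is None or is_deleted(contributor):
        # Username hidden: keep the revision, attribute it to the placeholder.
        user_id, user_text = PLACEHOLDER_USER_ID, PLACEHOLDER_USER_TEXT
    else:
        user_id = int(contributor.findtext("id", default="0"))
        user_text = contributor.findtext("username", default=PLACEHOLDER_USER_TEXT)

    comment = rev.find("comment")
    text = rev.find("text")
    return {
        "rev_id": int(rev.findtext("id", default="0")),
        "user_id": user_id,
        "user_text": user_text,
        # Hidden summary or content is imported as blank, as described above.
        "comment": "" if comment is None or is_deleted(comment) else (comment.text or ""),
        "text": "" if text is None or is_deleted(text) else (text.text or ""),
    }


# Simplified sample dump fragment (no namespace, illustrative values only).
SAMPLE = """
<page>
  <title>Example</title>
  <revision>
    <id>1</id>
    <contributor deleted="deleted" />
    <comment deleted="deleted" />
    <text deleted="deleted" />
  </revision>
  <revision>
    <id>2</id>
    <contributor><username>Alice</username><id>42</id></contributor>
    <comment>fix typo</comment>
    <text>Hello</text>
  </revision>
</page>
""".strip()

revisions = [import_revision(r) for r in ET.fromstring(SAMPLE).iter("revision")]
for r in revisions:
    print(r)
```

Both revisions are kept: the first is attributed to user ID 0 with the placeholder name and has a blank summary and blank content, while the second is imported unchanged.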


For log entries, any kind of selective deletion seems to just blow up WikiImporter; see T34876.

Event Timeline

TTO raised the priority of this task to Low.
TTO updated the task description.
TTO subscribed.

Change 259987 had a related patch set uploaded (by Georggi199):
Made possible to import dumps without usernames

https://gerrit.wikimedia.org/r/259987

Change 259987 merged by jenkins-bot:
Handle missing titles and usernames when importing log items

https://gerrit.wikimedia.org/r/259987

Change 260349 had a related patch set uploaded (by Georggi199):
Fixed contributor and text handling in Import.php

https://gerrit.wikimedia.org/r/260349

Change 260349 merged by jenkins-bot:
Import: Properly handle deleted usernames in XML dumps

https://gerrit.wikimedia.org/r/260349

Note that User:Unknown_user already existed on EnWiki (and only on EnWiki). It was created and used briefly, for a few days in 2004, and it has 126 edits. Any username should be checked for existence before it is used for any system purpose.

In 2017 this username was added to $wgReservedUsernames in T158474. The account exists, but it can no longer be logged into.