Handling of imported usernames
Closed, Resolved · Public

Description

Handling of usernames in imported edits in MediaWiki has long been weird (T9240 was filed in 2006!).

If a local user does not exist with the same username as the author of an imported revision, we get a strange row in the revision table where rev_user_text is a valid but non-existent username and rev_user is 0. A rev_user value of 0 typically indicates an IP edit. Someone can later create a user with that username, but rev_user remains 0 for all existing revisions with that username. Depending on whether a tool looks at rev_user_text or rev_user, old revisions may or may not be considered to actually belong to the newly-created user.
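The ambiguity described above can be illustrated with a minimal sketch. The rows below are a hypothetical in-memory stand-in for the revision table, not real data, and the user ID 42 and the two helper functions are assumptions made for illustration only.

```python
# Hypothetical stand-in for a few rows of the revision table.
revisions = [
    # Imported before any local account named "Example" existed.
    {"rev_id": 1, "rev_user": 0, "rev_user_text": "Example"},
    # Made later by the newly created local account (ID 42 is invented).
    {"rev_id": 2, "rev_user": 42, "rev_user_text": "Example"},
    # An ordinary IP edit, which also has rev_user = 0.
    {"rev_id": 3, "rev_user": 0, "rev_user_text": "192.0.2.7"},
]


def edits_by_user_id(rows, user_id):
    """Revision IDs a tool would find when it matches on rev_user."""
    return [r["rev_id"] for r in rows if r["rev_user"] == user_id]


def edits_by_user_text(rows, user_text):
    """Revision IDs a tool would find when it matches on rev_user_text."""
    return [r["rev_id"] for r in rows if r["rev_user_text"] == user_text]


if __name__ == "__main__":
    print(edits_by_user_id(revisions, 42))
    print(edits_by_user_text(revisions, "Example"))
```

A tool matching on rev_user sees only the revision made after the account was created, while a tool matching on rev_user_text also picks up the imported one.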

If a local user with the same username as the author of an imported revision does exist when the import is done, the edit is attributed to that user regardless of whether it's actually the same person. See T179246 for an example where imported edits got attributed to the wrong account in pre-SUL times.

In Gerrit change 386625, @Anomie proposes to change that.

  • If revisions are imported using the "Upload XML data" method, a new mandatory field, which is intended to be interpreted as an interwiki prefix, must be populated to indicate the source of the edits.
  • If revisions are imported using the "Import from another wiki" method, the interwiki prefix of the specified source wiki will be used as the source.
  • During the import, any usernames that don't exist locally and cannot be auto-created via CentralAuth (T111605) or another authentication mechanism will be imported as an otherwise-invalid name. For example, an edit by a user with username Example from source 'en' would be imported as 'en>Example' [1] if a user with username Example does not exist and cannot be auto-created on the local wiki.
  • There will be a checkbox on Special:Import to specify whether the same should be done for usernames that do exist locally (or can be auto-created) or whether those edits should be attributed to the existing/auto-created local user.
  • On history pages, log pages, and the like, these usernames will be displayed as interwiki links, much as might be generated by wikitext like "[[:en:User:Example|en>Example]]". No parenthesized 'tool' links (talk, block, and so on) will be generated for these rows.
  • On WMF wikis, we'll run a maintenance script to clean up the existing rows with valid usernames and rev_user = 0. The current plan there is to attribute these edits to existing SUL users where possible and to prefix them with a generic prefix otherwise, but we could as easily prefix them all. Similar scripts could be written for non-WMF wikis.
  • Unfortunately it's impossible to retroactively determine the actual source of old imports automatically or to automatically do anything about imports that were misattributed to a different local user in pre-SUL times (e.g. T179246).
  • The same will be done for CentralAuth's global suppression blocks. In this case, on WMF wikis we can safely point them all at Meta.
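The attribution rule proposed in the list above could be sketched roughly as follows. This is not the code from the Gerrit change: the function name, its parameters, and the two callbacks that stand in for the local user lookup and the auto-creation check are all hypothetical.

```python
from typing import Callable


def resolve_imported_username(
    username: str,
    source_prefix: str,
    exists_locally: Callable[[str], bool],
    can_auto_create: Callable[[str], bool],
    assign_known_users: bool,
) -> str:
    """Return the name an imported revision would be attributed to.

    assign_known_users models the proposed Special:Import checkbox: when it
    is False, even names that exist locally (or could be auto-created) get
    the source prefix.
    """
    usable_locally = exists_locally(username) or can_auto_create(username)
    if usable_locally and assign_known_users:
        # Attribute the edit to the existing or auto-created local user.
        return username
    # Otherwise build an otherwise-invalid name such as "en>Example".
    return f"{source_prefix}>{username}"


if __name__ == "__main__":
    local_users = {"Alice"}  # hypothetical set of local accounts
    exists = lambda name: name in local_users
    never = lambda name: False  # pretend nothing can be auto-created
    print(resolve_imported_username("Example", "en", exists, never, True))
    print(resolve_imported_username("Alice", "en", exists, never, True))
    print(resolve_imported_username("Alice", "en", exists, never, False))
```

In this sketch the prefix is always applied when the name cannot be used locally, so the checkbox only changes the outcome for names that exist or can be auto-created.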

Background: The upcoming actor table changes, T167246, require some change to the handling of these imported names because we can't have separate attribution to "Example as a non-registered user" and "Example as a registered user" with the new schema. The options we've identified are:

  1. This proposal, or something much like it.
  2. All the existing rows with rev_user = 0 would have to be attributed to the existing local user (if any), and in the future when a new user is created any existing edits attributed to that name will be automatically attributed to that new account.
  3. All the existing rows with rev_user = 0 and an existing local user would have to be re-attributed to different *valid* usernames, probably randomly-generated in some manner, and in the future when a new user is created any existing edits for that name would have to be similarly re-attributed.
  4. Like #2, except the creation (including SUL auto-creation) of the same-named account would not be allowed. Thus, an import before the local name exists would forever block that name from being used for an actual local account.
  5. Some less consistent combination of the "all the existing rows" and "when a new user is created" options from #2–4.

Of these options, this proposal seems like the best one.

[1]: ">" was chosen rather than the more typical ":" because the former is already invalid in all usernames (and page titles). While a colon is *now* disallowed in new usernames, existing names created before that restriction was added can continue to be used (and there are over 12000 such usernames in WMF's SUL) and we decided it'd be better not to suddenly break them.
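Because ">" cannot appear in a valid username, the first ">" in an imported name separates prefix from username unambiguously, which a colon could not do given the legacy names mentioned in the footnote. A minimal sketch of that split, and of the interwiki link wikitext described in the proposal, might look like this; both function names are hypothetical and are not MediaWiki APIs.

```python
from typing import Optional, Tuple


def split_imported_name(name: str) -> Optional[Tuple[str, str]]:
    """Split 'en>Example' into ('en', 'Example').

    Returns None when the name has no '>' or has an empty part, meaning it
    is not an imported-style name.
    """
    prefix, sep, user = name.partition(">")
    if not sep or not prefix or not user:
        return None
    return prefix, user


def interwiki_link_wikitext(name: str) -> Optional[str]:
    """Build wikitext like '[[:en:User:Example|en>Example]]', or None."""
    parts = split_imported_name(name)
    if parts is None:
        return None
    prefix, user = parts
    return f"[[:{prefix}:User:{user}|{name}]]"


if __name__ == "__main__":
    print(interwiki_link_wikitext("en>Example"))
    print(split_imported_name("Example"))
```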

Restricted Application added a subscriber: Aklapper. Nov 6 2017, 4:12 PM
daniel added a subscriber: daniel. Nov 11 2017, 1:16 PM

This RFC has been scheduled for public discussion on IRC on Wednesday, November 15.

The meeting is scheduled to take place 2 hours earlier than usual, at noon PST / 21:00 CET, at #wikimedia-office on Freenode.
@Anomie I hope this time works for you and you can attend the meeting.

Tgr added a subscriber: Tgr. Nov 15 2017, 11:57 PM
daniel edited projects, added TechCom-RFC (TechCom-Approved); removed TechCom-RFC.
daniel moved this task from TechCom-Approved to Last Call on the TechCom-RFC board.
daniel edited projects, added TechCom-RFC; removed TechCom-RFC (TechCom-Approved).
daniel added a comment. Edited Nov 17 2017, 12:00 PM

Minutes of the IRC meeting on November 15: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-11-15-20.01.html

It was agreed for this RFC to enter the Last Call period. If no pertinent concerns remain unaddressed by November 29, this RFC will be approved for implementation.

Krinkle triaged this task as Normal priority. Nov 29 2017, 9:26 PM
Krinkle moved this task from Untriaged to In progress on the TechCom-RFC (TechCom-Approved) board.

This RFC has been approved for implementation as no objections have been raised during the last call period.

Xaosflux added a subscriber: Graham87.
Anomie closed this task as Resolved. Feb 11 2018, 2:25 PM
Anomie claimed this task.

This has been implemented, I think we can close it now.

I sometimes get inconsistent behavior with this when importing users that do not exist, particularly as it pertains to user contributions. This happens with both importDump.php and Special:Import. I am using MediaWiki 1.31.

At times it automatically puts the prefix before the username, giving "prefix>user". In the revision list, the resulting username is just ordinary text and not a clickable link. No user contributions are given to the nonexistent user. rev_user is 0 in the database.

Other times it doesn't do that and I just get a clickable link to "user". This link is clickable, it's blue rather than red for a non-existent user, and I am brought to the user contributions page for this user. There is no interwiki prefix. rev_user is still 0 in the database.

How this works is somewhat mysterious and I haven't been able to figure out why. Usernames that start with a capital letter tend to go to "prefix>username", whereas usernames that are all lowercase letters become just the clickable "username". Occasionally there is a counterexample to that rule. It's pretty bizarre.

In Special:Import, if I click "Assign edits to local users where the named user exists locally", it makes no difference to this behavior. Whatever I type for my interwiki prefix, it is only randomly assigned to some users in the manner mentioned above. The same happens if I use importDump.php, but since I do not get to choose a prefix, the prefix it chooses is "imported>username".

Is there some way to make this consistent? I am importing another wiki and expecting users to re-sign up afterward. I don't care whether they automatically get their old contributions back when they do, or whether they are all imported as a disconnected "imported>user" prefix, but I would at least like for it to be consistent. Can I change something in the XML to make this work?

Unless there's a better way, I'd say create dummy accounts for all the users before importing the edits. Usernames with all lower-case letters haven't been valid in MediaWiki for many years now, so the software would naturally get confused by them.
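As a starting point for the dummy-account approach, the usernames in a dump could be collected before the import with something like the sketch below. It assumes the usual export layout, in which each revision has a contributor element containing a username element (or an ip element for anonymous edits). The sample dump and the function names are made up for illustration, and creating the accounts is left to whatever mechanism the wiki provides.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up dump used only to exercise the function below.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Sandbox</title>
    <revision>
      <contributor><username>Example</username><id>1</id></contributor>
    </revision>
    <revision>
      <contributor><ip>192.0.2.7</ip></contributor>
    </revision>
    <revision>
      <contributor><username>example2</username><id>2</id></contributor>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part so any export schema version matches."""
    return tag.rsplit("}", 1)[-1]


def usernames_in_dump(source) -> set:
    """Collect distinct contributor usernames from a filename or file object."""
    names = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = local_name(elem.tag)
        if tag == "contributor":
            for child in elem:
                if local_name(child.tag) == "username" and child.text:
                    names.add(child.text.strip())
        elif tag == "page":
            # Free the finished page so large dumps do not pile up in memory.
            elem.clear()
    return names


if __name__ == "__main__":
    for name in sorted(usernames_in_dump(io.BytesIO(SAMPLE_DUMP))):
        print(name)
```

The resulting list can also be scanned for names that begin with a lower-case letter, since those are the ones reported above as behaving differently.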