
Examine username maximum length limit in MediaWiki and Wikimedia
Open, Needs Triage, Public

Description

[Trying to collate all relevant info, partly for future reference, but also for a discussion about username length...
Tentatively contemplating proposals to:
a) lower the MediaWiki default: it's very large, and very long usernames would make history pages (etc.) even harder to read
b) re-examine the Wikimedia default: perhaps reduce it back down to 64, or limit new-account creation to 64 if possible.]


Current and past configuration

Historically...


Examples

For reference, here are some existing usernames of actual editors, with their byte counts (please add a few more!):

ascii:

  • 32: Chase me ladies, I'm the Cavalry, (and a few others at 32, in enwiki's top 10,000, excluding vanished-renames)

non-ascii (found via):

  • 43: Хмельницкий Константин
  • 46: نبيل عبدالقادر عبدالوهاب
  • 50: फ़ाराह् देसाईं खान
  • 52: ศรีกฤษณะ รามจันทรา
  • 57: ديفيد عادل وهبة خليل (حساب ملغي)
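The byte counts listed above are UTF-8 lengths, and they can be checked with a short script. This is a minimal Python sketch, not anything taken from MediaWiki itself: the helper names `utf8_byte_length` and `code_point_count` are hypothetical, and normalising to NFC before counting is an assumption about how input is handled.

```python
import unicodedata


def utf8_byte_length(username: str) -> int:
    """Return the UTF-8 byte length of a username after NFC normalisation."""
    # Assumption: names are normalised to NFC before being stored or measured.
    normalized = unicodedata.normalize("NFC", username)
    return len(normalized.encode("utf-8"))


def code_point_count(username: str) -> int:
    """Return the number of Unicode code points after NFC normalisation."""
    return len(unicodedata.normalize("NFC", username))


if __name__ == "__main__":
    # Two of the example usernames from the lists in this task.
    for name in ("Chase me ladies, I'm the Cavalry", "Хмельницкий Константин"):
        print(utf8_byte_length(name), code_point_count(name), name)
```

The Cyrillic example is 22 code points but 43 bytes: each Cyrillic letter takes two bytes in UTF-8 and the space takes one, so a byte limit is effectively much tighter for non-Latin scripts than for ASCII.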

quarry searches for users with >100 edits and >40 byte usernames, at a few wikis:

ascii (hypothetical):

  • 64: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean
  • 85: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget.
  • 255: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis,

Old/related tasks:

Event Timeline

Quiddity raised the priority of this task from to Needs Triage.
Quiddity updated the task description.
Quiddity added a subscriber: Quiddity.
Restricted Application added subscribers: StudiesWorld, Aklapper. Dec 16 2015, 12:02 AM

Also watch out for other database tables that store a username in them, and whether they've had schema changes to bump up to 255. IIRC when we did the 64->85 bump during SUL in a hurry, some tables hadn't had the schema changes to make the column bigger and that caused some issues with usernames getting truncated.
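The truncation risk described above can be illustrated without a database. The following Python sketch only simulates a fixed-width byte column; the function names, the 64-byte column width, and the sample username are all hypothetical, and how a real database reacts (silent truncation, warning, or error) depends on its column type and settings.

```python
def simulate_column_truncation(username: str, column_bytes: int) -> bytes:
    """Return the bytes a too-narrow binary column would keep (simulation only)."""
    return username.encode("utf-8")[:column_bytes]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the stored bytes still decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical 30-character hiragana username: 3 bytes each, 90 bytes total.
    name = "\u3042" * 30
    stored = simulate_column_truncation(name, 64)
    # 64 is not a multiple of 3, so the cut lands inside a character.
    print(len(stored), is_valid_utf8(stored))
```

Besides losing the tail of the name, a byte-level cut can split a multi-byte character, so the truncated value may no longer match the full username stored in other tables.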

Quiddity updated the task description. Dec 16 2015, 12:09 AM
Quiddity set Security to None.
Legoktm updated the task description. Dec 16 2015, 12:10 AM
Quiddity updated the task description. Edited Dec 16 2015, 9:58 PM
Quiddity added a subscriber: MZMcBride.

I've added links to a few quarry searches, in the description. (Thanks to @MZMcBride for the SQL. :)

So, the reason for this proposal is to save space, particularly on history pages (but maybe also on Echo flyout, etc.)?

I'm concerned this might have a negative effect on non-Latin usernames, particularly CJK.

I did a quick check with Chinese and Japanese, which doesn't show much of a problem (though there are a few clashing usernames at lower edit counts). We would have to do more investigation.

Also, if we really care about visual length, ideally we would measure that instead. I wonder if there is a library for that somewhere.

http://quarry.wmflabs.org/query/6465 - zhwiki: 100+ edits
http://quarry.wmflabs.org/query/6466 - jawiki: 100+ edits
http://quarry.wmflabs.org/query/6467 - zhwiki: 5+ edits
http://quarry.wmflabs.org/query/6468 - jawiki: 5+ edits

For these two Wikipedias, there are only problems at the 5+ level.
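On the question of measuring visual length rather than bytes: Python's standard `unicodedata` module exposes enough character data for a rough estimate. The sketch below is only an approximation in monospace columns (wide and fullwidth characters count as 2, non-spacing marks and format characters as 0, everything else as 1); it says nothing about rendered pixel width in a proportional font, and `approximate_display_width` is a hypothetical helper name.

```python
import unicodedata


def approximate_display_width(text: str) -> int:
    """Estimate the width of text in monospace columns (rough heuristic)."""
    width = 0
    for ch in unicodedata.normalize("NFC", text):
        # Non-spacing marks, enclosing marks and format characters
        # do not advance the cursor.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide and Fullwidth characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


if __name__ == "__main__":
    for sample in ("abc", "\u65e5\u672c\u8a9e"):
        print(approximate_display_width(sample), sample)
```

By this measure a three-character CJK name is about as wide as a six-letter Latin one, while taking nine bytes in UTF-8, which is one reason a byte limit and a visual limit can diverge.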

Quiddity added a comment. Edited Dec 17 2015, 12:10 AM
In T121604#1886375, @Mattflaschen wrote:

So, the reason for this proposal is to save space, particularly on history pages (but maybe also on Echo flyout, etc.)?

Exactly.
And also, to prevent people trolling merely by using a very long username: (1) it's easier to get more ambiguity (plausibly deniable offensiveness) into a very long username; (2) very long usernames are frustrating to type, particularly for our blind editors; (3) other, more BEANS-ish reasons.

I did a quick check with Chinese and Japanese [...]

Thanks. I meant to investigate which languages had the highest byte density, or other relevant factors. IIRC zh/ja/ko covers that aspect?

I notice that around half of the top 10 longest names in jawiki are blocked accounts. Perhaps if you know how, you could add the SQL (that will add a column for blocked-status) to your quarry tasks, and then I'll update mine?

In T121604#1886375, @Mattflaschen wrote:

So, the reason for this proposal is to save space, particularly on history pages (but maybe also on Echo flyout, etc.)?

Exactly.
And also, to prevent people trolling merely by using a very long username: (1) it's easier to get more ambiguity (plausibly deniable offensiveness) into a very long username; (2) very long usernames are frustrating to type, particularly for our blind editors; (3) other, more BEANS-ish reasons.

I did a quick check with Chinese and Japanese [...]

Thanks. I meant to investigate which languages had the highest byte density, or other relevant factors. IIRC zh/ja/ko covers that aspect?

Yes, that was the vague instinct that led me to check zhwiki and jawiki. However:

a. I don't actually know if CJK is the highest byte/char or highest byte/"normal name".
b. I didn't check kowiki.
c. We didn't check any of the other wikis in these languages.

So if we wanted to actually continue investigating feasibility, this would require talking to the Language team and running the script on all wikis.

As of now, I don't think it's a priority, though.

whym added a subscriber: whym. Edited Feb 5 2016, 1:14 PM

I notice that around half of the top 10 longest names in jawiki are blocked accounts. Perhaps if you know how, you could add the SQL (that will add a column for blocked-status) to your quarry tasks, and then I'll update mine?

See http://quarry.wmflabs.org/query/7207.

Also, as an admin on Japanese Wiktionary, I can confirm that those very long usernames tend to be abused by LTAs, who use them to make offensive statements.

EDIT: it looks like many, if not most, of the blocked accounts are not shown as blocked because admins have hidden them due to the offensiveness of the username. We'd need someone with advanced database access to collect more accurate statistics.

kaldari added a comment. Edited Feb 20 2016, 12:42 AM

From looking at the English Wikipedia discussion, it appears that they removed the 40 character limit with the understanding that usernames would be limited to 64 characters (which is no longer the case).

For MediaWiki in general, I think that a 64 byte default should be more than sufficient, even for Chinese and Japanese wikis.

What MediaWiki looks like with a crazy long username (as currently allowed):

FWIW, I think for MediaWiki in general, having a limit of max 64 Unicode code points (but a byte limit of 255) makes the most sense.

Unicode code points don't exactly correspond to visual length, but I think it's close enough for this purpose, and it's easy to measure.
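A combined check of this kind (64 code points and 255 bytes) is simple to express. The following Python sketch is an illustration only, not MediaWiki code: the constants mirror the numbers in the comment above, `username_length_problems` is a hypothetical name, and NFC normalisation before counting is an assumption.

```python
import unicodedata
from typing import List

MAX_CODE_POINTS = 64  # proposed limit on Unicode code points
MAX_BYTES = 255       # proposed limit on UTF-8 bytes


def username_length_problems(username: str) -> List[str]:
    """Return a list of length-limit violations (empty if the name fits)."""
    # Assumption: names are normalised to NFC before being measured.
    normalized = unicodedata.normalize("NFC", username)
    problems = []
    if len(normalized) > MAX_CODE_POINTS:
        problems.append("too many code points")
    if len(normalized.encode("utf-8")) > MAX_BYTES:
        problems.append("too many bytes")
    return problems


if __name__ == "__main__":
    print(username_length_problems("a" * 65))
    # 64 three-byte CJK characters: 64 code points, 192 bytes, within both limits.
    print(username_length_problems("\u65e5" * 64))
```

With these two numbers the byte limit only bites for characters outside the Basic Multilingual Plane, which take four bytes each: 64 of them come to 256 bytes. Names of 64 characters in scripts that use up to three bytes per character, including CJK, stay within 192 bytes.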

For Wikimedia wikis, I've proposed a global 50 character limit on new usernames at https://meta.wikimedia.org/wiki/Talk:Title_blacklist#Excessively_long_usernames.