Decide the prefix character for temporary usernames
Closed, Resolved, Public

Assigned To
Authored By
Niharika
Mar 22 2023, 5:41 PM

Description

Problem

An old MediaWiki bug, T14974, breaks templates, variables, parser functions, and similar constructs when they accept a username as input and that username begins with a character that has special predefined "start-of-line" formatting. One such character is *, which is a problem for IPM (IP Masking) because our choice for temp usernames was to begin them with an asterisk. For these characters, the software forces a newline when it encounters the character.
The exact way it breaks is as follows:
Say I have a template like this that takes a username as a parameter:
[[User:{{ucfirst:*Unregistered 1}}]]
However, the output will look like this:

[[User:
* *Unregistered 1]]

Sandbox for demo.

Here’s an on-wiki discussion that includes some examples of the problems here.

Preferred solution:

We pick a different character that seems more viable. To do this, we can look through our past analysis of the usage of different characters. Note that other characters besides * suffer from the same bug: # : {| ;. Also, = has been called out as a bad choice because it acts as an assignment operator inside functions and templates. See the full list below.
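As a rough illustration of the screening described above, here is a minimal Python sketch that flags candidate names whose first characters are on the problem list from this description. The function name and the messages are hypothetical, and the check only mirrors the characters named in this task; it does not reproduce MediaWiki's parser behaviour.

```python
# Start-of-line tokens named in this task as affected by T14974.
# "{|" is two characters long, so we test prefixes rather than single characters.
START_OF_LINE_TOKENS = ("*", "#", ":", ";", "{|")


def prefix_problems(name):
    """Return a list of reasons why `name` would be a risky temp username.

    Hypothetical helper: it only encodes the concerns listed in the task
    description, nothing more.
    """
    problems = []
    for token in START_OF_LINE_TOKENS:
        if name.startswith(token):
            problems.append("starts with %r (T14974 start-of-line markup)" % token)
    if name.startswith("="):
        problems.append("starts with '=' (assignment operator in templates/functions)")
    return problems


if __name__ == "__main__":
    for candidate in ("*Unregistered 1", "~2023-15498", "{|Foo", "=Foo"):
        print(candidate, "->", prefix_problems(candidate) or "no known issue")
```

An empty list only means the name avoids the specific characters called out here; the ruled-out table below lists further concerns (URL handling, invalid title characters, typing difficulty) that this sketch does not cover.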

Previously considered alternate solution
We stick with * and expect the bug to be fixed. The problem with this approach is that the bug has existed for so long that a number of templates, functions, and so on have come to rely on it or work around it, and fixing the bug will break them. We would need to work with the communities to modify these. Even though this feels like the right approach, it would take a lot of time and effort to help the communities make the changes, plus an unknown amount of work from the parsing team to actually fix the bug.

Prefix options (in order of preference)
Character | Count | Notes | Can it be used as a temp username prefix?
~ | 1160 | Previously used in a suffix to indicate account renamed in SUL unification |
- | 10387 | |
^ | 1409 | |
! | 2715 | |
\ | 217 | |
2XXX (year as prefix) | | |
Ruled out prefix options
Character | Known issues | Can it be used as a temp username prefix?
% | Special character in URLs, may cause compatibility issues with gadgets | No
& | Special character in URLs, may cause compatibility issues with gadgets | No
( | Would probably be very annoying when switching between LTR and RTL wikis | No
) | Would probably be very annoying when switching between LTR and RTL wikis | No
* | T14974 | No
+ | Special character in URLs, may cause compatibility issues with gadgets | No
/ | Invalid in usernames | No
: | T14974; Currently invalid in usernames (InvalidUsernameCharacters) | No
; | T14974 | No
< | Invalid in titles | No
= | Acts as an assignment operator within templates & functions; Currently invalid in usernames (InvalidUsernameCharacters) | No
> | Used as the default interwiki prefix when importing pages; Currently invalid in usernames (InvalidUsernameCharacters); Invalid in titles | No
? | Special character in URLs, may cause compatibility issues with gadgets | No
@ | Used in the reply tool on talk pages to mention other users; Used as special syntax when setting user rights cross-wiki; Currently invalid in usernames (InvalidUsernameCharacters) | No
{ | Invalid in titles | No
pipe | Invalid in titles | No
} | Invalid in titles | No
— (em dash) | Hard to type | No
# | T14974; Invalid in titles | No
[ | Invalid in titles | No
] | Invalid in titles | No
_ | Invalid in titles (underscores become spaces, leading spaces are removed) | No
" | Unbalanced quotes could throw off users | No
$ | Could be interpreted as currency; May not be easily typed by users in different locales | No
' | Unbalanced quotes could throw off users | No
, | Too small | No
. | Too small | No
` | Unbalanced quotes could throw off users | No

Event Timeline

@RHo The em dash might not cause any issues technically or visually, but I think people might have trouble typing it on a standard keyboard. And they might need to do so on Talk pages, while making blocks, or when filtering logs.

Good point. I like some of the suggestions @Ladsgroup makes about characters on a standard Persian keyboard as a check. Particularly × as suggestive of negative/not-an-account.

FWIW these are characters that can be found in a standard Persian keyboard (as a way to check for universality of the choice): !٬٫﷼٪×،*)(ـ+÷؟|

I personally prefer ~ as it was used to note users before but I acknowledge it's hard to use in non-standard keyboards

Hi all, FYI that the Design Strategy team recently wrapped up initial usability testing of IP masking (desktop+mobile)—findings deck located here—and the * issue popped up in the user tests as well. A large proportion (up to half) of the 25 English-speaking testers interpreted the temp account *23-15.498 as representing their IP address (in part or in whole). For some testers, the * contributed to that interpretation by looking like it was being used to obscure part of their IP address.

Thanks for cleaning up the table @Niharika and the numbers @Milimetric :)

Looking at the available options, I think the only ones that make sense to me are ~, ^ and -.

Having unbalanced quotes seems weird. The period, backtick and comma are too small. I've seen the dollar sign be used as a prefix for crypto at some places, but other than that the currency sign on the keyboard would be different per geographic location. I don't have an objective reason for not liking ! or \, just don't find them suitable.

I really like !

! is readily identifiable as a character, widely available on keyboards and I like this implication of NOT as in NOT a permanent account.

Some more thoughts:

  • ! is probably the easiest to type out of all of these (on mobile touch keyboards too). The "not" meaning might be obscure, but it definitely conveys the unusualness of these user names. On the other hand, it may be surprising to speakers of Spanish, where exclamations are usually written "¡like this!", so the non-inverted mark at the beginning of a word may be unexpected.
  • ~ will be slightly annoying to type in some languages, e.g. Polish, where it's used as a dead key in an alternative way of inserting characters with diacritics. No one ever uses this input method in practice, but it's enabled by default, so it's confusing for people when typing a tilde doesn't work. (Yes, this is also a problem with typing wikitext signatures.)
  • \ will definitely be confused with / by everyone.
  • - and ^ I dislike for no reason.
  • ? (already ruled out, but bear with me) was another option I liked, due to the ease of typing and the association with something unknown, just like we don't know who these users are. Alas, it needs to be encoded in URLs like /wiki/User:%3FFoo instead of /wiki/User:?Foo. I wouldn't rule it out just because of that, but it might be enough to prefer other options.

I don't think that - (the hyphen-minus) is a good choice for the first character, because the same character is used to exclude items from a search, which could result in finding everything except the user you were looking for.

Also, there are thousands of accounts named things like -Username, -Username-, or -=Username=-.

These three: ~ ^ \ are all hard to type on an iPhone (English), because you have to click through to the third keyboard before you can find them.

! and the year (20nn) seem like they would be the easiest to type.

Thanks for all the input folks! We'll be posting these options in the next community updates on Meta and are hoping to get some more feedback there.

  • ? (already ruled out, but bear with me) was another option I liked, due to the ease of typing and the association with something unknown, just like we don't know who these users are. Alas, it needs to be encoded in URLs like /wiki/User:%3FFoo instead of /wiki/User:?Foo. I wouldn't rule it out just because of that, but it might be enough to prefer other options.

I'm strongly opposed to this because of the URL encoding issue. I don't see a good reason for making the URLs less type-able, given that there are plenty of options that don't have this problem. If we decide to do this, I'd expect that decision to be continually questioned by users in the future.

  • ! is probably the easiest to type out of all of these (on mobile touch keyboards too). The "not" meaning might be obscure, but it definitely conveys the unusualness of these user names. On the other hand, it may be surprising to speakers of Spanish, where exclamations are usually written "¡like this!", so the non-inverted mark at the beginning of a word may be unexpected.

? has this issue in Spanish too. I'm not sure whether it is a big problem, but either way ? doesn't have the advantage over ! for this reason.

With ! being so easy to type, I think it's my favourite option.

Using the year as a prefix will actually make name lengths longer, compared to using a 1-character prefix - even over time.

Assuming we use prefix + decimal number as the name pattern, here's how user name length would scale if the rate of increase was constant (say 10 million new temporary users per year):

Year | New users this year | Total users | Max name length (+1 char) | Max name length (+2 char year) | Max name length (+4 char year)
1 | 10,000,000 | 10,000,000 | 8+1 | 8+2 | 8+4
10 | 10,000,000 | 100,000,000 | 9+1 | 8+2 | 8+4
100 | 10,000,000 | 1,000,000,000 | 10+1 | 8+2 | 8+4

Even the 2-digit prefix doesn't help, and has other downsides: we'd have to disallow registered usernames starting with any two digits, and (thinking optimistically) we'd run out of prefixes in 100 years.

And a bit of an absurd example, assuming exponentially increasing numbers of new temporary users (10 million in the first year, 100 million in the second year, etc), to illustrate the point:

Year | New users this year | Total users | Max name length (+1 char) | Max name length (+2 char year) | Max name length (+4 char year)
1 | 10,000,000 | 10,000,000 | 8+1 | 8+2 | 8+4
2 | 100,000,000 | 110,000,000 | 9+1 | 9+2 | 9+4
... | ... | ... | ... | ... | ...
10 | 10,000,000,000,000,000 | 11,111,111,110,000,000 | 17+1 | 17+2 | 17+4
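The arithmetic behind the constant-rate table can be reproduced with a short Python sketch. It assumes, as the tables do, that a name is just a prefix followed by a decimal number, that the single-character scheme numbers users cumulatively, and that the year-prefixed schemes number users within each year; separators are not counted. The function and key names are hypothetical.

```python
def digits(n):
    """Number of decimal digits needed to write the positive integer n."""
    return len(str(n))


def max_name_lengths(new_per_year, years):
    """Longest name after `years` years at a constant `new_per_year` rate.

    - one_char_prefix: 1 prefix character + digits of the cumulative total
    - two_digit_year / four_digit_year: year digits + digits of that year's count
    """
    total = new_per_year * years
    return {
        "one_char_prefix": 1 + digits(total),
        "two_digit_year": 2 + digits(new_per_year),
        "four_digit_year": 4 + digits(new_per_year),
    }


if __name__ == "__main__":
    for year in (1, 10, 100):
        print(year, max_name_lengths(10_000_000, year))
```

With 10 million new users per year, the single-character scheme gives 9, 10, and 11 characters at years 1, 10, and 100, while the four-digit-year scheme stays at 12 throughout, which matches the first table.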

Hi @Tchanders - my understanding was that the year prefix would be used so the incrementing number resets at the start of each year, so the Total users is inconsequential.
So for example, in 2023 the ten millionth user would look like 2023-10000000 and be 13 characters long, and then ten years later, in 2033, the 100 millionth person, who is actually the ten millionth temp account that year, would still be 13 characters long: 2033-10000000. Is this possible?

Hi @Tchanders - my understanding was that the year prefix would be used so the incrementing number resets at the start of each year, so the Total users is inconsequential.

I've included "Total users" because this affects "Max name length (+1 char)".

So for example, in 2023 the ten millionth user would look like 2023-10000000 and be 13 characters long, and then ten years later, in 2033, the 100 millionth person, who is actually the ten millionth temp account that year, would still be 13 characters long: 2033-10000000. Is this possible?

This is correct - see the first table. Compare "New users this year" to "Max name length (+4 char year)" - both are the same every year. Then compare "Total users" to "Max name length (+1 char)" - both increase. However, even after 100 years, the 1 char approach is still fewer characters.

(The second table imagines that the number of new users per year keeps increasing. "Max name length (+4 char year)" is still based on "New users this year".)

Gotcha, sorry. I think I was mixing up two things: having the year as part of the name (for easier scanning, and to keep the incrementing number on a per-year basis) and having it as the prefix. I agree it would make sense to have a single special character like ~ or ! as the element that marks temp accounts, and to treat the inclusion of YYYY as a separate issue.

  • ? (already ruled out, but bear with me) was another option I liked, due to the ease of typing and the association with something unknown, just like we don't know who these users are. Alas, it needs to be encoded in URLs like /wiki/User:%3FFoo instead of /wiki/User:?Foo. I wouldn't rule it out just because of that, but it might be enough to prefer other options.

I would avoid it not because it gets encoded (as, for example, a space does), but because it breaks if you don't encode it. We try to ensure people can copy page titles into the URL to navigate easily, and having ? will make that impossible. It will also make sharing links via chat clients tricky, as they are often notoriously buggy when handling URL encoding.
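The breakage described above can be shown with Python's standard `urllib.parse` module. This is only a sketch of generic URL parsing, not of MediaWiki's routing; the host `example.org` and the helper names are placeholders.

```python
from urllib.parse import quote, urlsplit


def split_unencoded(url):
    """Show how a generic URL parser splits a URL containing a raw '?'."""
    parts = urlsplit(url)
    return parts.path, parts.query


def encode_title_path(path):
    """Percent-encode a path while leaving '/' and ':' readable."""
    return quote(path, safe="/:")


if __name__ == "__main__":
    # The raw '?' starts the query string, so the username is cut out of the path.
    print(split_unencoded("https://example.org/wiki/User:?Foo"))
    # Encoding the '?' keeps the whole title inside the path.
    print(encode_title_path("/wiki/User:?Foo"))
```

The first call returns the path `/wiki/User:` with `Foo` as the query string, so the page title is lost; the second returns `/wiki/User:%3FFoo`, the form a user would have to type or paste.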

I don't think that raw length should be the primary consideration. As eight digits is a reasonably foreseeable number of accounts, we might want to consider formatting the usernames for readability:

User:!1234-5678
User:2023-1234-5678

The year prefix contains relevant meaning that might be useful for future editors ("Oh, that temporary account was three years ago. There's no point in trying to contact them...").

Also, if you can type the "1234" part of the username, you'll be able to type the "2023" part. It won't be auto-incorrected to ¡ if your device is trying to "help" you type in Spanish.

I think this is a good point. Readability seems like it would be important for many editors and anything over a group of 4 generally is a lot harder for humans to process, remember and recognise.
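As an illustration of the grouping idea, the following Python sketch splits the numeric part into groups of at most four digits. It assumes grouping from the right (as thousands separators do) and a hyphen as the separator; neither choice is settled in this thread, and the function names are hypothetical.

```python
def group_digits(number, size=4, sep="-"):
    """Split the decimal digits of `number` into groups of `size`, from the right."""
    remaining = str(number)
    groups = []
    while remaining:
        groups.append(remaining[-size:])   # take the last `size` digits
        remaining = remaining[:-size]      # and drop them from the string
    return sep.join(reversed(groups))


def format_temp_name(prefix, number):
    """Prepend a prefix such as '!' or '2023-' to the grouped number."""
    return prefix + group_digits(number)


if __name__ == "__main__":
    print(format_temp_name("!", 12345678))
    print(format_temp_name("2023-", 12345678))
```

For an eight-digit number this produces the two forms suggested above, `!1234-5678` and `2023-1234-5678`. Numbers whose length is not a multiple of four get a shorter leading group, for example `1-2345`.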

Relevant to this discussion are recent findings from Growth team user testing on how well unregistered editors understand temporary accounts, which included questions about how they understood the name format and which formats they preferred. The initial *YY-##.### format (the previous * prefix, a two-digit year YY, and a period as the thousands separator for the incrementing number) caused some confusion. Many participants thought that it was their IP address, partially obscured, apparently because of the asterisk and the dot separators.

When participants were shown a few other formats and asked to rate their preference, it was clear that they preferred something with a lexical prefix: in the test, *Unregistered-23.15498 and *Temp-23.15498 were clearly the top choices.

image.png (850×1 px, 230 KB)

This initial round of testing with English participants is currently being repeated with folks in es, ja, and ar (see T328616), with amendments to the UI and to the username format being tested. That should provide more information about whether having a localisable or English word as part of the name could also help legibility.

The year prefix contains relevant meaning that might be useful for future editors ("Oh, that temporary account was three years ago. There's no point in trying to contact them...").

Evidence from the recent Spanish-language user tests (in which the sample temp account name is being shown as ~2023-15498) indicates that people do latch on to the 4-digit year as a source of meaning. Generally a larger proportion of testers are interpreting ~2023-15498 as randomly generated in some way, and a much smaller proportion are interpreting that it has anything to do with their personal IP address.

See also T332805#8719355: Having any letters (or words in a specific language) is an i18n burden, since temporary accounts are global (connected to SUL) and work across wikis.

Yes, this is a consideration that we are looking at in some unregistered-editor usability testing, which shows a number of different formats: symbols and Western Arabic numerals only, versus a localised or English lexical prefix/suffix.
@Mraish has wrapped up some usability testing with Spanish users, where the ~ plus four-digit year (YYYY) prefix format tested quite well at addressing the confusion and feedback from the previous rounds, and did as well as the formats with word labels. The four-digit year format seems to make it very clear to participants that it is a random, non-IP-related name.

image.png (1×2 px, 641 KB)

Will post info from more testing with Arabic and Japanese folks as relevant.

Is anyone talking/thinking about how these names are going to look in regular use? I was reading the updates here and couldn't help but think about a discussion thread or revision history where you might see activity from different temp users and accidentally group it together, like:

  • <<Totally nice reasonable thing>> ~~~~ !2023-12345
    • <<Really Nasty Mean thing>> ~~~~ !2023-12355
      • Hey, @!2023-12345 this is your last warning before we block your account ~~~~ SomeAdmin

I wonder if some more randomness in the layout of the temp numbers would help, like !2023-12-355 and !2023-123-45

Honestly, I found it difficult to spot the differences between the two usernames in your proposal too. I do think this is a topic that needs more discussion, though. Personally, I would rather that temporary usernames look as different as possible from each other, and that there's only a very slim likelihood of one temporary username looking similar to another. If a GUID weren't so long for this use case, I'd have recommended that.
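One crude way to put a number on "looks similar" is to count the character positions at which two equal-length names differ (the Hamming distance). The Python sketch below applies that to the names used in this exchange; it is an illustrative measure chosen here, not something the project has adopted, and it says nothing about visual similarity between different glyphs.

```python
def differing_positions(a, b):
    """Count positions at which two equal-length strings differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("names must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Sequential-looking names from the discussion thread example.
    print(differing_positions("!2023-12345", "!2023-12355"))
    # The same two numbers with the suggested varied grouping.
    print(differing_positions("!2023-12-355", "!2023-123-45"))
```

The first pair differs in a single position and the regrouped pair in three, which is consistent with both pairs still being hard to tell apart at a glance.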

We asked an open question for the community on the IP Masking project page and we received feedback from about 10 people. Here's my summary of the feedback we received:

  • 2 people said that they feel "~" is a good representation of the temporary nature of temp accounts and we should use that as a prefix. Some others used it in their examples which may indicate they are fine with this.
  • 2 people said that "~" prefix could cause confusion because of SUL using them too
  • Concern against ! was that it is typically used to indicate "not"
  • Concern against '?' was that it is used in other ways that could be confusing
  • It is hard to distinguish between an em dash, a dash, a hyphen, and a minus sign
  • Some people said that they are concerned about how long the usernames will get. There is support for using the year as a prefix, not just to shorten usernames but also to add a data point about the user.
  • Several people supported the idea of breaking down temp usernames for easy readability such as ~2023~4024~1239
  • Several people voiced enthusiasm for having year as prefix - both for identifying temp accounts and for determining how old a temp account may be when interacting with it

Quoting Mike and Rita's research (detailed slide deck):

The year prefix contains relevant meaning that might be useful for future editors ("Oh, that temporary account was three years ago. There's no point in trying to contact them...").

Evidence from the recent Spanish-language user tests (in which the sample temp account name is being shown as ~2023-15498) indicates that people do latch on to the 4-digit year as a source of meaning. Generally a larger proportion of testers are interpreting ~2023-15498 as randomly generated in some way, and a much smaller proportion are interpreting that it has anything to do with their personal IP address.

and

Relevant to this discussion are recent findings from Growth team user testing on how well unregistered editors understand temporary accounts, which included questions about how they understood the name format and which formats they preferred. The initial *YY-##.### format (the previous * prefix, a two-digit year YY, and a period as the thousands separator for the incrementing number) caused some confusion. Many participants thought that it was their IP address, partially obscured, apparently because of the asterisk and the dot separators.

When participants were shown a few other formats and asked to rate their preference, it was clear that they preferred something with a lexical prefix: in the test, *Unregistered-23.15498 and *Temp-23.15498 were clearly the top choices.

image.png (850×1 px, 230 KB)

This initial round of testing with English participants is currently being repeated with folks in es, ja, and ar (see T328616), with amendments to the UI and to the username format being tested. That should provide more information about whether having a localisable or English word as part of the name could also help legibility.

Proposal: Based on all of the above, my suggestion would be:

  • To go with YYYY~temp~ as a prefix. Adding the year and "temp" (localized in the language of the user/wiki) would effectively convey the nature of the account even to those who are not familiar with the notion of temporary accounts.
  • For better readability it would be helpful to break the numerical string with a character separator (tilde or hyphen).

Does this sound like an acceptable solution? Am I missing something?

I suggest avoiding anything that needs localization since usernames are global. Seeing temp in a language you don't know as part of the username will not be effective. There are also issues with choosing the language:

  • Wiki's language: If you just pick the language of the wiki where the account is initially created, that may differ from the language of the primary wiki the user will contribute to. (Example: if the first contribution is uploading a file to Commons for an article they will write on the Spanish Wikipedia, the name would be in English instead of Spanish.)
  • User's language: Temp users don't have preferences (even if they did, they wouldn't be set at creation time), so you'd have to use what the browser thinks their language is, which may not be accurate or align with the language of the wiki(s) they will contribute to.

Other than the year, I would avoid groupings of fewer than 5 digits, as those could be seen as similar to IP addresses.

~YYYY~nnnnn~nnnnn~nnnnn would be my suggested format.

An emoji could help convey meaning in a language-independent way. ⏳2023-12345 or something like that.

(Language-dependent is technically infeasible as translations are maintained in Translatewiki but temp username patterns are maintained in site configuration.)

Another issue is that temporary accounts will work across SUL, and having a word in the name makes temporary accounts difficult to work with cross-wiki.

An emoji could help convey meaning in a language-independent way. ⏳2023-12345 or something like that.

Oppose. Emoji are not easy to type on a keyboard, and many command-line interfaces do not support them. Also, on older systems they may only be displayed as a square box.

An emoji could help convey meaning in a language-independent way. ⏳2023-12345 or something like that.

Strong oppose for the reasons @Bugreporter mentioned + as an analyst if I had to figure out how to get an emoji into my query every time I needed to work with temp usernames I would find excuses to never work with user/edit data.

  • Some people said that they are concerned about how long the usernames will get. There is support for using the year as a prefix, not just to shorten usernames [...]

Just a pedantic comment that the year prefix will not shorten user names, and may lengthen them (see T332805#8805045). I don't oppose the year prefix, but I think we should be careful not to cite shortening names as a benefit of using a year prefix.

I'm in support of the four-digit year prefix YYYY, both for how well it tested for understanding by unregistered editors and for providing a clear signifier for patrollers and moderators. It also helps to indicate when the temp account was made, based on the year and how high the incrementing number after it is.
On that note, I wanted to confirm that part of the proposal is that the incrementing number resets at the start of each year, so for example a temp account made on Dec 31st 2023 might be ~2023-15834, whilst the first temp account created on Jan 1st 2024 would be ~2024-1. This then helps with shortening names since the increment resets to zero each year.

The numbers aren't incrementing, they are pseudo-random (at least that's how the test setup is currently configured). They don't reset, but with pseudo-random numbers there is no apparent difference anyway.

@Niharika or @Tchanders could you please confirm? This whole time, the previous AHT designer and I had been operating under the understanding that it was an incrementing number. If that is not the case, can it be made so for the benefits mentioned?

I think numeric-only usernames with the year in them are going to be very confusing in signatures where they are shown close to a timestamp, and without the User: prefix:

image.png (71×446 px, 8 KB)

I note that in the usability study, signatures were not part of the test case; however, signatures are one of the main places where usernames are seen on wikis.

I guess it would be particularly confusing right at the start of the year, when we'll have signatures like ~2023~123 (talk) 21:06 23 January 2024 (BST)

I think numeric-only usernames with the year in them are going to be very confusing in signatures where they are shown close to a timestamp, and without the User: prefix:

image.png (71×446 px, 8 KB)

I believe this follows the overall MVP guidance to keep temp accounts as similar as possible to IP editors, where IP addresses are also a bunch of numbers without "User:" in front of them. However, if this is considered a source of confusion, perhaps it could be amended to add a lexical prefix when used in signatures on Discussion Tools, like User:~2023-120?
I think the reason for not including a lexical prefix in the actual username in the database is that it is perceived as somewhat sub-optimal for other languages if it is only in English (e.g. ~2023-Temp12345), and complex to localise at the account level?

I note that in the usability study, signatures were not part of the test case; however, signatures are one of the main places where usernames are seen on wikis.

This is true, the study was targeted to gauge understanding and main workflows for the unregistered editor, less so for how typically experienced folks would view names on a Talk page.

perhaps it could be amended to add a lexical prefix when used in signatures on Discussion Tools, like User:~2023-120?

Good idea, @matmarex suggested something almost identical just now in our meeting. The label could be anything (e.g. "Unregistered user ~2023~120"), the benefit of "User:" is that it doesn't require any new messages.

Note that this wouldn't solve the problem in other places that usernames are shown, such as history pages or other logs. They would have to be "fixed" individually if we wanted to add more text for temporary usernames.

The numbers aren't incrementing, they are pseudo-random (at least that's how the test setup is currently configured). They don't reset, but with pseudo-random numbers there is no apparent difference anyway.

@Niharika or @Tchanders could you please confirm? This whole time, the previous AHT designer and I had been operating under the understanding that it was an incrementing number. If that is not the case, can it be made so for the benefits mentioned?

It was set to scramble a couple of weeks ago in this patch from the Growth Team: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/938915

@RHo Regarding length, incrementing and resetting the numbers won't make much difference. To run with the example from T332805#9073080:

Month      Year with reset   Year without reset   Special character only
Dec 2023   ~2023-15000       ~2023-15000          ~15000
Jan 2024   ~2024-1           ~2024-15001          ~15001
Dec 2024   ~2024-15000       ~2024-30000          ~30000
Dec 2034   ~2034-15000       ~2034-150000         ~150000

Because of how numbers scale, we use up the short IDs really quickly and the vast majority of users have longer IDs anyway. (The first 9 users have 1 ID digit, the next 90 have 2, the next 900 have 3, the next 9,000 have 4, the next 90,000 have 5, etc.) If we have millions of new temporary accounts per year, the vast majority of them will have IDs of 6 or more digits even if we reset at the start of each year.
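To make the scaling argument concrete, here is a rough Python sketch that counts how many IDs of each length exist in a range starting at 1. The yearly volume of 2,000,000 is a made-up figure for illustration only, and the function names are hypothetical; nothing here reflects the actual MediaWiki implementation.

```python
from typing import Dict


def id_length_counts(total_ids: int) -> Dict[int, int]:
    """Count how many of the IDs 1..total_ids have each decimal length."""
    counts: Dict[int, int] = {}
    digits = 1
    low = 1
    while low <= total_ids:
        # Largest ID with this many digits, capped at the total issued.
        high = min(10 ** digits - 1, total_ids)
        counts[digits] = high - low + 1
        low = 10 ** digits
        digits += 1
    return counts


def share_with_at_least(total_ids: int, min_digits: int) -> float:
    """Fraction of the IDs 1..total_ids that have at least min_digits digits."""
    counts = id_length_counts(total_ids)
    long_ids = sum(n for d, n in counts.items() if d >= min_digits)
    return long_ids / total_ids


# Hypothetical yearly volume, assumed only for this illustration.
yearly_accounts = 2_000_000
print(id_length_counts(yearly_accounts))
print(f"{share_with_at_least(yearly_accounts, 6):.1%} of IDs have 6+ digits")
```

Under that assumed volume, only a small fraction of accounts would get a short ID even with a yearly reset, which is the point being made above.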

The much bigger effect on the length is the addition of 4 digits for the year... But there are other good reasons to do this, so I'm not opposing it!

perhaps it could be amended to add a lexical prefix when used in signatures on Discussion Tools, like User:~2023-120?

Note that this wouldn't solve the problem in other places that usernames are shown, such as history pages or other logs. They would have to be "fixed" individually if we wanted to add more text for temporary usernames.

It should be straightforward to do this in other places too, via the Linker which already customizes links for temporary user names

@Tgr @Urbanecm_WMF - per above, can we switch back from scramble to serial?

It should be straightforward to do this in other places too, via the Linker which already customizes links for temporary user names

Ah, this is true. The work to make displayed temp account names more easily distinguishable (important for patrollers/mods) is already done here: T325768: Design the visual look for temp usernames:

image.png (374×1 px, 80 KB)

Given this is already implemented, would having the name appear like so on Talk pages be enough versus adding on the extra text?

image.png (564×1 px, 96 KB)

This would be consistent with the way it is displayed on other pages like history and RC, and avoids the need to create a standard text label used only for this type of account.

We not only use scramble as of now; we also use multiple shards of the serial provider (i.e. multiple sources of incrementing integers, where a source is selected randomly each time). The way this works is that each number source generates every Nth number (if we have three of them, the first one generates numbers 1, 4, 7, ..., the second numbers 2, 5, 8, ... and the third numbers 3, 6, 9, ...). This means that if we switched back to serial, the temporary account names probably wouldn't form a perfectly incrementing sequence. The following could be a perfectly valid sequence of temporary account names:

  • *Unregistered 1
  • *Unregistered 4
  • *Unregistered 2
  • *Unregistered 5
  • *Unregistered 3

Scrambling takes this a level up and makes the account names seemingly random. Unfortunately, merely switching back to serial wouldn't give us a perfectly incrementing series of account names, as illustrated above. I'm not sure how scrambling contributes to the interpretation: for big numbers, users probably won't see minor ordering hiccups, unless they're by an order of magnitude wrong.
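The sharding behaviour described above can be sketched roughly as follows. This is only an illustration of the idea (N sources, each producing every Nth number, picked at random per request); the class and method names are hypothetical and this is not the actual MediaWiki serial provider code. Scrambling is deliberately left out.

```python
import random
from typing import List, Optional


class ShardedCounter:
    """Toy model of a serial provider split into several shards."""

    def __init__(self, shard_count: int, seed: Optional[int] = None) -> None:
        self.shard_count = shard_count
        # Shard i (0-based) starts at i + 1 and advances in steps of shard_count,
        # so with 3 shards they yield 1, 4, 7, ... / 2, 5, 8, ... / 3, 6, 9, ...
        self._next_values: List[int] = [i + 1 for i in range(shard_count)]
        self._rng = random.Random(seed)

    def acquire(self) -> int:
        """Pick a shard at random and return its next unused number."""
        shard = self._rng.randrange(self.shard_count)
        value = self._next_values[shard]
        self._next_values[shard] += self.shard_count
        return value


counter = ShardedCounter(shard_count=3, seed=0)
ids = [counter.acquire() for _ in range(9)]
# IDs are unique but not necessarily in increasing order.
print(ids)
```

Every returned number is unique, but because the shard is chosen at random, the overall sequence is only roughly increasing, which matches the example list above.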

I'm not really sure about the technical reason for switching to scrambling. As for switching to multiple shards of the serial provider, my assumption is that using only one shard would put a lot of load on a single counter shared across all wikis. The counter can't really be made local to each wiki, as we need to ensure the generated usernames are unique across all wikis (temp accounts can switch between projects, retaining the same temp account, just as regular users do, so we need to "reserve" their name on all projects).

@Tgr and @tstarling (who originally suggested switching to scrambling and increasing the shard count on the patch), please correct me if I'm mistaken in any part of the comment above.

It should be straightforward to do this in other places too, via the Linker which already customizes links for temporary user names

This is very helpful. It seems there should only be two places we need to change then: Linker and signature generation.

Given this is already implemented, would having the name appear like so on Talk pages be enough versus adding on the extra text?

image.png (564×1 px, 96 KB)

This would be consistent with the way it is displayed on other pages like history and RC, and avoids the need to create a standard text label used only for this type of account.

This won't happen automatically as signatures just generate plain wikitext links, e.g. [[User:~2023~123|~2023~123]] which are not rendered by the Linker class.

And in order to keep the wikitext output relatively clean, we will likely not want to add the CSS class required to render the link with a different background, as we've done in Linker.

In the other direction it should be much more straightforward, i.e. if we agree on a text prefix for signatures (e.g. "Unregistered ~2023~123") then copying that to Linker should be fairly trivial.

Fair enough. In that case, is my assumption correct that this would be a localisable text prefix, as was indicated as most preferred by unregistered testers? If so, that works well, but should this be separated into a different task and also documented in case other features/extensions/etc. want to adopt the same text prefix?

I've created two other tasks to discuss the order of the temporary accounts (scramble/serial) and whether we need a prefix and if so what here:

Let's take those discussions to those tasks.

Coming back to the matter of the temporary username format itself:

Proposal: Based on all of the above, my suggestion would be:

  • To go with YYYY~temp~ as a prefix. Adding the year and "temp" (localized in the language of the user/wiki) would effectively convey the nature of the account even to those who are not familiar with the notion of temporary accounts.
  • For better readability it would be helpful to break the numerical string with a character separator (tilde or hyphen).

Does this sound like an acceptable solution? Am I missing something?

I suggest avoiding anything that needs localization since usernames are global. Seeing temp in a language you don't know as part of the username will not be effective. There are also issues with choosing the language:

  • Wiki's language: If you just pick the language of the wiki the account is initially created on, it may be a language other than that of the primary wiki the user will contribute to. (Example: If the first contribution is to upload a file to Commons for an article they will write on the Spanish Wikipedia, the language would be English instead of Spanish.)
  • User's language: Temp users don't have preferences (even if they did, they wouldn't be set at creation time), so you'd have to use what the browser thinks their language is, which may not be accurate or align with the language of the wiki(s) they will contribute to.

Other than the year, I would avoid groupings of fewer than 5 digits, as those could be seen as similar to IP addresses.

~YYYY~nnnnn~nnnnn~nnnnn would be my suggested format.

Thanks @JJMC89. We came to a similar conclusion after looking into the technical feasibility of including a localizable lexical prefix. Hence it makes sense to not include "temp" or anything else that needs to be localized.
The idea of grouping numbers into groups of 5 sounds fine to me. If the numbers don't neatly divide into groups of 5 then the very last group on the right can have fewer numbers.
Proposed format: ~2023-23126-08614-44
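As a rough illustration of the grouping rule described here (year prefix, then digits split into groups of 5 with a shorter final group if needed), a Python sketch might look like the following. The function name and signature are hypothetical; the real format is generated by MediaWiki configuration, not by this code.

```python
def format_temp_username(year: int, serial: str, group_size: int = 5) -> str:
    """Format a digit string as ~YYYY-ggggg-ggggg-..., last group may be shorter."""
    if not (serial.isascii() and serial.isdigit()):
        raise ValueError("serial must be a non-empty string of ASCII digits")
    # Split left to right so only the rightmost group can be short.
    groups = [serial[i:i + group_size] for i in range(0, len(serial), group_size)]
    return "~" + "-".join([str(year), *groups])


print(format_temp_username(2023, "231260861444"))  # ~2023-23126-08614-44
```

The serial is passed as a string rather than an integer so that leading zeros inside a group (such as 08614) are preserved.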

Are there any major concerns about this format? Does this seem like an acceptable way forward?

Looks good to me.

We had about 7M anonymous enwiki edits in the last 12 months; probably an order of magnitude more for all wikis. So more realistically, the numbers will look something like ~2023-23126-086 (unless we zero-pad them, in which case all usernames will be mostly zeros).

Niharika claimed this task.

Follow-up work in T345855: Update temporary username format . Thanks everyone for the lively discussion!