Special:AbuseLog is missing the `mw-tempuserlink` class from temporary account user links
Closed, Resolved, Public

Description

Since T326414 ("Update Linker::userLink to allow identification of temporary account usernames and provide some context"), links to temporary account user names should have the mw-tempuserlink class.

However, in Special:AbuseLog this doesn't always happen. Example: https://de.wikipedia.beta.wmflabs.org/wiki/Spezial:Missbrauchsfilter-Logbuch (temporary user links have the class mw-anonuserlink instead).

Oddly, the correct class was added when testing with a local developer installation. See T326414#8569905 for more details.

Event Timeline

It might be because those temporary users are not in the actor or user tables.

Assuming that we always assign temporary user names sequentially, there would appear to be lots of temporary user accounts which are not recorded in the database on beta dewiki:

MariaDB [dewiki]> SELECT user_registration, user_name FROM user WHERE user_name LIKE "%Unregistered%";
+-------------------+--------------------+
| user_registration | user_name          |
+-------------------+--------------------+
| 20220927135438    | *Unregistered 5    |
| 20220927142526    | *Unregistered 6    |
| 20220930161330    | *Unregistered 61   |
| 20221006032301    | *Unregistered 153  |
| 20221007154556    | *Unregistered 175  |
| 20221009101341    | *Unregistered 189  |
| 20221011120906    | *Unregistered 213  |
| 20221012203059    | *Unregistered 238  |
| 20221102110246    | *Unregistered 492  |
| 20221107081036    | *Unregistered 588  |
| 20221116163811    | *Unregistered 799  |
| 20221118063345    | *Unregistered 808  |
| 20221118063742    | *Unregistered 809  |
| 20221118064049    | *Unregistered 810  |
| 20221118184139    | *Unregistered 827  |
| 20221118202741    | *Unregistered 829  |
| 20221122011719    | *Unregistered 869  |
| 20221122145653    | *Unregistered 878  |
| 20221130141418    | *Unregistered 1062 |
| 20221130143922    | *Unregistered 1065 |
| 20221203145718    | *Unregistered 1212 |
| 20221204074958    | *Unregistered 1224 |
| 20221205174035    | *Unregistered 1246 |
| 20221206170217    | *Unregistered 1372 |
| 20221212171236    | *Unregistered 1460 |
| 20221219132318    | *Unregistered 1535 |
| 20221219194042    | *Unregistered 1537 |
| 20230109042528    | *Unregistered 2271 |
| 20230111165003    | *Unregistered 2335 |
| 20230111181333    | *Unregistered 2338 |
| 20230111181511    | *Unregistered 2339 |
| 20230112124223    | *Unregistered 2360 |
| 20230112144908    | *Unregistered 2363 |
| 20230112182938    | *Unregistered 2367 |
| 20230112203136    | *Unregistered 2372 |
| 20230118114043    | *Unregistered 2505 |
| 20230118163735    | *Unregistered 2514 |
| 20230118165230    | *Unregistered 2518 |
| 20230127151034    | *Unregistered 2761 |
| 20230127151941    | *Unregistered 2762 |
| 20230127152808    | *Unregistered 2763 |
| 20230127161919    | *Unregistered 2764 |
| 20230131075426    | *Unregistered 2798 |
| 20230131080508    | *Unregistered 2799 |
+-------------------+--------------------+
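As a rough illustration of the size of the gaps, the Python sketch below takes the first few names from the listing above and counts how many numbers in the covered range have no row. The helper names `temp_serial` and `missing_count` are hypothetical, and the count only means something under the sequential-assignment assumption stated above.

```python
import re


def temp_serial(user_name):
    """Return the trailing number of a name like '*Unregistered 61', else None."""
    match = re.fullmatch(r"\*Unregistered (\d+)", user_name)
    return int(match.group(1)) if match else None


def missing_count(user_names):
    """Count numbers between the smallest and largest seen that have no row."""
    serials = sorted({s for s in map(temp_serial, user_names) if s is not None})
    if not serials:
        return 0
    return (serials[-1] - serials[0] + 1) - len(serials)


# First four rows of the listing above.
sample = ["*Unregistered 5", "*Unregistered 6", "*Unregistered 61", "*Unregistered 153"]
print(missing_count(sample))  # 145
```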

@Ladsgroup Would you know why this might be, e.g. are we deleting users on beta to keep the tables small or something?

Not to my knowledge.

I see the class though: mw-userlink mw-anonuserlink. If you are wondering why there are still some IPs there, I assume it is because an edit attempt blocked by AbuseFilter doesn't go through, so the user doesn't get created and the IP is saved instead.

> It might be because those temporary users are not in the actor or user tables.
>
> Assuming that we always assign temporary user names sequentially, there would appear to be lots of temporary user accounts which are not recorded in the database on beta dewiki:

That is intentional. In order to avoid race conditions and similar issues, a user id is picked from a sharding system plus some randomness.

> I see the class though: mw-userlink mw-anonuserlink. If you are wondering why there are still some IPs there, I assume it is because an edit attempt blocked by AbuseFilter doesn't go through, so the user doesn't get created and the IP is saved instead.

I raised T328403 earlier today.

> That is intentional. In order to avoid race conditions and similar issues, a user id is picked from a sharding system plus some randomness.

Oh, interesting, I didn't know that.

> It might be because those temporary users are not in the actor or user tables.
>
> Assuming that we always assign temporary user names sequentially, there would appear to be lots of temporary user accounts which are not recorded in the database on beta dewiki:
>
> That is intentional. In order to avoid race conditions and similar issues, a user id is picked from a sharding system plus some randomness.

I did not know this either. To clarify: you are saying that it is possible to have *Unregistered 1 and *Unregistered 4 in the database with *Unregistered 2 and *Unregistered 3 never having been created?
My follow-up questions then are:

  1. Can we determine and/or define how fast these numbers will grow? Going by Dom's data in T328311#8573392 it seems like the numbers are getting larger rather rapidly. At this pace we will end up with very large numbers to indicate temporary users very quickly. This can be detrimental to the end user experience.
  2. How do we assign user IDs to new users? Do we deploy a similar mechanism when choosing user IDs?

> It might be because those temporary users are not in the actor or user tables.
>
> Assuming that we always assign temporary user names sequentially, there would appear to be lots of temporary user accounts which are not recorded in the database on beta dewiki:
>
> That is intentional. In order to avoid race conditions and similar issues, a user id is picked from a sharding system plus some randomness.
>
> I did not know this either. To clarify: you are saying that it is possible to have *Unregistered 1 and *Unregistered 4 in the database with *Unregistered 2 and *Unregistered 3 never having been created?

Yes.

> My follow-up questions then are:
>
>   1. Can we determine and/or define how fast these numbers will grow? Going by Dom's data in T328311#8573392 it seems like the numbers are getting larger rather rapidly. At this pace we will end up with very large numbers to indicate temporary users very quickly. This can be detrimental to the end user experience.

I need to look at the code and get you some numbers. It is probably adjustable, and in production the gaps will probably be much smaller because users get created within a short span of time (the whole reason this sharding was implemented). I'll double-check and get back to you soon.

>   2. How do we assign user IDs to new users? Do we deploy a similar mechanism when choosing user IDs?

We have two things here, temp user id and user id of the temp user (confusing, I know).

The temp user id is what's shown to the user. It uses sharding and can grow rapidly, but how it is displayed (that is, the username) can be adjusted. For example, hexadecimal notation would cut the number of digits from seven (a rough estimate of the id a user created after thirty days would get, even without any sharding) to five, and the letters would make it more readable: instead of *Unregistered 1000000 you'd get *Unregistered F4240. You could also go in the direction of a URL shortener and use a mapping (in https://w.wiki/6ah, "6ah" is actually a number in the database), but then there is the risk of terrible names showing up as usernames.
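To make the hexadecimal idea concrete, here is a minimal Python sketch. The function `temp_user_name` and its parameters are made up for illustration and are not part of MediaWiki; only the `*Unregistered ` prefix and the 1000000 / F4240 pair come from the comment above.

```python
def temp_user_name(serial, prefix="*Unregistered ", hexadecimal=False):
    """Render a temp user serial as a display name (illustrative only)."""
    # format(n, "X") gives upper-case hexadecimal without a "0x" prefix.
    digits = format(serial, "X") if hexadecimal else str(serial)
    return prefix + digits


print(temp_user_name(1_000_000))                    # *Unregistered 1000000
print(temp_user_name(1_000_000, hexadecimal=True))  # *Unregistered F4240
```

The stored number stays the same either way; only the rendering changes, so `int("F4240", 16)` recovers 1000000.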

The user id of the temp user is a different story: it differs per wiki (and there is a global id as well), but all of these are internal, won't change, and won't be shown to the user.

Okay, I looked at it in depth and the sharding couldn't have caused these jumps (sorry, I was wrong), for two reasons: (1) sharding is disabled by default and I can't find any config that enables it for the test wiki; (2) the way sharding works, you may get temp user 4 before temp user 3, but there will be no gaps.

Still, the problem of long numbers stands: we will easily reach seven digits within a month on English Wikipedia, so it's worth exploring hexadecimal or other notations.

Figuring out what has been causing these jumps is also important. My guess is that we assign an id when the user opens the edit page but throw it away if no edit is saved, which bots and the like could easily inflate, but that's a completely unfounded guess.

@Niharika We don't need to be too worried about how name lengths scale, even with these gaps.

Assuming 1 million ID increments per month, here's how decimal will scale:

  • 7 digits in 1 month
  • 8 digits in 10 months
  • 9 digits in 100 months (about 8 years)
  • 10 digits in 1000 months (about 83 years)
  • 11 digits in 10000 months (about 833 years)

If we've underestimated by a factor of 10, just add 1 to each of those digit counts. If by a factor of 100, add 2, and so on.

If those names are too long, here's how hexadecimal will scale:

  • 5 digits in 1 month
  • 6 digits in 16 months
  • 7 digits in 256 months (about 21 years)
  • 8 digits in 4096 months (about 341 years)
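The digit counts in both lists can be reproduced with a few lines of Python. This is only a sketch under the same assumption of 1 million ID increments per month; `digits_needed` and `RATE_PER_MONTH` are illustrative names.

```python
def digits_needed(number, base):
    """Number of digits required to write a positive integer in the given base."""
    count = 0
    while number > 0:
        number //= base
        count += 1
    return max(count, 1)


RATE_PER_MONTH = 1_000_000  # assumed growth rate from the comment above

for months in (1, 10, 100, 1000, 10000):
    print("decimal", months, digits_needed(RATE_PER_MONTH * months, 10))

for months in (1, 16, 256, 4096):
    print("hex", months, digits_needed(RATE_PER_MONTH * months, 16))
```

Each line prints the elapsed months followed by the length of the ID reached at that point, giving 7 to 11 digits for decimal and 5 to 8 for hexadecimal.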

I notice that when I make an edit as an anonymous user using DiscussionTools (e.g. https://de.wikipedia.beta.wmflabs.org/wiki/Benutzer_Diskussion:Dom_walden) and it is prevented by an AbuseFilter, the log entry does not have .mw-tempuserlink. See the second entry in https://de.wikipedia.beta.wmflabs.org/w/index.php?title=Spezial:Missbrauchsfilter-Logbuch&wpSearchFilter=156.

I also notice that the /examine for that entry (https://de.wikipedia.beta.wmflabs.org/wiki/Spezial:Missbrauchsfilter/examine/log/20489) has user_editcount=null and user_age=0, which suggests it is being treated like an anonymous user even though the user_name=*Unregistered 3916.

> Figuring out what has been causing these jumps is also important. My guess is that we assign an id when the user opens the edit page but throw it away if no edit is saved, which bots and the like could easily inflate, but that's a completely unfounded guess.

At the moment it only happens on preview: just opening the edit form and looking at the source does not acquire a new id, but pressing the "preview" or "show changes" button does (when live preview is not used for anons).
Even previewing an empty string triggers the next number and stores it in the session for reuse on the next edit.
When other actions (available to anons) are added to temp user creation, those actions could also trigger new numbers.
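The behaviour described above can be modelled with a small Python toy. It is not MediaWiki code: the class `TempSerialAllocator`, the session dictionary, and the rule that a save reserves a number when the session has none are all assumptions made for illustration.

```python
class TempSerialAllocator:
    """Toy model: a number is reserved on first preview and kept in the session."""

    def __init__(self):
        self.next_serial = 1
        self.saved = []  # numbers that ended up attached to a saved edit

    def acquire(self, session):
        # Reuse the number already stored in this session, if any.
        if "temp_serial" not in session:
            session["temp_serial"] = self.next_serial
            self.next_serial += 1
        return session["temp_serial"]

    def preview(self, session):
        return self.acquire(session)

    def save(self, session):
        serial = self.acquire(session)
        self.saved.append(serial)
        return serial


allocator = TempSerialAllocator()
allocator.preview({})        # visitor previews and leaves: 1 is never saved
editor = {}
allocator.preview(editor)    # 2 is reserved on preview
allocator.save(editor)       # and reused on save
allocator.preview({})        # 3 is abandoned
allocator.save({})           # 4 is reserved and saved in one step
print(allocator.saved)       # [2, 4]
```

In this model every abandoned preview leaves a hole in the sequence of saved names, which matches the kind of gaps seen on beta dewiki.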

To avoid some "challenges" around getting higher numbers, a serial number should be used whose representation does not simply increment the last digit but looks more random to users (I do not know if that is easy to achieve without producing duplicates). This would also hide the fact that big wikis use sharding.

A serial representation also makes it harder to infer the creation date from the number, for example within a discussion (the creation date is still shown on Special:ListUsers or via the API, but that is another place and maybe also something to discuss). The creation date is not the time of the first save but the time of the first preview, and maybe it should not be shown, for privacy reasons (but that sounds like a discussion for another task).

To fix this task, the Linker class must also add mw-tempuserlink when userId == 0.
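The intended rule can be sketched in Python as follows. This is not the actual Linker code: `user_link_classes` and the `TEMP_NAME` pattern are hypothetical, and the sketch assumes a temporary name is recognised by its `*Unregistered <number>` form and that such links do not also get mw-anonuserlink.

```python
import re

# Assumed pattern for temporary account names on the beta wikis.
TEMP_NAME = re.compile(r"\*Unregistered \d+")


def user_link_classes(user_name, user_id):
    """Pick CSS classes for a user link (illustrative, not MediaWiki's logic)."""
    classes = ["mw-userlink"]
    if TEMP_NAME.fullmatch(user_name):
        # Applies even when user_id is 0, i.e. the temp user was never saved.
        classes.append("mw-tempuserlink")
    elif user_id == 0:
        classes.append("mw-anonuserlink")
    return " ".join(classes)


print(user_link_classes("*Unregistered 3916", 0))  # mw-userlink mw-tempuserlink
print(user_link_classes("192.0.2.1", 0))           # mw-userlink mw-anonuserlink
print(user_link_classes("Example", 42))            # mw-userlink
```

The key point is that the temp-name check comes before the `user_id == 0` check, so an unsaved temporary user is no longer styled like an IP.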

Change 915828 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] linker: Add mw-tempuserlink also for unsaved temp users

https://gerrit.wikimedia.org/r/915828

Change 915828 merged by jenkins-bot:

[mediawiki/core@master] linker: Add mw-tempuserlink also for unsaved temp users

https://gerrit.wikimedia.org/r/915828

I didn't find any examples of temporary users not having the mw-tempuserlink class on Special:AbuseLog when I checked the most recent ~3000 entries on beta dewiki and ~2000 on beta enwiki.

I also checked that no other types of users had the mw-tempuserlink class, that all IPs had the mw-anonuserlink class and that non-IPs did not have that class.

@Umherirrender I notice that some regular usernames have the mw-anonuserlink class. This includes imported edits (e.g. https://de.wikipedia.beta.wmflabs.org/w/index.php?title=Diadie_Samass%C3%A9kou&action=history) and AbuseLog entries where the account does not exist because the filter has prevented its creation (e.g. ThaliaTomaszewsk here https://de.wikipedia.beta.wmflabs.org/wiki/Spezial:Missbrauchsfilter-Logbuch). Is this expected? It appears to happen on production enwiki as well, so I assume it is. I guess the username will not have an ID if it is not in the database.

I then checked user links on a number of different places where they appear, including:

  • Special:Log
  • Special:RevisionDelete
  • Special:CheckUserLog
  • Special:BlockList
  • Revision history
  • Special:RecentChanges
  • page credits
  • page info
  • diff

Test environments:

When there is no user id, the link always gets mw-anonuserlink (the code links to T45179 here); this also applies to imported user names. Imported user names also get mw-extuserlink. The code review on 96bd79b4a36a7216dce4ad8b5915d592ba1dff8b does not mention whether styling external users like anons is intended.