Page MenuHomePhabricator

How do we log unsuccessful first edits for temporary users?
Open, Needs TriagePublic

Description

If someone attempts to make a first edit, while not logged in, but the edit is unsuccessful due to log-worthy reasons (e.g., [[Special:AbuseFilter]] prevents the edit), then what name goes in the log? Should the temporary account get created anyway, just with zero completed edits?

Thanks to @Tigerzeng for asking these questions:

  1. Will unregistered users get their temporary accounts when they trigger abusefilters but not actually make edits to pages?
  2. Will users have access to IP addresses of temporary accounts be able to check IPs of unregistered users who trigger abusefilters?

Related Objects

Event Timeline

  1. Will unregistered users get their temporary accounts when they trigger abusefilters but not actually make edits to pages?

We should decide in this task.

  1. Will users have access to IP addresses of temporary accounts be able to check IPs of unregistered users who trigger abusefilters?

If the temporary account already exists, then they will show up in Special:AbuseLog. We can add IP reveal to that page. To get the specific IP address that was used when that filter was triggered, we'd need some ID in the HTML that can be linked back to an IP stored in a CheckUser table. If that wasn't available, we'd instead be able to reveal the last IP used by that temp account.

If the temporary account doesn't already exist, this would only be possible by creating a temporary account when a filter is triggered - i.e. answering "yes" to (1).

A similar question was raised for unsuccessful login attempts in CheckUser: T353953: Don't use actor IDs for private CheckUser events when these actions are performed by IP addresses. The solution may need to be different here, but mentioning in case we can learn anything from that.

STran added subscribers: Dreamy_Jazz, Daimona.

Discussed this with @kostajh @Daimona @Dreamy_Jazz @Tchanders and we came to the consensus that it would be ideal if we could just create the temporary account on the action even though there won't be any edits associated with the account, as it maintains the stance that all actions should be associated with some sort of account. We were hoping to clear this with DBAs, as it's possible that accounts added here would never be valid end users (eg. LTAs triggering filters). I am vaguely aware of some discussion of table bloat but don't know where it is specifically so I'm not sure what the constraints are.

For additional context, here are some queries run this week approximating how many accounts we'd expect to add:

In a scenario where every filter hit triggers an account creation:

wikiadmin2023@10.192.48.200(enwiki)> select COUNT( DISTINCT afl_user_text ) from abuse_filter_log where afl_user = 0 and afl_timestamp > 20240101010101;
+---------------------------------+
| COUNT( DISTINCT afl_user_text ) |
+---------------------------------+
|                          112649 |
+---------------------------------+
1 row in set (9.840 sec)

This is ~55k accounts a month.

If only edits are considered:

mysql:research@dbstore1008.eqiad.wmnet [enwiki]> select COUNT(DISTINCT afl_ip) from abuse_filter_log where afl_user = 0 and afl_timestamp > 20240101010101 and afl_action = 'edit';
+------------------------+
| COUNT(DISTINCT afl_ip) |
+------------------------+
|                 104813 |
+------------------------+
1 row in set (47.503 sec)

This is ~52k accounts a month.

If only edits that were disallowed are considered:

wikiadmin2023@10.192.48.200(enwiki)> select COUNT( DISTINCT afl_user_text ) from abuse_filter_log where afl_user = 0 and afl_timestamp > 20240101010101 and afl_action="edit" and afl_actions = "disallow";
+---------------------------------+
| COUNT( DISTINCT afl_user_text ) |
+---------------------------------+
|                           29066 |
+---------------------------------+
1 row in set (15.495 sec)

This is ~15k accounts a month.


I did some additional research while reviewing T358632: CannotCreateActorException when creating a temporary account on an edit which causes an abuse filter log which resolved the issue of the actor not existing and throwing an error whenever an anonymous user attempted an edit that triggered a filter. Once that error was fixed, the current situation actually seems less breaking than expected:

  • An anonymous user, upon making an edit that flags a filter, will be logged as a temporary user but no temporary account will have been made eg:

image.png (602×1 px, 64 KB)

  • An anonymous user makes an edit that is logged but not disallowed will have a temporary account created for them. If the event is logged (eg flagged), at the time of the log, the account does not exist. Attempts to show the IP via IPInfo's "Show IP" button will fail (because there is no edit to look up). However, AbuseFilter log will have still logged the IP and this IP goes away when the abuse_filter_log table clears its older IPs.
  • If a still-anonymous user proceeds to make a valid edit, the account is then created using the username associated with the failed action.
  • If a different anonymous user makes an edit, failed or otherwise, they will receive the next available username (so users aren't wrongly associated with another user's actions).

This isn't the worst situation to be in, but it does create this oddity where the username for a temporary account exists but no real account exists. Maybe this is okay? If the user never makes a valid edit with that account (afaik as long as they have the cookie they have the username and if the cookie goes away then so does that account name) then it doesn't necessarily matter if an account under that name ever exists. It's just a weird precedent.

In this scenario, we would probably still have to fix the broken "Show IP" button.

This isn't the worst situation to be in, but it does create this oddity where the username for a temporary account exists but no real account exists. Maybe this is okay? If the user never makes a valid edit with that account (afaik as long as they have the cookie they have the username and if the cookie goes away then so does that account name) then it doesn't necessarily matter if an account under that name ever exists. It's just a weird precedent.

It is indeed quite weird. I wonder what would it take to fix this (in core), by having the temp account creation fully go through, even if the edit itself has been disallowed. This would be less surprising, and also allow easier browsing of the user's abuselog, as well as IPInfo as you mentioned.

Thanks @STran and @Daimona. Looking at EditPage, I think it would be tricky to create the temp account in core if the filter fails.

Here's what broadly happens in EditPage::internalAttemptSave:

  • If we couldn't acquire a temp user name, return early; otherwise we have an unsaved temp user (a User object that has no DB representation)
  • Check through a long list of constraints, including running the EditFilterMergedContent hook, which AbuseFilter uses to possibly disallow the edit
  • Iff all of these constraints pass, create the temp user

If we create the temp account before checking the constraints, I think we'd have a lot more than just the extra 15k/month from T334623#9587082 (e.g. other constraints look at the structure of the edit, or whether the user is blocked).

If we only want to create the temp account in the case that AbuseFilter disallows the edit, it would be difficult to communicate to core that AbuseFilter was one of the constraints that failed. I guess we'd need some sort of PossiblyPreventEditButAllowTempAccountCreation hook.

Other approaches

  • AbuseFilter itself could save the temp user. This would result in the ~15k accounts a month, so would need DBA approval.
  • We could stop using an unsaved temp user in EditPage (T355210), although that would result in an IP user being logged so we'd need to handle that in AbuseFilter (tested using this patch).

@Tchanders Good point. This naturally raises another question though: what happens when the edit is prevented by other things that also generate logs? SpamBlacklist is the first example that comes to mind. Broadly speaking, given the large number of existing constraint, I think it would not be unreasonable to assume that something other than AF and SB may disallow an edit while also persisting data about the temp account somehow (logs being the main example). And even if AF and SB were the only existing use cases, it seems like nothing is preventing developers from following the same pattern elsewhere.

Not that this makes anything easier, but at least I could see the idea of distinguishing between "block edit, block temp account creation" and "block edit, allow temp account creation" useful for more things than just AF. Still, as you say, it would be tricky. Creating the account in AF itself seems like it might be the best option wrt to cost/effectiveness. But I should also note that there'd still be edge cases: if, for example, AF logs a temp account edit without disallowing it, and then the edit is disallowed by something else (another constraint), AF would still produce orphaned log entries for a temp account that doesn't really exist.

But I should also note that there'd still be edge cases: if, for example, AF logs a temp account edit without disallowing it, and then the edit is disallowed by something else (another constraint), AF would still produce orphaned log entries for a temp account that doesn't really exist.

Ah right, so we could be looking at up to ~55k/month unused temp accounts in that case, if we let AF create the account.

Another thing to consider is rate limiting. If we go for any of the approaches that involve temp account creation, we'd need to be careful that the temp account was immediately expired (by doing this), or else triggering a filter would be a way to get around the rate limit. (@kostajh might be interested in this.)

This naturally raises another question though: what happens when the edit is prevented by other things that also generate logs?

It sounds like it would be useful to decide on a general approach to logging something that was done by an anonymous user before a temp account could be created. Looks like SpamBlacklist tries to log the unsaved temp user's name, causing an error in core (T358806), and potentially other features could encounter similar problems too.

we came to the consensus that it would be ideal if we could just create the temporary account on the action even though there won't be any edits associated with the account, as it maintains the stance that all actions should be associated with some sort of account.

I like this principle (any loggable action should be associated with a saved temporary account) because it's easy to understand, clarifies ambiguities, and should apply to other software as well (e.g. SpamBlacklist).

Another thing to consider is rate limiting. If we go for any of the approaches that involve temp account creation, we'd need to be careful that the temp account was immediately expired (by doing this),

I'd like to understand this better, as I am not sure why we would want to immediately expire the account. It should be possible to trip and AbuseFilter (temp account created; edit not saved) and then succeed a second time.

or else triggering a filter would be a way to get around the rate limit. (@kostajh might be interested in this.)

I don't think it would allow the user to bypass the rate limits, AIUI. It does mean that bad faith editing attempts could easily use up the temp account rate limit quota for the day for an IP address, shutting out good faith anonymous editors.

Another thing to consider is rate limiting. If we go for any of the approaches that involve temp account creation, we'd need to be careful that the temp account was immediately expired (by doing this), or else triggering a filter would be a way to get around the rate limit. (@kostajh might be interested in this.)

I would be concerned that by expiring the temporary account could cause a new creation on a new failed abuse filter hit, which would then fill up the temporary account creation throttle quickly.

Another thing to consider is rate limiting. If we go for any of the approaches that involve temp account creation, we'd need to be careful that the temp account was immediately expired (by doing this),

I'd like to understand this better, as I am not sure why we would want to immediately expire the account. It should be possible to trip and AbuseFilter (temp account created; edit not saved) and then succeed a second time.

or else triggering a filter would be a way to get around the rate limit. (@kostajh might be interested in this.)

I don't think it would allow the user to bypass the rate limits, AIUI. It does mean that bad faith editing attempts could easily use up the temp account rate limit quota for the day for an IP address, shutting out good faith anonymous editors.

Another thing to consider is rate limiting. If we go for any of the approaches that involve temp account creation, we'd need to be careful that the temp account was immediately expired (by doing this), or else triggering a filter would be a way to get around the rate limit. (@kostajh might be interested in this.)

I would be concerned that by expiring the temporary account could cause a new creation on a new failed abuse filter hit, which would then fill up the temporary account creation throttle quickly.

Sorry I should clarify what I meant in this paragraph!

TempUser::create can be called with or without rate limiting (determined by whether you pass in the WebRequest param).

  • If we call it with with rate limiting, this could cause the limit to be hit via bad edits alone (as expressed in the comments form @kostajh and @Dreamy_Jazz)
  • If we call it without rate limiting, then it could become a way for a user to bypass the rate limit - deliberately trigger the filter from different browsers/devices, then you are in control of lots of temp accounts.

My suggestion to call it without rate limiting, but to expire the temp accounts immediately, was to avoid both of these problem. We'd ensure that the rate limit doesn't get used up by AbuseFilter hits, but at the same time we wouldn't allow a user to become in control of lots of temp accounts.

This would cause a temp account to be created for each failed attempt, but that should only be as many (at worst) as calculated in T334623#9587082.

! In T334623#9600711, @Tchanders wrote:
My suggestion to call it without rate limiting, but to expire the temp accounts immediately, was to avoid both of these problem. We'd ensure that the rate limit doesn't get used up by AbuseFilter hits, but at the same time we wouldn't allow a user to become in control of lots of temp accounts.

By expiring the temporary account immediately it would be difficult to build a picture of failed AbuseFilter hits as each hit would be under a different username and therefore you couldn't see a pattern of abuse.

Blocking a user for repeatedly hitting the abuse filter is a selectable block reason on the English Wikipedia and would make determining this (without getting all temporary accounts on the given IP) not possible.

! In T334623#9600711, @Tchanders wrote:
My suggestion to call it without rate limiting, but to expire the temp accounts immediately, was to avoid both of these problem. We'd ensure that the rate limit doesn't get used up by AbuseFilter hits, but at the same time we wouldn't allow a user to become in control of lots of temp accounts.

By expiring the temporary account immediately it would be difficult to build a picture of failed AbuseFilter hits as each hit would be under a different username and therefore you couldn't see a pattern of abuse.

Blocking a user for repeatedly hitting the abuse filter is a selectable block reason on the English Wikipedia and would make determining this (without getting all temporary accounts on the given IP) not possible.

I think we want to call TempUser::create with rate limiting. For most users, they would trip the abuse filter, be logged-in to a temp account, and future editing attempts would be associated with that existing temp account. For scripted abuse scenarios, that would potentially fill up the rate limit for an IP for the day. I'd like to see how that works out in practice. I will file a task about instrumenting this. I also think that T357802: Prompt user to create a regular account after temp account creation rate limit trip could mitigate the UX for good faith users.

Since this task is about AbuseFilter specifically, I filed T359405: Create temporary account early in edit cycle for all edit attempts as the more general case. There is a fair amount of overlap with this task, though.

One problem with T359405: Create temporary account early in edit cycle for all edit attempts is that there are numerous constraint checks that currently result in no log entries (e.g. page size check, unicode check), and these would now result in a temporary account being created.

One alternative would be to delegate ManualLogEntry to EditPage.php -- if a hook wants to fail an edit and also generate a ManualLogEntry, it could send this information back to EditPage via the Status object (set the ManualLogEntry as the value); EditPage would check if the status is bad and if $status->value is an instance of ManualLogEntry, then EditPage should create a temp account and update the performer of the ManualLogEntry.

One alternative would be to delegate ManualLogEntry to EditPage.php -- if a hook wants to fail an edit and also generate a ManualLogEntry, it could send this information back to EditPage via the Status object (set the ManualLogEntry as the value); EditPage would check if the status is bad and if $status->value is an instance of ManualLogEntry, then EditPage should create a temp account and update the performer of the ManualLogEntry.

The problem with this approach is that extensions can do more than just creating ManualLogEntry objects, and still depend on the existence of the user. As a case in point, the abuselog doesn't use Special:Log or ManualLogEntry at all (T21494). But in general, extensions could really do anything with the user object, including persisting any information about it, and I guess we would want to keep that functional.

Instead, I was thinking about splitting the constraints into two groups, user-independent and user-dependent. Those in the former group would be run first, and before creating the temp account; these constraints would not have access to a User-like object at all, to prevent undefined behaviour. If all those constraints succeed, we would then create the temp account and run the user-dependent subset of constraints; these would work exactly like they do today. The whole Constraints code is pretty much internal, so it should be possible to introduce this distinction without significant breakage. If needed, we could even split the current EditFilterMergedHook constraint into two separate constraint, one that depends on the user and one that doesn't. This could also give us a chance to fix T304238 while at it.

One alternative would be to delegate ManualLogEntry to EditPage.php -- if a hook wants to fail an edit and also generate a ManualLogEntry, it could send this information back to EditPage via the Status object (set the ManualLogEntry as the value); EditPage would check if the status is bad and if $status->value is an instance of ManualLogEntry, then EditPage should create a temp account and update the performer of the ManualLogEntry.

The problem with this approach is that extensions can do more than just creating ManualLogEntry objects, and still depend on the existence of the user. As a case in point, the abuselog doesn't use Special:Log or ManualLogEntry at all (T21494). But in general, extensions could really do anything with the user object, including persisting any information about it, and I guess we would want to keep that functional.

Instead, I was thinking about splitting the constraints into two groups, user-independent and user-dependent. Those in the former group would be run first, and before creating the temp account; these constraints would not have access to a User-like object at all, to prevent undefined behaviour. If all those constraints succeed, we would then create the temp account and run the user-dependent subset of constraints; these would work exactly like they do today. The whole Constraints code is pretty much internal, so it should be possible to introduce this distinction without significant breakage. If needed, we could even split the current EditFilterMergedHook constraint into two separate constraint, one that depends on the user and one that doesn't. This could also give us a chance to fix T304238 while at it.

I agree with you on this. We shouldn't be realistically assuming that a ManualLogEntry is the only thing that could cause problems.

I like your idea of splitting the constraints into user-independent and user-dependent. AFAIK it would make it easier for extensions to define constraints that should prevent the creation of a temporary account (as they can use the user-independent one without knowing specifically who is performing the edit).

There's another issue here, which @Tgr noted in this gerrit comment. If we create the account and fail the edit, we won't perform the top-level navigation that's needed to create an account on loginwiki, and the user would have a detached temp account that can only be used on the local wiki, and not on other wikis.

A potential workaround is to perform top-level navigation on rejected edits. This might be kind of a clunky user experience (users of the wikitext editor would lose their edit contents; for VE, we can probably use the stash and restore mechanism for free), but probably OK for this use case.

@Daimona @Dreamy_Jazz do you have in mind which constraints are user-independent and which are user-dependent?

As I see it, the issue is with onEditPage__attemptSave and onEditFilter hooks. I'd propose we:

  • run the constraints (if fail, no temp account)
  • create the temp account
  • run onEditPage__attemptSave and onEditFilter

@Daimona @Dreamy_Jazz do you have in mind which constraints are user-independent and which are user-dependent?

If we introduce that distinction, we might start by considering every existing constraint as user-dependent, as that should preserve the existing behaviour. Then, for simple constraints it should be just a matter of auditing their implementation to see if they need a user or not. To prevent the context user from being accessed deep down in the call stack, we might explore emitting a warning if something calls RequestContext::getMain()->getUser(). I don't know if that'd work though; otherwise, it would help us validate that a given constraint is (or isn't) user-dependent.

As I see it, the issue is with onEditPage__attemptSave and onEditFilter hooks.

I'm not sure if I understand, what specific issue were you referring to?

@Daimona @Dreamy_Jazz do you have in mind which constraints are user-independent and which are user-dependent?

If we introduce that distinction, we might start by considering every existing constraint as user-dependent, as that should preserve the existing behaviour. Then, for simple constraints it should be just a matter of auditing their implementation to see if they need a user or not. To prevent the context user from being accessed deep down in the call stack, we might explore emitting a warning if something calls RequestContext::getMain()->getUser(). I don't know if that'd work though; otherwise, it would help us validate that a given constraint is (or isn't) user-dependent.

OK, I see.

As I see it, the issue is with onEditPage__attemptSave and onEditFilter hooks.

I'm not sure if I understand, what specific issue were you referring to?

Sorry, I mean that most of the constraints added to EditConstraintRunner don't seem to have any side effects, while onEditPage__attemptSave, onEditFilter and EditFilterMergedContentHookConstraint run hooks, at which point we can't say anything with certainty about what extensions may do.

So, the proposal would be:

  • run all the constraints (e.g. unicode, spamregex, etc) before creating a temp user, and a failure here means no temp user account is created
  • create the temp account
  • run onEditPage__attemptSave, onEditFilter and EditFilterMergedContentHookConstraint hooks
    • Failure here means that the user is logged-in to a temp account
    • run the top-level redirect so that the user's temp account is attached to loginwiki

I like your idea of splitting the constraints into user-independent and user-dependent.

I think these definitions would end up somewhat artificial (why is a page length check implemented in AbuseFilter more user-dependent than the one implemented in PHP?). Why is hitting the spam blacklist user-dependent if $wgLogSpamBlacklistHits is true but not user-dependent when it is false?

Conceptually, creating the temporary account on any edit attempt seems reasonable to me. In practice, creating a temporary account on failed edit attempts seems hard but splitting doesn't really help with that.

Ideally, if you really want to minimize the number of temp users, what you'd want is some sort of mechanism for on-demand creation of the temp user account when the actor is determined (probably in ActorNormalization::acquireActorId()), but it seems like a scary change. On the other hand, it's very similar to the shadow user concept discussed in T21161: Don't autologin if local account doesn't exist (don't autocreate if user doesn't explicitly login) (except that would have even more wide-ranging requirements).

Sorry, I mean that most of the constraints added to EditConstraintRunner don't seem to have any side effects, while onEditPage__attemptSave, onEditFilter and EditFilterMergedContentHookConstraint run hooks, at which point we can't say anything with certainty about what extensions may do.

Ah, yes, you're right. That might also be a viable option.

I think these definitions would end up somewhat artificial (why is a page length check implemented in AbuseFilter more user-dependent than the one implemented in PHP?). Why is hitting the spam blacklist user-dependent if $wgLogSpamBlacklistHits is true but not user-dependent when it is false?

They would be artificial, but it doesn't have to be that strict. A constraint would be user-dependent if it might need to know who the current user is. "page length check implemented in AbuseFilter" is not a constraint in itself; the EditFilterMergedContent hook is, and as a whole, it might need access to the current user. Even if we talk about this on a per-hookhandler basis, AbuseFilter would be considered user-dependent in the sense that sometimes it needs to know who the current user is (depending on the enabled filters); likewise, SpamBlacklist might need to know the user depending on configuration.

The "user-dependent" concept I'm proposing wouldn't be meant as a strict categorization of constraints, but more of a way to make sure that anything which potentially needs a user declares this explicitly, rather than hoping that RequestContext::getMain() (or similar shenanigans) does the right thing for them. As a bonus, since certain constraints declare themselves to be user-independent, we could delay the account creation until after those constraints are run; but this is more of a consequence of the categorization.

That said, I'm not familiar with the temp account creation process, and there are probably more elegant solutions than this :)

I did some additional research while reviewing T358632: CannotCreateActorException when creating a temporary account on an edit which causes an abuse filter log which resolved the issue of the actor not existing and throwing an error whenever an anonymous user attempted an edit that triggered a filter. Once that error was fixed, the current situation actually seems less breaking than expected:

  • An anonymous user, upon making an edit that flags a filter, will be logged as a temporary user but no temporary account will have been made eg:

image.png (602×1 px, 64 KB)

  • An anonymous user makes an edit that is logged but not disallowed will have a temporary account created for them. If the event is logged (eg flagged), at the time of the log, the account does not exist. Attempts to show the IP via IPInfo's "Show IP" button will fail (because there is no edit to look up). However, AbuseFilter log will have still logged the IP and this IP goes away when the abuse_filter_log table clears its older IPs.
  • If a still-anonymous user proceeds to make a valid edit, the account is then created using the username associated with the failed action.
  • If a different anonymous user makes an edit, failed or otherwise, they will receive the next available username (so users aren't wrongly associated with another user's actions).

This isn't the worst situation to be in, but it does create this oddity where the username for a temporary account exists but no real account exists. Maybe this is okay? If the user never makes a valid edit with that account (afaik as long as they have the cookie they have the username and if the cookie goes away then so does that account name) then it doesn't necessarily matter if an account under that name ever exists. It's just a weird precedent.

In this scenario, we would probably still have to fix the broken "Show IP" button.

After working on T359405: Create temporary account early in edit cycle for all edit attempts (see T359405#9649058), I'm wondering if the existing behavior @STran describes above isn't the best approach after all.

The only thing we might want to do there is add a "Show IP" button that works here. But otherwise, there'd be nothing more to do, AFAICT.

I did some additional research while reviewing T358632: CannotCreateActorException when creating a temporary account on an edit which causes an abuse filter log which resolved the issue of the actor not existing and throwing an error whenever an anonymous user attempted an edit that triggered a filter. Once that error was fixed, the current situation actually seems less breaking than expected:

  • An anonymous user, upon making an edit that flags a filter, will be logged as a temporary user but no temporary account will have been made eg:

image.png (602×1 px, 64 KB)

  • An anonymous user makes an edit that is logged but not disallowed will have a temporary account created for them. If the event is logged (eg flagged), at the time of the log, the account does not exist. Attempts to show the IP via IPInfo's "Show IP" button will fail (because there is no edit to look up). However, AbuseFilter log will have still logged the IP and this IP goes away when the abuse_filter_log table clears its older IPs.
  • If a still-anonymous user proceeds to make a valid edit, the account is then created using the username associated with the failed action.
  • If a different anonymous user makes an edit, failed or otherwise, they will receive the next available username (so users aren't wrongly associated with another user's actions).

This isn't the worst situation to be in, but it does create this oddity where the username for a temporary account exists but no real account exists. Maybe this is okay? If the user never makes a valid edit with that account (afaik as long as they have the cookie they have the username and if the cookie goes away then so does that account name) then it doesn't necessarily matter if an account under that name ever exists. It's just a weird precedent.

In this scenario, we would probably still have to fix the broken "Show IP" button.

After working on T359405: Create temporary account early in edit cycle for all edit attempts (see T359405#9649058), I'm wondering if the existing behavior @STran describes above isn't the best approach after all.

The only thing we might want to do there is add a "Show IP" button that works here. But otherwise, there'd be nothing more to do, AFAICT.

So, summarizing the options, as I see them:

1. Do nothing

  • Let AbuseFilter continue to log the reserved username.
    • If the user makes a successful edit, they will be associated with their previous AbuseFilter-logged events
    • Optional: Add a "Show IP" button on Special:Contributions or in AbuseFilter log for temp account names that appear in the log but don't exist as accounts

Advantages:

  • Limited additional work ("Show IP" button). The IP reveal functionality might also be unnecessary in this context for abuse mitigation workflows.

Disadvantages:

  • Need to educate AbuseFilter reviewers as to why temp account names exist in logs without actual existing accounts
  • Won't work for other extensions that need to log an actual existing user as part of edit denial workflows

2. Create account at beginning of internalAttemptSave and don't worry about detached user accounts

  • Create account at beginning of internalAttemptSave, before any hooks/constraints

Advantages:

  • Creates a user for an edit attempt which allows for easier logging constraint checks

Disadvantages:

  • Something on the order of 55k accounts created per month, of which some percentage won't be used at all
  • Poor UX for users who failed to edit (but are actually logged-in). They'll see some form of CentralAuth / session errors, unless we implement some fix around it. They'll also not be able to visibly see that they're logged in on receiving an error (unless we do something about that too)
  • Inability for these users to be seen as the same user when visiting other wikis

3. Create account at beginning of internalAttemptSave and find way to perform top-level login on failure

  • Create account at beginning of internalAttemptSave, before any hooks/constraints
  • Perform a top-level login on failed edits

Advantages:

  • Same as 2, but we also ensure the user is attached to loginwiki, and that there is an improved UX in failure modes

Disadvantages:

  • Something on the order of 55k accounts created per month, of which some percentage won't be used at all
  • Wikitext editors lose their changes on failed attempts (due to top level redirect)
  • Modifying EditPage.php to support parsing error messages into query params for top-level login will be painful and clunky. Same goes for API driven editing.
  • Would need to modify wikitext editor UX and API driven editing (VisualEditor/DiscussionTools/Wikibase) UX to support the top-level redirect workflows on edit failures

It may be the case in the future that CentralAuth's SSO changes (T345249, T348388), in which case option 3 becomes more palatable, and would require less refactoring.

It seems like the least disruptive option here is option 1 (status quo; only create temp users on success, and hooks that fire before success should make use of the reserved temp user name, but not a full account), assuming that it would work for SpamBlacklist (T358806).

I did some additional research while reviewing T358632: CannotCreateActorException when creating a temporary account on an edit which causes an abuse filter log which resolved the issue of the actor not existing and throwing an error whenever an anonymous user attempted an edit that triggered a filter. Once that error was fixed, the current situation actually seems less breaking than expected:

  • An anonymous user, upon making an edit that flags a filter, will be logged as a temporary user but no temporary account will have been made eg:

image.png (602×1 px, 64 KB)

  • An anonymous user makes an edit that is logged but not disallowed will have a temporary account created for them. If the event is logged (eg flagged), at the time of the log, the account does not exist. Attempts to show the IP via IPInfo's "Show IP" button will fail (because there is no edit to look up). However, AbuseFilter log will have still logged the IP and this IP goes away when the abuse_filter_log table clears its older IPs.
  • If a still-anonymous user proceeds to make a valid edit, the account is then created using the username associated with the failed action.
  • If a different anonymous user makes an edit, failed or otherwise, they will receive the next available username (so users aren't wrongly associated with another user's actions).

This isn't the worst situation to be in, but it does create this oddity where the username for a temporary account exists but no real account exists. Maybe this is okay? If the user never makes a valid edit with that account (afaik as long as they have the cookie they have the username and if the cookie goes away then so does that account name) then it doesn't necessarily matter if an account under that name ever exists. It's just a weird precedent.

In this scenario, we would probably still have to fix the broken "Show IP" button.

After working on T359405: Create temporary account early in edit cycle for all edit attempts (see T359405#9649058), I'm wondering if the existing behavior @STran describes above isn't the best approach after all.

The only thing we might want to do there is add a "Show IP" button that works here. But otherwise, there'd be nothing more to do, AFAICT.

So, summarizing the options, as I see them:

1. Do nothing

  • Let AbuseFilter continue to log the reserved username.
    • If the user makes a successful edit, they will be associated with their previous AbuseFilter-logged events
    • Optional: Add a "Show IP" button on Special:Contributions or in AbuseFilter log for temp account names that appear in the log but don't exist as accounts

Advantages:

  • Limited additional work ("Show IP" button). The IP reveal functionality might also be unnecessary in this context for abuse mitigation workflows.

Disadvantages:

  • Need to educate AbuseFilter reviewers as to why temp account names exist in logs without actual existing accounts
  • Won't work for other extensions that need to log an actual existing user as part of edit denial workflows

2. Create account at beginning of internalAttemptSave and don't worry about detached user accounts

  • Create account at beginning of internalAttemptSave, before any hooks/constraints

Advantages:

  • Creates a user for an edit attempt which allows for easier logging constraint checks

Disadvantages:

  • Something on the order of 55k accounts created per month, of which some percentage won't be used at all
  • Poor UX for users who failed to edit (but are actually logged-in). They'll see some form of CentralAuth / session errors, unless we implement some fix around it. They'll also not be able to visibly see that they're logged in on receiving an error (unless we do something about that too)
  • Inability for these users to be seen as the same user when visiting other wikis

3. Create account at beginning of internalAttemptSave and find way to perform top-level login on failure

  • Create account at beginning of internalAttemptSave, before any hooks/constraints
  • Perform a top-level login on failed edits

Advantages:

  • Same as 2, but we also ensure the user is attached to loginwiki, and that there is an improved UX in failure modes

Disadvantages:

  • Something on the order of 55k accounts created per month, of which some percentage won't be used at all
  • Wikitext editors lose their changes on failed attempts (due to top level redirect)
  • Modifying EditPage.php to support parsing error messages into query params for top-level login will be painful and clunky. Same goes for API driven editing.
  • Would need to modify wikitext editor UX and API driven editing (VisualEditor/DiscussionTools/Wikibase) UX to support the top-level redirect workflows on edit failures

It may be the case in the future that CentralAuth's SSO changes (T345249, T348388), in which case option 3 becomes more palatable, and would require less refactoring.

It seems like the least disruptive option here is option 1 (status quo; only create temp users on success, and hooks that fire before success should make use of the reserved temp user name, but not a full account), assuming that it would work for SpamBlacklist (T358806).

OK, option 1 would indeed not work for SpamBlacklist because it uses a regular ManualLogEntry which needs an actor.

One idea I've explored is to allow ActorStore to not require a user ID in validateActorForInsertion() when working with a temp account:

@@ -628,7 +628,8 @@ class ActorStore implements UserIdentityLookup, ActorNormalization {
 		}
 
 		$userId = $user->getId( $this->wikiId ) ?: null;
-		if ( $userId === null && $this->userNameUtils->isUsable( $user->getName() ) ) {
+		if ( $userId === null && $this->userNameUtils->isUsable( $user->getName() )
+			&& !$this->userNameUtils->isTemp( $user->getName() ) ) {
 			throw new CannotCreateActorException(
 				'Cannot create an actor for a usable name that is not an existing user: ' .
 				"user_name=\"{$user->getName()}\""

That allows temp account names to appear in Special:Log in a similar way to how they appear in AbuseFilter log -- they are visible, clickable names, but when you visit the username, you'll see a message that it doesn't exist. There's a further complication in that if the user goes to save a successful edit, CentralAuth / AuthManager don't know how to handle a row in the actor table that has an actor_id and a actor_name but actor_user is null (because the user account didn't exist when we created the actor table row). So we'd need to work out a safe way to update the actor table entry to have the newly created user ID as part of temp account creation.

OK, option 1 would indeed not work for SpamBlacklist because it uses a regular ManualLogEntry which needs an actor.

One idea I've explored is to allow ActorStore to not require a user ID in validateActorForInsertion() when working with a temp account:

@@ -628,7 +628,8 @@ class ActorStore implements UserIdentityLookup, ActorNormalization {
 		}
 
 		$userId = $user->getId( $this->wikiId ) ?: null;
-		if ( $userId === null && $this->userNameUtils->isUsable( $user->getName() ) ) {
+		if ( $userId === null && $this->userNameUtils->isUsable( $user->getName() )
+			&& !$this->userNameUtils->isTemp( $user->getName() ) ) {
 			throw new CannotCreateActorException(
 				'Cannot create an actor for a usable name that is not an existing user: ' .
 				"user_name=\"{$user->getName()}\""

That allows temp account names to appear in Special:Log in a similar way to how they appear in AbuseFilter log -- they are visible, clickable names, but when you visit the username, you'll see a message that it doesn't exist. There's a further complication in that if the user goes to save a successful edit, CentralAuth / AuthManager don't know how to handle a row in the actor table that has an actor_id and a actor_name but actor_user is null (because the user account didn't exist when we created the actor table row). So we'd need to work out a safe way to update the actor table entry to have the newly created user ID as part of temp account creation.

I think conceptually it's correct that the performer of these actions is an actor (an individual who did something), but not a user (no need to be logged in). This either means we have a new type of actor here (which is similar to an IP actor or a system actor in that it has only an actor row), or we need to re-define temp users as not necessarily having a user row. Since we don't currently have any user types that may or may not have a user row, the second idea might be a bit tricky.

The idea of being able to link the log to a temp account if a successful edit is made later is nice. If it's not possible, I think we can still use this idea.

Since this conversation is going a bit beyond how to solve for AbuseFilter, I've made a new task: T360870: How should we represent actors that do something loggable but don't create a temp account? and discussed this idea a bit there.

Change #1014002 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/core@master] [WIP] Temp accounts: Allow logging non-existent temp accounts

https://gerrit.wikimedia.org/r/1014002

OK, option 1 would indeed not work for SpamBlacklist because it uses a regular ManualLogEntry which needs an actor.

One idea I've explored is to allow ActorStore to not require a user ID in validateActorForInsertion() when working with a temp account:

@@ -628,7 +628,8 @@ class ActorStore implements UserIdentityLookup, ActorNormalization {
 		}
 
 		$userId = $user->getId( $this->wikiId ) ?: null;
-		if ( $userId === null && $this->userNameUtils->isUsable( $user->getName() ) ) {
+		if ( $userId === null && $this->userNameUtils->isUsable( $user->getName() )
+			&& !$this->userNameUtils->isTemp( $user->getName() ) ) {
 			throw new CannotCreateActorException(
 				'Cannot create an actor for a usable name that is not an existing user: ' .
 				"user_name=\"{$user->getName()}\""

That allows temp account names to appear in Special:Log in a similar way to how they appear in AbuseFilter log -- they are visible, clickable names, but when you visit the username, you'll see a message that it doesn't exist. There's a further complication in that if the user goes to save a successful edit, CentralAuth / AuthManager don't know how to handle a row in the actor table that has an actor_id and a actor_name but actor_user is null (because the user account didn't exist when we created the actor table row). So we'd need to work out a safe way to update the actor table entry to have the newly created user ID as part of temp account creation.

I think conceptually it's correct that the performer of these actions is an actor (an individual who did something), but not a user (no need to be logged in). This either means we have a new type of actor here (which is similar to an IP actor or a system actor in that it has only an actor row), or we need to re-define temp users as not necessarily having a user row. Since we don't currently have any user types that may or may not have a user row, the second idea might be a bit tricky.

The idea of being able to link the log to a temp account if a successful edit is made later is nice. If it's not possible, I think we can still use this idea.

Since this conversation is going a bit beyond how to solve for AbuseFilter, I've made a new task: T360870: How should we represent actors that do something loggable but don't create a temp account? and discussed this idea a bit there.

I uploaded https://gerrit.wikimedia.org/r/1014002 as a proof of concept. You can trigger it by creating a SpamBlacklist entry for e.g. x.com. Try making an edit with x.com; you'll see an error and won't have a temp account but will have a reserved temp account name in the session. You'll also be able to see an entry in Special:Log that references the non-existent temp user. Then, make a successful edit. You'll be logged-in, and the Special:Log entry will reference the actually existing temp user account.

NB We're continuing the conversation about generally solving the logging problem (including the above few comments) on T360870.

I was summoned: It's fine for now but some improvements should be put in place after roll out.