
RevertRisk model readiness for temporary accounts
Open, Needs Triage, Public

Description

As part of the overall effort to prepare tooling for the temporary accounts rollout, it would be good to have the language-agnostic and multilingual revert risk models take into account whether the user account is a temporary account or a full account.

We have two ways of detecting a temporary account:

  • the user_is_temp flag in the user table
  • checking if the username matches a reserved pattern for temporary account usernames
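The second option, matching the username against a reserved pattern, might look roughly like the sketch below. It assumes the pattern is written with a single `$1` placeholder; the value `*Unregistered $1` is hypothetical, chosen only to fit the beta-cluster name `*Unregistered 68998` that appears in this thread. The real pattern comes from each wiki's configuration, and the function names are illustrative.

```python
import re

# Hypothetical pattern (assumption): picked to match the beta-cluster example
# "*Unregistered 68998". The real reserved pattern is wiki configuration.
TEMP_NAME_PATTERN = "*Unregistered $1"


def compile_temp_pattern(pattern: str) -> "re.Pattern[str]":
    """Turn a '$1'-style pattern into a regex; '$1' stands for the variable part."""
    if "$1" not in pattern:
        raise ValueError("pattern must contain a $1 placeholder")
    prefix, _, suffix = pattern.partition("$1")
    # Escape the literal parts so characters like '*' are matched verbatim.
    return re.compile(re.escape(prefix) + r".+" + re.escape(suffix))


def looks_like_temp_username(username: str, pattern: str = TEMP_NAME_PATTERN) -> bool:
    """Return True if the whole username fits the reserved pattern."""
    return compile_temp_pattern(pattern).fullmatch(username) is not None
```

A name check like this needs no database or API access, but it is only as reliable as the configured pattern, so the user_is_temp flag is likely the safer signal where it is available.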

For the multilingual revert risk model, I'd imagine we'd want to update the code that checks whether the user is an IP editor, and add an OR condition that also covers temporary accounts.
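As a minimal sketch of that OR condition, assuming the existing check decides "IP editor" from the username and that a `user_is_temp` boolean is available to the caller. The function names here are hypothetical and are not taken from the Revert Risk codebase.

```python
import ipaddress


def is_ip_username(username: str) -> bool:
    """Return True if the username parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def is_unregistered_editor(username: str, user_is_temp: bool) -> bool:
    """IP editors OR temporary accounts count as unregistered."""
    return is_ip_username(username) or user_is_temp
```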

I am unsure what, if anything, would need to be done to the language-agnostic revert risk model.

The user_is_temp flag is available today, even if it is not actually populated for any accounts in production yet.

Ideally, the revert risk models could support temporary accounts in time for our rollout to testwiki and pilot wikis in March/April 2024.

Event Timeline

I guess this may be more of a question for the Research team, cc @diego

Hi @kostajh, I'm not sure I understand the question. Are you proposing to add the "user status" (temporary/full) as a feature on Revert Risk?

I think so. What I am trying to say is:

  • Revert Risk currently knows about "anonymous users / IP editors" vs full users and weighs that when generating a score
  • As part of Temporary accounts there will no longer be "anonymous users", there will be temporary accounts. We suspect these will behave similarly to IP users in some ways and to full accounts in other ways.
    • For Revert Risk as the endpoint exists currently, it would only see that the associated account is not anonymous. So I think Revert Risk needs to be updated to handle temporary accounts.

Ok! I understand.
Currently, Revert Risk uses several user features. I think the "revision count" could be used as a replacement for the "anonymous" field. However, probably the most straightforward solution would be to replace the "anonymous" column with a "temporary" column.

@kostajh, my question to you is: would the "user_is_temp" field be returned via the MediaWiki API? Currently, we obtain all the user information from there (@MunizaA please correct me if I'm wrong).
@JAllemandou / @Milimetric: are you planning to add this column (user_is_temp) to the MediaWiki_history table? (This is relevant for retraining.)

@diego We return temp from the APIs that return anon - see the list in T351636: Add `temp` flag to various APIs. Are there any more you'd need updating?

@diego see e.g. https://de.wikipedia.beta.wmflabs.org/w/api.php?action=query&format=json&list=users&formatversion=2&usprop=groups&ususers=*Unregistered%2068998 which returns:

{
    "batchcomplete": true,
    "query": {
        "users": [
            {
                "userid": 18006,
                "name": "*Unregistered 68998",
                "groups": [
                    "*",
                    "temp"
                ]
            }
        ]
    }
}
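
Reading the flag out of a response shaped like the one above could be sketched as follows. The helper name `is_temp_user` is hypothetical, and the sample is parsed from an inline string rather than fetched, so no network access is assumed.

```python
import json


def is_temp_user(response: dict, username: str) -> bool:
    """Return True if `username` is listed in the response with the 'temp' group."""
    users = response.get("query", {}).get("users", [])
    for user in users:
        if user.get("name") == username:
            return "temp" in user.get("groups", [])
    # User not present in the response: treat as not temporary.
    return False


# Same shape as the beta-cluster response shown above.
sample = json.loads("""
{
    "batchcomplete": true,
    "query": {
        "users": [
            {"userid": 18006, "name": "*Unregistered 68998", "groups": ["*", "temp"]}
        ]
    }
}
""")

print(is_temp_user(sample, "*Unregistered 68998"))  # prints True
```
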
XiaoXiao-WMF triaged this task as Medium priority.
XiaoXiao-WMF raised the priority of this task from Medium to Needs Triage. (Jan 17 2024, 8:02 PM)
XiaoXiao-WMF subscribed.

@kostajh can you please provide timelines for when temporary accounts are planned to be implemented? Also, please comment on when, ideally, you would like this work to be done from the Research side.

@kostajh can you please provide timelines for when temporary accounts are planned to be implemented?

We are aiming for testwiki deployment by March/April and pilot wikis by May.

Also, please comment on when, ideally, you would like this work to be done from the Research side.

Ideally, by the time we are deploying to pilot wikis, the model will understand that revisions made by temp accounts should be scored differently than if those revisions came from full accounts. I am not sure how much you'll be able to do, though, without a lot of real world data of temp account edits?

I think we should treat them as equivalent to anonymous users, which basically means changing the code from "is_anonymous" to "has (group: 'temp')". Does this make sense, @kostajh?

I think that makes sense as a starting point, and maybe after a few months of real world data, it would be easier to model them differently if needed. (cc @Tchanders)

@diego how do we want to deal with the difference in other features between temporary and anonymous users? For example, these are the expected user features for an anonymous user:

user_age=0,
user_is_anonymous=1,
user_is_bot=0,
user_revision_count=0,
user_groups=[],

However, for temporary users, user_age, user_revision_count and the number of user_groups would all be non-zero while user_is_anonymous would still be 1. I'm not sure the model has seen that combination of features before.

@MunizaA, until we have enough training data we should treat temporary accounts as anonymous users. In practice this means overwriting temporary users' features.
So, basically

if 'temp' in user_groups:
    # Overwrite the features so a temp account looks like an anonymous (IP) editor
    user_age = 0
    user_is_anonymous = 1
    user_is_bot = 0
    user_revision_count = 0
    user_groups = []

That looks right, AFAICT. Per T330816: [Epic] Temporary users should not be assigned to user groups, temp users should not be assigned to user groups, nor can they set the user_is_bot flag as they don't have access to set user properties.

How do you use user_is_anonymous in the model?

That looks right, AFAICT. Per T330816: [Epic] Temporary users should not be assigned to user groups, temp users should not be assigned to user groups, nor can they set the user_is_bot flag as they don't have access to set user properties.

That's correct. The only thing that might be helpful is user_age.
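
The override discussed in this thread could be packaged as a small function, sketched here under assumptions. The feature names follow the ones listed in the comments above. The function name and the `keep_user_age` switch are hypothetical; the switch only reflects the remark that user_age might be worth keeping.

```python
def as_anonymous_features(features: dict, keep_user_age: bool = False) -> dict:
    """Return a copy of `features` in which a temp account looks like an anonymous editor.

    Users without the 'temp' group are returned unchanged (as a copy).
    """
    if "temp" not in features.get("user_groups", []):
        return dict(features)

    overridden = dict(features)
    overridden.update(
        user_is_anonymous=1,
        user_is_bot=0,
        user_revision_count=0,
        user_groups=[],
    )
    if not keep_user_age:
        # Default: mimic an IP editor completely, including a zero account age.
        overridden["user_age"] = 0
    return overridden
```

Returning a copy rather than mutating the input keeps the original API-derived values available, which may help later if temp accounts are modelled separately once real-world data exists.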

In case it helps, here's some additional context:

  • Temp users are very similar to IP (anon) users: https://www.mediawiki.org/wiki/User_account_types, but that could change in the future. They are implemented much like registered accounts with most of the features switched off, so in theory those features (e.g. groups, preferences, etc.) could be switched on again. We have no plans to do any of that in the foreseeable future; it is just technically possible via some future project. For that reason, in some other places (but not everywhere) they are being treated as a separate type of user from either IP or registered users (more discussion is on T337103: Decide a standard approach for classifying temporary, IP and registered users).
  • We're not rolling out to all wikis in one go, so for some period of time (months at least) there will be IP edits from some wikis but temporary account edits from others.
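
Since IP edits and temporary account edits will coexist across wikis for a while, one option is to classify the user into three types first and derive the model's anonymous feature from that. The sketch below assumes the username and group list are already known; `UserType`, `classify_user` and `user_is_anonymous_feature` are hypothetical names, not part of any existing codebase.

```python
import ipaddress
from enum import Enum


class UserType(Enum):
    IP = "ip"
    TEMP = "temp"
    REGISTERED = "registered"


def classify_user(username: str, user_groups: list) -> UserType:
    """Classify a user as IP, temporary, or registered."""
    if "temp" in user_groups:
        return UserType.TEMP
    try:
        ipaddress.ip_address(username)
        return UserType.IP
    except ValueError:
        return UserType.REGISTERED


def user_is_anonymous_feature(user_type: UserType) -> int:
    """For now, both IP and temp users map to user_is_anonymous=1."""
    return int(user_type in (UserType.IP, UserType.TEMP))
```

Keeping the three-way type separate from the binary feature means only the last function would need to change if temp accounts later get their own feature.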