Create a script to backfill missing local accounts on loginwiki/metawiki for new global accounts
Closed, Resolved (Public)

Description

After T368230: Investigate missing loginwiki accounts, it's clear that some local accounts do not get created, although the number is small. We can fire off a script once a week that checks for missing local accounts on the two wikis that should have them (loginwiki and metawiki). There is no appropriate maintenance script currently to do this, so write one.
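As a rough illustration of what such a weekly check might compute, here is a minimal Python sketch. It is not the maintenance script itself; the function name `find_missing_local_accounts` and the in-memory data shapes are assumptions made for the example, standing in for whatever database queries the real script would use.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Wikis on which every global account is expected to have a local account.
REQUIRED_WIKIS = ("loginwiki", "metawiki")


def find_missing_local_accounts(
    global_accounts: Iterable[str],
    local_accounts: Dict[str, Set[str]],
    required_wikis: Tuple[str, ...] = REQUIRED_WIKIS,
) -> List[Tuple[str, str]]:
    """Return (username, wiki) pairs where the local account is absent.

    `local_accounts` maps a wiki name to the set of usernames that already
    exist there. Both inputs are hypothetical stand-ins for database lookups.
    """
    missing = []
    for name in global_accounts:
        for wiki in required_wikis:
            if name not in local_accounts.get(wiki, set()):
                missing.append((name, wiki))
    return missing


if __name__ == "__main__":
    existing = {"loginwiki": {"Alice", "Bob"}, "metawiki": {"Alice"}}
    for username, wiki in find_missing_local_accounts(["Alice", "Bob", "Carol"], existing):
        print(f"{username} is missing on {wiki}")
```

Each returned pair is a candidate for backfilling; the actual creation step is deliberately left out of the sketch.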

I need to double check what the situation is with autocreation of temporary accounts, which are currently live on testwiki but are expected to roll out soon elsewhere.

Event Timeline

I need to double check what the situation is with autocreation of temporary accounts, which are currently live on testwiki but are expected to roll out soon elsewhere.

Speaking as an engineer on the Trust and Safety Product Team: we have enabled temporary accounts on loginwiki and testwiki. On wikis where temporary accounts are not enabled, temporary account usernames are treated as unusable (UserNameUtils::isUsable), which prevents autocreation.

We are not planning to enable temporary accounts on metawiki in the near term, so for the time being we should expect the maintenance script to fail to autocreate temporary accounts on metawiki.
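The consequence for a backfill script can be modelled roughly as follows. This is a Python sketch of the rule only, not of UserNameUtils::isUsable; the `~` prefix used to recognise temporary account names and the function names are assumptions made for illustration.

```python
from typing import AbstractSet

# Wikis named in the comment above as having temporary accounts enabled.
TEMP_ACCOUNTS_ENABLED = frozenset({"loginwiki", "testwiki"})


def looks_like_temp_account(username: str) -> bool:
    # Assumption for illustration only: temporary account names start with "~".
    return username.startswith("~")


def can_autocreate(
    username: str,
    wiki: str,
    enabled_wikis: AbstractSet[str] = TEMP_ACCOUNTS_ENABLED,
) -> bool:
    """Return False when a temporary account name is unusable on `wiki`."""
    if looks_like_temp_account(username) and wiki not in enabled_wikis:
        return False
    return True


if __name__ == "__main__":
    for wiki in ("loginwiki", "metawiki"):
        print(wiki, can_autocreate("~2024-12345", wiki))
```

A script built on this rule would be expected to skip, or report a failure for, temporary accounts on metawiki rather than treat them as errors to retry.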

We can fire off a script once a week that checks for missing local accounts on the two wikis that should have them (loginwiki and metawiki)

Just as an FYI, we might need to work out what CheckUser displays for these accounts. Currently, stewards can run a CheckUser on loginwiki to get the IP address and user agent used in the autocreation of the account.

However, this maintenance script would run from an internal IP, so it would appear as if the account had been created from an internal IP. Some mitigation, or at least communication with the stewards, is needed to avoid confusion.

As to other mitigations we could:

  • Fetch the correct IP, user agent, and timestamp by finding it on the wiki where the user was actually created. Then it would appear in CheckUser as if the account was autocreated normally.
  • Add some kind of flag to indicate that this was created by a backfill script (such as a custom user agent string that says Backfill creation by maintenance script)
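The two options above could be sketched together like this in Python. Treating the second option as a fallback for the first is an assumption of this sketch, not something decided here, and `CreationDetails` and `details_to_record` are hypothetical names.

```python
from typing import NamedTuple, Optional

# User agent string suggested above as a marker for backfilled creations.
BACKFILL_USER_AGENT = "Backfill creation by maintenance script"


class CreationDetails(NamedTuple):
    ip: str
    user_agent: str
    timestamp: str


def details_to_record(
    home_wiki_details: Optional[CreationDetails],
    script_ip: str,
    script_timestamp: str,
) -> CreationDetails:
    """Choose what would be recorded for a backfilled autocreation.

    If the original creation details were found on the wiki where the user
    was actually created, reuse them unchanged. Otherwise record the
    script's own IP and flag the entry with the marker user agent.
    """
    if home_wiki_details is not None:
        return home_wiki_details
    return CreationDetails(script_ip, BACKFILL_USER_AGENT, script_timestamp)


if __name__ == "__main__":
    found = CreationDetails("192.0.2.7", "Mozilla/5.0", "20240901120000")
    print(details_to_record(found, "10.0.0.1", "20240908000000"))
    print(details_to_record(None, "10.0.0.1", "20240908000000"))
```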

I think temp users can just be handled like any other account. Wrt the IP, how do we decide whether faking the user's last IP address is important enough?

The context for the task is that SUL3 will replace the current workflow (user creates account on local wiki -> autocreates on loginwiki via web browser redirect -> if that failed, autocreates on loginwiki via job) with a shorter one (user creates account on local wiki -> autocreates on loginwiki via job). That will presumably mean more failures, so we want to have something more reliable than the current job.

Wrt the IP, how do we decide whether faking the user's last IP address is important enough?

I would suggest asking the stewards if you want it from them directly, but loginwiki is used frequently by stewards to find abuse of multiple accounts, and having the correct IP is important (otherwise you couldn't find out whether two accounts were created using the same IP). If you have staff rights on production, you can see the log of checks at https://login.wikimedia.org/wiki/Special:CheckUserLog, which shows that it's being used frequently and that checks are being made to find the accounts that were created on a given IP address. If you don't have the rights, I can share some stats about this list of checks privately.

Another example of this is that for the http-client-hints feature, the stewards asked us to collect this data for account creations on loginwiki (filed as T347393).

Hey @Dreamy_Jazz can you point me to some sort of schedule as to the rollout? I'm not interested in hard dates, just getting a general sense of the order and time frame.

@ArielGlenn Hi there! Hope you're doing well. We are still a bit far from deploying to any major production wikis (1-2 quarters possibly). It's hard to put any firm dates around this because of multiple dependencies on other teams and on Legal. We haven't decided on which wikis we will pilot on either.

Fair enough! I'll bear that in mind as we move forward on this script. Thanks! (Hope you're doing well too.)

Fetch the correct IP, user agent, and timestamp by finding it on the wiki where the user was actually created. Then it would appear in CheckUser as if the account was autocreated normally.

Note that this can only be done if the user was created within the last 90 days. A different solution should be used for older accounts.
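A check for that window might look like the following Python sketch. The 90-day figure comes from the comment above; the function name and the choice to treat exactly 90 days as still available are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window stated in the comment above.
RETENTION = timedelta(days=90)


def creation_details_available(
    created_at: datetime, now: Optional[datetime] = None
) -> bool:
    """Return True if the account is recent enough for its IP and user
    agent to still be retrievable. Both datetimes must be timezone-aware."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at <= RETENTION


if __name__ == "__main__":
    now = datetime(2024, 9, 30, tzinfo=timezone.utc)
    print(creation_details_available(datetime(2024, 9, 1, tzinfo=timezone.utc), now))
    print(creation_details_available(datetime(2024, 1, 1, tzinfo=timezone.utc), now))
```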

I would suggest asking the stewards if you want it from them directly

If you think it's necessary, we'll definitely take your word for it! Your previous comment was just very noncommittal :)

CentralAuth does not know about IP addresses and CheckUser does not know about autocreations, so connecting the two will be a bit awkward. I think there should be a CheckUser service for fetching the relevant information about log events, with stability guarantees, and CentralAuth should use that to load the IP and user agent from the account creation log on the user's home wiki and apply it via importScopedSession(). That would minimize the amount the two extensions have to know about each other. (Or almost minimize. It could be further reduced with a new hook, but IMO that's overkill - CentralAuth is fairly Wikimedia-specific already, and checks for the presence of other extensions in a number of places, adding one more should not be a big deal.)
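To make the "apply it temporarily" idea concrete, here is a small Python sketch of a scoped override that restores the previous values afterwards. It only imitates the general pattern; it is not MediaWiki's importScopedSession() and not the CheckUser service, and `RequestContext` and `scoped_session` here are hypothetical stand-ins written for this example.

```python
from contextlib import contextmanager
from typing import Iterator


class RequestContext:
    """Hypothetical stand-in for the request state seen by logging code."""

    def __init__(self, ip: str, user_agent: str) -> None:
        self.ip = ip
        self.user_agent = user_agent


@contextmanager
def scoped_session(
    ctx: RequestContext, ip: str, user_agent: str
) -> Iterator[RequestContext]:
    """Temporarily present `ip` and `user_agent` as the request's own,
    restoring the previous values on exit, even if an error occurs."""
    saved = (ctx.ip, ctx.user_agent)
    ctx.ip, ctx.user_agent = ip, user_agent
    try:
        yield ctx
    finally:
        ctx.ip, ctx.user_agent = saved


if __name__ == "__main__":
    ctx = RequestContext("10.0.0.1", "maintenance script")
    with scoped_session(ctx, "192.0.2.7", "Mozilla/5.0"):
        # An autocreation performed here would see the original creator's data.
        print(ctx.ip, ctx.user_agent)
    print(ctx.ip, ctx.user_agent)
```

The point of the pattern is that the script never has to pass IP data through CentralAuth's own code paths; it only wraps the autocreation call.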

A different solution should be used for older accounts.

Older accounts are not in scope for this task. In any case, I very much doubt we still retain IP information for them.

...

As to other mitigations we could:

  • Fetch the correct IP, user agent, and timestamp by finding it on the wiki where the user was actually created. Then it would appear in CheckUser as if the account was autocreated normally.
  • Add some kind of flag to indicate that this was created by a backfill script (such as a custom user agent string that says Backfill creation by maintenance script)

Would using a custom system user for these autocreations be a reasonable workaround, or would that not help matters any?

Change #1059857 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[mediawiki/extensions/CentralAuth@master] Script to backfill local account creation for global users

https://gerrit.wikimedia.org/r/1059857

The patch currently up is missing the CheckUser-related parts; it's just bare bones.

Change #1067385 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[mediawiki/extensions/CheckUser@master] Add a service that will retrieve ip and user agent for account creation

https://gerrit.wikimedia.org/r/1067385

Change #1073764 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Expand AccountCreationDetailsLookupTest

https://gerrit.wikimedia.org/r/1073764

Change #1073764 abandoned by Dreamy Jazz:

[mediawiki/extensions/CheckUser@master] Expand AccountCreationDetailsLookupTest

https://gerrit.wikimedia.org/r/1073764

Change #1073764 restored by Dreamy Jazz:

[mediawiki/extensions/CheckUser@master] Expand AccountCreationDetailsLookupTest

https://gerrit.wikimedia.org/r/1073764

I've provided review for the CheckUser patches, and I think the MediaWiki-extensions-CentralAuth patch can be handled by the MediaWiki-Platform-Team. Therefore, moving this to Trust and Safety Product Team's "Done" column.

Change #1067385 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Add a service that will retrieve ip and user agent for account creation

https://gerrit.wikimedia.org/r/1067385

Change #1073764 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Expand AccountCreationDetailsLookupTest

https://gerrit.wikimedia.org/r/1073764

Hey @Dreamy_Jazz

I have a question about the change made here https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1073764

I understand that we need to allow for the log type autocreate-account in the cu_private_event table, in order to scoop up creations of temporary accounts. But aren't autocreations done any time that a user with a temporary account and a valid session accesses another wiki where the user's temporary account does not yet exist? So, might we not have a number of these rows instead of just one, when all we want is the initial temporary account creation at the time of the first edit or preview or whatever triggers the generation of the temp account name and so on? Or are these later automatic temp account creations not logged to cu_private_event?

@ArielGlenn

Temporary accounts do not have the same log entry on their first creation. Instead, when a user makes an edit to create a temporary account it is logged as an autocreation.

This means that all the wikis the temporary account exists on, including the first wiki they edited on, will have an autocreate log entry.

I presume that CentralAuth can find the wiki that was used first and then get a database replica connection to it. If it can do this, then my patch will be fine, as the log entry for an account creation should only occur once per wiki (a named account uses the create entry and a temporary account uses the autocreate entry). Therefore there should be no risk of seeing multiple entries.
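The selection rule described above can be sketched in Python as follows. The entry layout (dicts with `user`, `action`, `ip` and `user_agent` keys) and the action labels `create` and `autocreate` are simplifications assumed for the example, not the actual CheckUser schema.

```python
from typing import Iterable, Mapping, Optional


def expected_action(is_temp_account: bool) -> str:
    # Named accounts are looked up by their create entry; temporary accounts
    # are logged as autocreations even on the first wiki they edit.
    return "autocreate" if is_temp_account else "create"


def find_creation_entry(
    home_wiki_entries: Iterable[Mapping[str, str]],
    username: str,
    is_temp_account: bool,
) -> Optional[Mapping[str, str]]:
    """Return the creation log entry for `username` on its home wiki.

    Only one matching entry per wiki is expected, so the first match is
    returned. None means nothing was found.
    """
    wanted = expected_action(is_temp_account)
    for entry in home_wiki_entries:
        if entry["user"] == username and entry["action"] == wanted:
            return entry
    return None


if __name__ == "__main__":
    entries = [
        {"user": "Alice", "action": "create", "ip": "192.0.2.7", "user_agent": "UA-1"},
        {"user": "~2024-1", "action": "autocreate", "ip": "192.0.2.9", "user_agent": "UA-2"},
    ]
    print(find_creation_entry(entries, "Alice", is_temp_account=False))
    print(find_creation_entry(entries, "~2024-1", is_temp_account=True))
```

Because the lookup is restricted to the home wiki, autocreate entries written on other wikis never enter the result.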

See https://gerrit.wikimedia.org/g/mediawiki/core/+/bc72d5b4f78cdd9f82e2bab61d3d65e28834cd97/includes/user/TempUser/TempUserCreator.php#124 for the code that creates the temporary account.

Change #1075274 had a related patch set uploaded (by Reedy; author: Reedy):

[integration/config@master] zuul/parameter_functions.py: Load CheckUser for CentralAuth CI and phan jobs

https://gerrit.wikimedia.org/r/1075274

Change #1075274 merged by jenkins-bot:

[integration/config@master] zuul/parameter_functions.py: Load CheckUser for CentralAuth CI and phan jobs

https://gerrit.wikimedia.org/r/1075274

Change #1059857 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Script to backfill local account creation for global users

https://gerrit.wikimedia.org/r/1059857

Created and running so this task can definitely be closed.