Investigate missing loginwiki accounts
Closed, Resolved (Public)

Description

In theory, every user has a local account on loginwiki because 1) right after signup they are redirected to loginwiki's Special:CentralLogin, which autocreates the account; 2) we also create an account via CentralAuthCreateLocalAccountJob just in case. In practice, the process is allegedly not fully reliable (this is expected for central login, since the browser might prevent cookie access or there can be a network error, but unexpected for the job).

This is a problem as (after T363695 disables central login on loginwiki) we want to rely on the job to make sure there is still one wiki which has all the accounts, a requirement from anti-abuse volunteers. We should investigate, get a sense of how reliable the job is, and see if we can find something to fix.

Event Timeline

I've done some simple spot checks, comparing recent account creations on enwiki against accounts on loginwiki, and it looks good so far, but that's a small amount of data. I'm working on a more comprehensive check over, say, a 30-day period; we'll see what it shows.

Okay, I checked accounts created during the last approximately 30 days on enwiki. There were about 134700 users created; about 400 of those had no entry in the loginwiki user table, but they all had entries in the globaluser table (looking entries up on loginwiki and globaluser by name).

Of these missing entries, about 140 had global accounts registered in the globaluser table within the last 30 days, at varying dates and times, no particular pattern to them. The user names also don't seem to be particularly remarkable, some with only characters in the a-zA-Z range. Some have a few edits, some do not. Here's an example: https://en.wikipedia.org/wiki/Special:CentralAuth?target=Aomurchu

Other entries had global accounts registered well before 2024, with most in 2013 or 2015.
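For readers who want to reproduce this kind of check without the linked script, here is a minimal sketch of the comparison described above. It is not the script that was actually used. It assumes the user names and globaluser registration dates have already been exported from the databases into plain Python collections; the function name, the parameter names, and the returned keys are all hypothetical.

```python
from datetime import datetime, timedelta


def find_missing_local_accounts(enwiki_names, loginwiki_names,
                                globaluser_registration, now, window_days=30):
    """Classify enwiki users that have no loginwiki account.

    enwiki_names: names of users created on enwiki in the period checked.
    loginwiki_names: names present in the loginwiki user table.
    globaluser_registration: dict of name -> global registration datetime
        (assumed to have been exported from the globaluser table).
    """
    loginwiki = set(loginwiki_names)
    cutoff = now - timedelta(days=window_days)
    result = {"recent_global": [], "older_global": [], "no_global": []}

    for name in enwiki_names:
        if name in loginwiki:
            continue  # local account exists, nothing to report
        registered = globaluser_registration.get(name)
        if registered is None:
            result["no_global"].append(name)
        elif registered >= cutoff:
            # Global account registered within the window: the job (or
            # central login) should have created the loginwiki account.
            result["recent_global"].append(name)
        else:
            # Old global account that only recently got an enwiki account.
            result["older_global"].append(name)
    return result
```

The split between `recent_global` and `older_global` mirrors the two groups mentioned above: global accounts registered within the last 30 days, and global accounts registered well before 2024.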

Next up: what goes wrong with the job? It's not super common, whatever the issue is, just common enough to be a problem :-/

For the record, here is the script I used to get this info: https://github.com/apergos/analyticsqueries (don't even think of looking at it, because it's gross).

Thanks for checking, and for sharing the queries! One thing you could spot-check is creating local accounts for a small random sample of the affected users (there is a maintenance script in CentralAuth), to see whether creation is reliably blocked (e.g. by username policy, although that seems unlikely on loginwiki) or whether these were one-off errors.

Another thing would be statistics for CentralAuthCreateLocalAccountJob - does it fail? Does it not even run? (I think we have error and success counts in Prometheus - in theory the success count on loginwiki should equal the number of new users.) Also, do the missing accounts on loginwiki match the missing accounts on metawiki (the other wiki where these jobs run)?
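A minimal sketch of the loginwiki vs. metawiki comparison suggested above, assuming the two lists of missing user names have already been collected separately. The function name and the result keys are hypothetical.

```python
def compare_missing(missing_on_loginwiki, missing_on_metawiki):
    """Split missing user names by which wiki(s) lack a local account."""
    login = set(missing_on_loginwiki)
    meta = set(missing_on_metawiki)
    return {
        "both": sorted(login & meta),            # missing on both wikis
        "loginwiki_only": sorted(login - meta),  # missing only on loginwiki
        "metawiki_only": sorted(meta - login),   # missing only on metawiki
    }
```

One possible reading, offered as a guess rather than a conclusion: a large `both` group would be consistent with the job not being queued or not running at all for those users, while mostly disjoint groups would point towards independent per-wiki failures.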

At least some of the problematic names are accounts that were created on behalf of a user by another user, via https://en.wikipedia.org/wiki/Wikipedia:Request_an_account/Guide (see code at https://github.com/enwikipedia-acc/waca/tree/rel7.17 )

This in turn lands the creating user at https://en.wikipedia.org/wiki/Special:CreateAccount with various fields filled in and "Use a temporary random password" option checked, which is required for that usage of this tool.

I imagine account creation requests are a tiny fraction of all account creations so that would indicate those fail much more often?

Normally the loginwiki account is created by central login, and the job only fills in the cases where that doesn't work. But for account creation requests there is no central login, only the job. So I guess it makes sense for them to be more error-prone.

So, of the 140 accounts that were created on enwiki in the last 30 days, have no entry on loginwiki, and have an entry in the globaluser table made in the last 30 days, about half were created on behalf of the user by an admin. I'll have a look at a good chunk of the rest to see if there's anything interesting about those other 70-ish accounts.

Update: Looks like most of the rest were created by a second admin, using the same tool.

Checking metawiki for the same accounts created on enwiki, in the same uid range as earlier, there are about 620 that were globally registered in July and have no account on metawiki. So that's somewhat worse than loginwiki, though still a small number.

Here's a sample case; it's weird that there's such a long delay between the enwiki account being registered and the loginwiki one: https://en.wikipedia.org/wiki/Special:CentralAuth?target=Blivolsi

Well, not so weird in the end, another account created by an admin on behalf of the user.

I can find insertion metrics for local account creation in Prometheus (though apparently not per wiki, so that's a problem), but I can't seem to find a topic for retries or failures. Sample link for that, so we have it: https://prometheus-eqiad.wikimedia.org/ops/graph?g0.expr=kafka_server_BrokerTopicMetrics_MessagesIn_total%7Btopic%3D%22eqiad.mediawiki.job.CentralAuthCreateLocalAccountJob%22%7D&g0.tab=1&g0.stacked=0&g0.show_exemplars=0&g0.range_input=1h&g0.end_input=2024-07-29%2012%3A00%3A20&g0.moment_input=2024-07-29%2012%3A00%3A20
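Since the query in that link is percent-encoded and hard to read, here is a small sketch for pulling the PromQL expression out of such a graph URL and for building a similar URL from an expression. It uses only Python's standard `urllib.parse`. The helper names are hypothetical, and the `g0.expr` and `g0.tab` parameter names are simply taken from the link above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base of the graph page, copied from the link above.
GRAPH_BASE = "https://prometheus-eqiad.wikimedia.org/ops/graph"


def extract_promql(graph_url):
    """Return the decoded expression from the g0.expr query parameter."""
    params = parse_qs(urlsplit(graph_url).query)
    return params["g0.expr"][0]


def build_graph_url(expr, base=GRAPH_BASE):
    """Build a graph URL for a PromQL expression (first panel, table tab)."""
    return base + "?" + urlencode({"g0.expr": expr, "g0.tab": "1"})
```

Decoded, the expression in the sample link is `kafka_server_BrokerTopicMetrics_MessagesIn_total{topic="eqiad.mediawiki.job.CentralAuthCreateLocalAccountJob"}`, i.e. a count of messages arriving on the job's Kafka topic, not a count of successful or failed job runs.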

I can ask godog to see if there's a better way to dig into these, or maybe someone on the data engineering team would know about the Kafka metrics. But I'm not sure it's worth it, given that we're likely to just run a daily backfill job.

After discussion during the weekly team meeting, we decided that further work on this task is not warranted, so closing. Next steps will be in a separate task for doing the backfill.