
Accounts taking 30+ minutes to autocreate on metawiki/loginwiki (2023-05)
Open, Needs Triage · Public · BUG REPORT

Description

For most new accounts, autocreation on metawiki and loginwiki is taking over 30 minutes. For some, the account is created on one of loginwiki or metawiki as expected, but creation on the other is delayed. Most seem to have neither. Some are created normally.

Examples:

Previous occurrence: T314442

Event Timeline

As usual, this resolved itself without intervention after a few days. However, every time it happens it prevents us from appropriately responding to abuse. Given that there is no reason to think it will not reoccur, I am leaving this task open.

taavi subscribed.

Not sure if there's anything we can do about the job queue being slow, but let's at least ask the (I think) correct people.

Change 935078 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/alerts@master] team-sre: Add warning for CentralAuth job lag

https://gerrit.wikimedia.org/r/935078

These issues seem to have occurred before we added capacity to the jobrunner cluster. I'm in the process of adding an alert to quantify whether it still happens often, and if it does, we'll try to figure out a fix.
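To make "quantify whether it still happens" concrete, here is a minimal Python sketch of a lag alert that fires only when the backlog stays above a threshold for a sustained period, similar in spirit to the `for:` clause of a Prometheus alerting rule. The threshold (300 s), the hold duration (600 s), the class name `LagAlert` and the sample values are all hypothetical; this is not the rule proposed in change 935078.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LagAlert:
    """Fire when job lag stays above threshold_s for at least for_s seconds."""
    threshold_s: float = 300.0  # hypothetical: warn above 5 minutes of lag
    for_s: float = 600.0        # hypothetical: ...sustained for 10 minutes
    breach_start_s: Optional[float] = None  # when the current breach began

    def observe(self, now_s: float, lag_s: float) -> bool:
        """Record one lag sample taken at now_s; return True if the alert fires."""
        if lag_s <= self.threshold_s:
            self.breach_start_s = None  # back under threshold: reset the timer
            return False
        if self.breach_start_s is None:
            self.breach_start_s = now_s  # first sample over threshold
        return now_s - self.breach_start_s >= self.for_s


# Hypothetical (timestamp, lag) samples in seconds, not real measurements.
samples = [(0, 30), (60, 400), (360, 500), (660, 420), (720, 20)]
alert = LagAlert()
fired = [alert.observe(t, lag) for t, lag in samples]
print(fired)  # [False, False, False, True, False]
```

Requiring the condition to persist avoids paging on the short spikes a job queue routinely has, at the cost of alerting up to `for_s` seconds late.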

Possible things to think about in the meantime are:

  • It is low volume, but relatively time-sensitive, does it make sense to give it its own lane in cp-jobqueue?
  • If it is very time-sensitive, maybe it should be something other than a job (we may have to shut off the job queue at times). If so, and since it is cross-wiki, what could the mechanism be?
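The first question can be illustrated with a toy model. The Python sketch below compares how long a single `CentralAuthCreateLocalAccountJob` waits in a shared FIFO lane versus a dedicated lane. It rests on loudly simplified assumptions: one worker per lane, no concurrency, and a hypothetical bulk job type `hypotheticalBulkJob` costing 2 s each, with 600 of them queued ahead. It is not a model of cp-jobqueue or of the real jobrunners.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Job = Tuple[str, float]  # (job type, processing time in seconds)


def first_start_times(lane: Deque[Job]) -> Dict[str, float]:
    """Drain a lane with one worker; return when each job type first starts."""
    clock = 0.0
    started: Dict[str, float] = {}
    while lane:
        job_type, cost_s = lane.popleft()
        started.setdefault(job_type, clock)  # record only the first start per type
        clock += cost_s
    return started


ACCOUNT = "CentralAuthCreateLocalAccountJob"
BULK = "hypotheticalBulkJob"  # hypothetical stand-in for unrelated, higher-volume work

backlog: List[Job] = [(BULK, 2.0)] * 600  # hypothetical backlog queued ahead

# Shared lane: the account job is enqueued behind the whole backlog.
shared_wait = first_start_times(deque(backlog + [(ACCOUNT, 0.5)]))[ACCOUNT]
# Dedicated lane: the account job only competes with its own type.
dedicated_wait = first_start_times(deque([(ACCOUNT, 0.5)]))[ACCOUNT]

print(shared_wait, dedicated_wait)  # 1200.0 0.0
```

In the shared lane the account job starts only after the whole backlog drains (600 × 2 s = 1200 s, i.e. 20 minutes); in its own lane it starts immediately. Real job runners process many jobs concurrently, so actual numbers would differ; the sketch only shows how a dedicated lane keeps a small, latency-sensitive job type from waiting behind unrelated backlog.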

This is currently happening! I'm seeing some users (including LTAs) created about 30 minutes ago whom I cannot check on loginwiki/metawiki, because they have only one local account in SUL (on their home wiki). Meanwhile, other, more recently created users have their SUL accounts created without problems! Just for example, this one (I don't want to add others, since some of them have abusive usernames). Thanks.
P.S. For the record, a few hours ago I also noticed this spambot (still without an autocreated account on loginwiki after almost two days)!

The p99 job backlog for CentralAuthCreateLocalAccountJob is 30s (with a few spikes going up to 4 min). That doesn't seem so bad.
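As a reminder of what a p99 figure does and does not say, the Python sketch below computes nearest-rank percentiles over hypothetical per-job backlog samples (seconds between enqueue and execution). The sample values are invented to echo the numbers above; dashboards typically estimate quantiles from histogram buckets instead, so this illustrates the concept, not how the quoted 30 s was computed.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the nearest-rank p-th percentile (0 < p <= 100) of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical backlog samples in seconds (not real data).
backlog_s = [5.0] * 98 + [30.0, 240.0]

p50 = percentile_nearest_rank(backlog_s, 50)
p99 = percentile_nearest_rank(backlog_s, 99)
worst = max(backlog_s)
print(f"p50={p50}s p99={p99}s max={worst}s")  # p50=5.0s p99=30.0s max=240.0s
```

With 98 jobs at 5 s, one at 30 s and one at 240 s, the p99 is 30 s even though one job waited four minutes: the p99 only bounds the slowest 1% from below and says nothing about how far beyond it they go.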

  • It is low volume, but relatively time-sensitive, does it make sense to give it its own lane in cp-jobqueue?

What counts as low volume? The insertion rate is 0.2/s on average.
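For scale, assuming the quoted 0.2/s average held constant (real traffic is burstier), the rate works out as follows, including how many jobs would be enqueued during a 30-minute delay like the ones reported in the description:

```latex
\begin{aligned}
0.2\ \tfrac{\text{jobs}}{\text{s}} \times 3600\ \tfrac{\text{s}}{\text{h}} &= 720\ \tfrac{\text{jobs}}{\text{h}} \\
720\ \tfrac{\text{jobs}}{\text{h}} \times 24\ \tfrac{\text{h}}{\text{day}} &= 17\,280\ \tfrac{\text{jobs}}{\text{day}} \\
0.2\ \tfrac{\text{jobs}}{\text{s}} \times 1800\ \text{s} &= 360\ \text{jobs per 30-minute window}
\end{aligned}
```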

Change 935078 abandoned by Clément Goubert:

[operations/alerts@master] team-sre: Add warning for CentralAuth job lag

Reason:

Abandoned for GitLab migration

https://gerrit.wikimedia.org/r/935078