It appears that we are seeing several users, including but not limited to system users, which have no central account at the point of writing to the cuci_user index.
Error
Unable to find central ID for user X when recording action in cuci_user table
It appears that we are seeing several users, including but not limited to system users, which have no central account at the point of writing to the cuci_user index.
Unable to find central ID for user X when recording action in cuci_user table
| Subject | Repo | Branch | Lines +/- | |
|---|---|---|---|---|
| CentralAuthIdLookup: Respect DB flags in centralIdFromLocalUser() | mediawiki/extensions/CentralAuth | master | +39 -5 |
The user who performed the action does not appear to have a global account (verified by going to Special:CentralAuth). I don't know how they were able to perform an action without one, unless there was some sort of delay.
I can see how this would happen when running the maintenance script to populate the central index tables (time has passed since the action was performed and in that time the user might have been deleted). But I don't see how it happens when the central index tables are written by the hook (unless there is a significant delay in the running of the job).
It appears that this is the AbuseFilter user for the majority of events, but there are some temporary accounts and normal accounts included in the mix.
The logs seem to be coming from Special:Import, many of them from a single request. AIUI this may be because on SUL wikis, ExternalUserNames + CA will check whether unknown users in imported revisions exist centrally and autocreate but not attach the local account.
Actually CA should already do the attaching so it may be sufficient to read from the primary here.
I have spent most of today investigating it, but with no conclusive results. Here are my findings.
Most of these errors appear related to user autocreation within the same request, e.g. 42d054b0-c480-4105-9a6f-fed631af8d1a. Often, but not in all cases, this is due to imports on SUL wikis triggering autocreation for the authors of the imported revisions.
My initial hypothesis was that CentralAuth may not correctly read from the primary in some cases. In an attempt to reproduce this locally on a SUL wiki farm, I used https://phabricator.wikimedia.org/P69465 to create test users and have them edit a test page without having the users exist on all local wikis. I then imported the resulting page on a different wiki—the users were autocreated as expected but the error didn't occur. Simulating three seconds of replication lag also did not trigger the problem.
In an example I'm looking at, it seems that the error was triggered by a user navigating to another wiki. I'm not sure if the appearance of User::loadFromSession called before the end of Setup.php 139 times in the request ID logs is suspicious. In the logs for that request, I see:
| 04:00:35.972 | Attaching local user {username}@simplewiki by 'login' |
| 04:00:36.493 | Autocreation attempt |
| 04:00:39.052 | Unable to find central ID for local user {username} when recording action in cuci_user table. |
Looking at our code:
$centralId = $this->centralIdLookup->centralIdFromLocalUser( $performer, CentralIdLookup::AUDIENCE_RAW ); if ( !$centralId ) { $centralId = $this->centralIdLookup->centralIdFromLocalUser( $performer, CentralIdLookup::AUDIENCE_RAW, IDBAccessObject::READ_LATEST ); }
The centralIdFromLocalUser method does:
return $this->isAttached( $user ) ? CentralAuthUser::getInstance( $user )->getId() : 0;
In isAttached(), we are returning return $centralUser->exists() && $centralUser->attachedOn( $wikiId );. We eventually get to CentralAuthUser::loadState(). As far as I can tell, that is reading from the replica. Is it possible we are encountering this error when replica lag of several seconds results in the user not being seen as attached? The code to write the attached status of the account happens at 0:400:35.972 in the example above, while the error we see happens at 0:400:39.052.
That was my assumption too, but if the user was autocreated in the same request, CentralAuthUser::getInstance() may return an already-cached instance loaded from the primary assuming it hasn't been LRU evicted. It might get loaded from the replica, if it did get evicted, but I couldn't get a reproducer at any rate. Still, probably wouldn't hurt to consistently read from the primary in this case.
Change #1078372 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):
[mediawiki/extensions/CentralAuth@master] CentralAuthIdLookup: Respect DB flags in centralIdFromLocalUser()
Change #1078372 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] CentralAuthIdLookup: Respect DB flags in centralIdFromLocalUser()
We've observed a marked drop in the rate of these errors after the above patch went out with the train. Only labswiki seems to be generating these logs now, which should hopefully be resolved by the impending SUL migration.
I still see this occurring on other wikis. By chance, one of the instances recorded is for my personal wiki account https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2024.10.29?id=QlOC15IB60NnMROHGgvz. In this case, I visited a wiki I had never been to before (igwiki) via an account I was logged-in to from elsewhere (enwiki), then I clicked "log out". For reasons unclear to me, my account isn't attached to igwiki, and so I've triggered this error.
The lots relating to labswiki are expected and should resolve when the SUL migration. I've seen no such errors that are not on labswiki since the 26th of November, with only one error since the 12th of November. The logs stopping roughly equates to when group0 was moved to 1.44.0-wmf.3.
Given that the labswiki issue should be resolved through the SUL migration completing and we have seen the other errors stopping for several weeks, I think we can call this resolved.
Recent logs appear to be from when imports have occurred and are being addressed in T385734: Account auto-creations caused by import of page caused actor table rows and CheckUser result rows but no associated user table row. We can make new tasks for more occurrences of this error.