
Write to cuci_user table when CheckUser actions occur
Closed, Resolved | Public | 3 Estimated Story Points

Description

Technical background

In T368151 we added a central table, cuci_user, to keep track of which wikis have been edited by a given account.

The table records the central ID of the user, the wiki, and the timestamp of the latest action.
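As a rough illustration of the "one row per user and wiki, holding the latest timestamp" shape described above, here is a minimal sketch using SQLite. The table name cuci_user_sketch and the column names central_id, wiki_map_id and last_timestamp are hypothetical stand-ins, not the real schema, and the guard that ignores older timestamps is an assumption added for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical stand-in for cuci_user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE cuci_user_sketch (
            central_id     INTEGER NOT NULL,  -- central ID of the user
            wiki_map_id    INTEGER NOT NULL,  -- ID of the wiki (hypothetical name)
            last_timestamp TEXT    NOT NULL,  -- 14-digit timestamp, sorts as text
            PRIMARY KEY (central_id, wiki_map_id)
        )
        """
    )
    return conn


def record_action(conn, central_id, wiki_map_id, timestamp):
    """Insert a row for (central_id, wiki_map_id), or bump its timestamp.

    Assumption for this sketch: an older timestamp never overwrites a newer one.
    """
    conn.execute(
        """
        INSERT INTO cuci_user_sketch (central_id, wiki_map_id, last_timestamp)
        VALUES (?, ?, ?)
        ON CONFLICT (central_id, wiki_map_id)
        DO UPDATE SET last_timestamp = excluded.last_timestamp
        WHERE excluded.last_timestamp > cuci_user_sketch.last_timestamp
        """,
        (central_id, wiki_map_id, timestamp),
    )
    conn.commit()
```

Calling record_action twice for the same user and wiki leaves a single row, so the table grows with the number of (user, wiki) pairs rather than with the number of actions.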

CheckUser already updates the relevant table whenever an action is saved, via RecentChangeSaveHandler::onRecentChange_save or other handlers (for private events). This is done via service methods in CheckUserInsert.

What needs doing

The service and hook handler can be adapted to update cuci_user. Note that we should follow the relevant advice listed in T368151#9993273. Notably:

  • Do not update the timestamp if the stored value is less than a minute old (to reduce calls)
    • This could also include performing the update only one time in ten if the stored timestamp is within the last hour
  • Do not store data for bot accounts
  • Ensure that the writes occur via a job
  • Exclude actions on WMCS IP addresses
  • Just update the timestamp if a row for the same central ID/wiki already exists in the table
  • Refer to the cuci_wiki_map table for the ID of the wiki
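The filtering and rate-limiting rules in the list above could be sketched roughly as follows. This is an illustration, not the CheckUser implementation: the function name should_write, its parameters, and the injectable rng are hypothetical, the thresholds simply mirror the one-minute and one-hour suggestions, and the IP ranges in the usage note are documentation ranges rather than real WMCS ranges. Queuing the write as a job and looking up the wiki ID are left out.

```python
import ipaddress
import random
from datetime import datetime, timedelta, timezone


def should_write(*, is_bot, ip, excluded_ranges, last_timestamp, now, rng=random.random):
    """Decide whether an action should trigger a write to the central index.

    last_timestamp is the stored timestamp for this user and wiki, or None
    if no row exists yet. rng is injectable so the sampling can be tested.
    """
    # Do not store data for bot accounts.
    if is_bot:
        return False

    # Skip actions from excluded IP ranges (for example, WMCS addresses).
    addr = ipaddress.ip_address(ip)
    if any(addr in ipaddress.ip_network(r) for r in excluded_ranges):
        return False

    # No existing row: always write.
    if last_timestamp is None:
        return True

    age = now - last_timestamp
    # Stored timestamp is under a minute old: skip the update entirely.
    if age < timedelta(minutes=1):
        return False
    # Stored timestamp is under an hour old: update roughly one time in ten.
    if age < timedelta(hours=1):
        return rng() < 0.1
    return True
```

For example, with now = datetime(2024, 9, 9, 13, 0, tzinfo=timezone.utc) and excluded_ranges=["192.0.2.0/24"], an action from 192.0.2.5 is skipped regardless of the stored timestamp, while an action from 198.51.100.7 with a stored timestamp two hours old is always written.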

Related Objects

Status        Assigned
Resolved      kostajh
Declined      None
In Progress   Niharika
Open          None
Open          Dreamy_Jazz
Resolved      Dreamy_Jazz
Open          Niharika
Open          STran
Open          None
Open          None
Open          None
Resolved      STran
Resolved      Dreamy_Jazz
Resolved      Dreamy_Jazz
Resolved      Dreamy_Jazz
Resolved      Dreamy_Jazz
Resolved      Dreamy_Jazz
Resolved      Dreamy_Jazz
Resolved      Dreamy_Jazz

Event Timeline

Dreamy_Jazz updated the task description.
Dreamy_Jazz changed the point value for this task from 3 to 2.
Dreamy_Jazz changed the point value for this task from 2 to 3. (Aug 27 2024, 8:09 PM)

Change #1067423 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] [WIP] Add CheckUserCentralIndexManager::getWikiMapIdForDomainId

https://gerrit.wikimedia.org/r/1067423

Change #1067424 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] [WIP] Allow CheckUserCentralIndexManager to write to cuci_user

https://gerrit.wikimedia.org/r/1067424

Change #1067425 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] [WIP] Implement ways to limit update rate of cuci_user

https://gerrit.wikimedia.org/r/1067425

Change #1067426 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] [WIP] Call CheckUserCentralIndexManager in CheckUserInsert

https://gerrit.wikimedia.org/r/1067426

Change #1067423 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Add CheckUserCentralIndexManager::getWikiMapIdForDomainId

https://gerrit.wikimedia.org/r/1067423

Change #1067424 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Allow CheckUserCentralIndexManager to write to cuci_user

https://gerrit.wikimedia.org/r/1067424

Change #1067425 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Implement ways to limit update rate of cuci_user

https://gerrit.wikimedia.org/r/1067425

Change #1067426 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Call CheckUserCentralIndexManager in CheckUserInsert

https://gerrit.wikimedia.org/r/1067426

Change #1071251 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[operations/mediawiki-config@master] Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS

https://gerrit.wikimedia.org/r/1071251

Change #1071251 merged by jenkins-bot:

[operations/mediawiki-config@master] Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS

https://gerrit.wikimedia.org/r/1071251

Mentioned in SAL (#wikimedia-operations) [2024-09-09T12:59:08Z] <dreamyjazz@deploy1003> Started scap sync-world: Backport for [[gerrit:1071251|Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS (T373021)]]

Mentioned in SAL (#wikimedia-operations) [2024-09-09T13:08:43Z] <dreamyjazz@deploy1003> Started scap sync-world: Backport for [[gerrit:1071251|Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS (T373021)]]

Mentioned in SAL (#wikimedia-operations) [2024-09-09T13:13:35Z] <dreamyjazz@deploy1003> dreamyjazz: Backport for [[gerrit:1071251|Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS (T373021)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-09-09T13:36:54Z] <jforrester@deploy1003> Started scap sync-world: Backport for [[gerrit:1071251|Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS (T373021)]], [[gerrit:rEWBA1071253a7fd3|tests: Disable all Beta Cluster CI testing, all failing (T374242)]], [[gerrit:1071254|Don't pass empty type/returnType to zobject lookup when undefined (T374199)]], [[gerrit:1071265|Use default width/height on gallery to avoid parser instance (T374146

Mentioned in SAL (#wikimedia-operations) [2024-09-09T13:40:26Z] <jforrester@deploy1003> dreamyjazz, jforrester: Backport for [[gerrit:1071251|Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS (T373021)]], [[gerrit:rEWBA1071253a7fd3|tests: Disable all Beta Cluster CI testing, all failing (T374242)]], [[gerrit:1071254|Don't pass empty type/returnType to zobject lookup when undefined (T374199)]], [[gerrit:1071265|Use default width/height on gallery to avoid parser instance (T374146)

Mentioned in SAL (#wikimedia-operations) [2024-09-09T13:46:31Z] <jforrester@deploy1003> Started scap sync-world: Backport for [[gerrit:1071251|Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS (T373021)]], [[gerrit:rEWBA1071253a7fd3|tests: Disable all Beta Cluster CI testing, all failing (T374242)]], [[gerrit:1071254|Don't pass empty type/returnType to zobject lookup when undefined (T374199)]], [[gerrit:1071265|Use default width/height on gallery to avoid parser instance (T374146

Mentioned in SAL (#wikimedia-operations) [2024-09-09T13:50:31Z] <jforrester@deploy1003> dreamyjazz, jforrester: Backport for [[gerrit:1071251|Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS (T373021)]], [[gerrit:rEWBA1071253a7fd3|tests: Disable all Beta Cluster CI testing, all failing (T374242)]], [[gerrit:1071254|Don't pass empty type/returnType to zobject lookup when undefined (T374199)]], [[gerrit:1071265|Use default width/height on gallery to avoid parser instance (T374146)

Mentioned in SAL (#wikimedia-operations) [2024-09-09T13:58:34Z] <jforrester@deploy1003> Finished scap sync-world: Backport for [[gerrit:1071251|Define wgCheckUserCentralIndexRangesToExclude to exclude WMCS (T373021)]], [[gerrit:rEWBA1071253a7fd3|tests: Disable all Beta Cluster CI testing, all failing (T374242)]], [[gerrit:1071254|Don't pass empty type/returnType to zobject lookup when undefined (T374199)]], [[gerrit:1071265|Use default width/height on gallery to avoid parser instance (T37414

dom_walden subscribed.

The cuci_user table is written to almost any time the cu_changes, cu_log_event or cu_private_event tables are written to. It uses the same actor as was recorded in the cu_* tables (e.g. if I create another account while logged in as a user, a row will be inserted/updated in cuci_user for the user I am logged in as, not for the newly created user).

As well as the exceptions above (bot accounts, IPs in $wgCheckUserCentralIndexRangesToExclude), any action which does not have an actor associated with it is not recorded. For example, a failed login attempt won't be written to cuci_user. This probably makes sense, as the person trying to log in might not be the owner of the username.

The job is enqueued from within a DeferredUpdate. I didn't check if this is a supported pattern, but I didn't observe any issues with it.

The insert to the cuci_user table is in a separate transaction from the inserts to the other cu_* tables. I deliberately made the inserts to cuci_user fail, but this did not appear to affect the inserts to the cu_* tables.
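The isolation described here can be sketched with two independent transactions in SQLite. This is only an illustration of the pattern, not the CheckUser code: the table names cu_changes_sketch and cuci_user_sketch, the save_action helper and the fail_index_write flag are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical stand-in tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cu_changes_sketch (actor TEXT)")
    conn.execute("CREATE TABLE cuci_user_sketch (actor TEXT)")
    return conn


def save_action(conn, actor, fail_index_write=False):
    """Write the primary row, then attempt the central index row separately.

    Returns True if the central index write succeeded, False otherwise.
    """
    # Transaction 1: the primary write, committed on its own.
    with conn:
        conn.execute("INSERT INTO cu_changes_sketch (actor) VALUES (?)", (actor,))

    # Transaction 2: the central index write. A failure here is rolled back
    # and swallowed, so it cannot undo the already committed primary write.
    try:
        with conn:
            conn.execute("INSERT INTO cuci_user_sketch (actor) VALUES (?)", (actor,))
            if fail_index_write:
                raise sqlite3.OperationalError("simulated failure")
    except sqlite3.Error:
        return False
    return True
```

With this structure, calling save_action(conn, "B", fail_index_write=True) still leaves the row for "B" in cu_changes_sketch while cuci_user_sketch stays unchanged.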

I created some maintenance scripts (P69320) which inserted a large number of RecentChange entries. On a few occasions I was able to find timestamps for a user in cuci_user which did not correspond to the most recent RecentChange entry, or users who were not recorded in cuci_user at all. I don't know if this is a genuine bug or just due to local environments not always being able to handle a very large number of requests. Reading the logs, I couldn't see anything obviously going wrong.

Thanks for the thorough QA. I think we can consider this done. If production has the same issue of missing rows (where there should actually be rows), we will probably find this through bug reports for the CheckUser-GlobalContributions and global autoblocks work.