Page MenuHomePhabricator

When a user is renamed CU logs in which they were the target user should be updated
Open, MediumPublic

Description

The cu_logs table is denormalized; it includes not just the user ID for the performer and the target of a check, but also their username. We have a hook for user renames (CheckUserHooks::onRenameUserSQL) since 2016. However, it only updates the cul_user_text column which represents the user who performed the check. It should also update the cul_target_text column. Currently, when you search CU logs using a user's new name, you find the correct logs but those logs show the user's old name.

Event Timeline

Here is the blurb of code that handles the user renames for the performer of the check. I have also pasted it right below. The problem is that RenameuserSQL only allows one set of operations per table, so we can't just simply add $renameUserSQL->tables['cu_log'] = [ 'cul_target_text', 'cul_target_id' ]; to the end of this function.

		/**
		 * For integration with the Renameuser extension.
		 *
		 * @param RenameuserSQL $renameUserSQL
		 * @return bool
		 */
		public static function onRenameUserSQL( RenameuserSQL $renameUserSQL ) {
			$renameUserSQL->tablesJob['cu_changes'] = [
				RenameuserSQL::NAME_COL => 'cuc_user_text',
				RenameuserSQL::UID_COL  => 'cuc_user',
				RenameuserSQL::TIME_COL => 'cuc_timestamp',
				'uniqueKey'    => 'cuc_id'
			];
	
			$renameUserSQL->tables['cu_log'] = [ 'cul_user_text', 'cul_user' ];
	
			return true;
		}

There are two ways we can handle this and both of them are a bit hackey.

Option One
Define the job for renaming the target users a a slow job, i.e. use tablesJob instead of tables:

$renameUserSQL->tablesJob['cu_log'] = [ 'cul_target_text', 'cul_target_id' ];

Since cu_logs is generally a small table, it doesn't need to be associated with a slow job. But there is no harm in doing so either.

Option Two (invalid, see next comment)
Define a new hook in CheckUser that specifically deals with the renaming of target users; this is less hackey, but I have never seen any other extension use the same hook twice, and I am not sure if there are risks associated with that.

CheckUserHooks.php
		/**
		 * For integration with the Renameuser extension.
		 * This hook is focused on renaming of the target users in the cu_log table
		 *
		 * @param RenameuserSQL $renameUserSQL
		 * @return bool
		 */
		public static function onRenameTargetUserSQL( RenameuserSQL $renameUserSQL ) {
			$renameUserSQL->tables['cu_log'] = [ 'cul_target_text', 'cul_target_id' ];
	
			return true;
		}
extension.json
...
"Hooks": {
...
	    "RenameUserSQL": [
	        "CheckUserHooks::onRenameUserSQL",
	        "CheckUserHooks::onRenameTargetUserSQL",
	     ], // is an array the right format?
...
}
...

Since @Reedy kindly helped me with a related issue earlier today, I am asking him to take a look here too and provide some advice.

Assigning to $renameUserSQL->tables['cu_log'] in two separate hooks doesn't change the fact that they're modifying the same variable, whichever hook runs last corresponds to the entry in $renameUserSQL->tables that will actually be run.

Ah. So we either have to fix the Technical-Debt in RenameUser (i.e. allow it to accept more than one set of instructions per table) which would be a heavy lift, or we have to use hackey option one above. I will give it another day or so for more feedback; otherwise, I will submit a patch for option one.

I will also create a separate task about the technical debt in RenameUser.

I can't help feel/wonder if we have other extensions that needed this...

Do we have a task about doing Actor migrations for CheckUser... As that could solve this in a different way (and make "renaming users" in these tables useless

Do we have a task about doing Actor migrations for CheckUser... As that could solve this in a different way (and make "renaming users" in these tables useless

We have T233004, but in that task, only the performer is being dealt with.

Do we have a task about doing Actor migrations for CheckUser... As that could solve this in a different way (and make "renaming users" in these tables useless

We have T233004, but in that task, only the performer is being dealt with.

If we care about renaming the target in the logs... We really should be migrating them to actor too?

If we care about renaming the target in the logs... We really should be migrating them to actor too?

I don't think so. A good case to compare it with is the logging table. Only the person who takes the action (e.g. 'delete' or 'block') should be mapped to actor. The person who is the subject of the block, should not be and is not. Another comparison is the ipblocks table in which, again, the person who created the block is mapped to the actor table but the target of the block is not.

Similarly, here the person who performed the check should be mapped to an actor, but the target should not be.

I don't quite get that logic. The point of actor is to not have to change usernames everywhere... It seems odd to do it some places, but not others...

Especially if we're still updating these rows when a rename happens, making it basically no better than it was before

Anyway, if we get T233004: Update CheckUser for actor and comment table fixed, we don't technically need T251312: RenameUser should allow for more than one operation per table, as we can just update the other columns instead

You are technically correct. If we get T233004 fixed, we can simply swap the column names in the RenameUser hook. Obviously, we also need a batch script to go through all existing data and update all stale values.

Option Three
Drop cul_target_text altogether and pull the user name through a join with user and actor tables.

The cul_target_text column is not part of any index, and when we search the logs by target user, that search is actually done using the user ID (i.e. the cu_target_id column). That is really the only SELECT query against cu_log and it either uses no index or uses the cul_type_target index. The cu_log table should be generally small and queries against it tend to be quite fast, so I don't even see much benefit in denormalizing the cul_target_text to begin with.

Finally had a chance to review the code more closely. We already join cu_log to user and pull the user_name for the performer: rECHU includes/CheckUserLogPager.php:74-97

We can do another join for the target, and drop the cul_target_text column altogether. I am adding @Niharika because if we decide to do this, we should do the same for CheckUser 2.0 as well.

Thoughts:

  • Option one is now possible
  • Option three should be easier to achieve now that the code that logs an entry to cu_log is in a service.

Change 884984 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Add cul_target_text and cul_target_id from cu_log to rename user hook

https://gerrit.wikimedia.org/r/884984

For the time being option one can be used. Need to create a script to retroactively update renamed users too.

Change 884984 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Add cul_target_text and cul_target_id from cu_log to rename user hook

https://gerrit.wikimedia.org/r/884984