Page MenuHomePhabricator

Recent changes entries not updated when renaming users
Closed, ResolvedPublic

Description

See Special:RecentChangesLinked/User:AnomieBOT III/Broken redirects/Userspace for example. The block log entry by Fabrictramp still shows the old username, Rocketsavvy, rather than the new username, Mstrpanos. The rc_title field for recent changes entries corresponding to log entries of types in "getLogTypesOnUser" such as user creation, block, and user rights should be updated to the new username to match the updated log_title.

Event Timeline

Change 703656 had a related patch set uploaded (by Matěj Suchánek; author: Matěj Suchánek):

[mediawiki/extensions/Renameuser@master] Update log entries in recentchanges

https://gerrit.wikimedia.org/r/703656

matej_suchanek triaged this task as Low priority.
matej_suchanek edited projects, added DBA; removed Growth-Team, MediaWiki-Recent-changes.
matej_suchanek subscribed.

This is the update query to be run (asynchronously) on every wiki whenever a user is renamed:

UPDATE recentchanges SET rc_title = 'New_user'
WHERE rc_type = 3 AND rc_log_type IN ('block', 'newusers', 'rights', 'thanks') AND rc_namespace = 2 AND rc_title = 'Old_user'

An obvious choice is the rc_namespace_title_timestamp index:

INDEX rc_namespace_title_timestamp (
    rc_namespace, rc_title, rc_timestamp
)

but there is also one more matching index:

INDEX rc_name_type_patrolled_timestamp (
  rc_namespace, rc_type, rc_patrolled, rc_timestamp
)

The first index should stop matching very early (with zero results for the indexed fields) in the common case in my opinion.

Questions for DBA: Is this possible in WMF environment, or an alternative approach is needed (select primary keys, then update; job, ...)? Do you see a need to FORCE INDEX?

I am not sure I am getting your question right, but if we can avoid using the FORCE INDEX that would be preferred (ie: if the index get its name changed or removed in the future, the query would fail). So definitely selecting (+ updating) by PK is definitely better.

Sorry if I wasn't clear enough. I was mostly concerned about the WHERE part (regardless of SELECT vs. UPDATE, though if there is difference, please teach me). There is currently no exact index on recentchanges which the engine could choose, so the first question is whether such a WHERE (with only "partial" match) would be acceptable at all. If yes, there is the index choice question.

For completeness, select PK & update would look like:

SELECT rc_id FROM recentchanges WHERE rc_type = 3 AND rc_log_type IN ('block', 'newusers', 'rights', 'thanks') AND rc_namespace = 2 AND rc_title = 'Old_user';
UPDATE recentchanges SET rc_title = 'New_user' WHERE rc_id IN (/* ... */);

I suggest you give us new set of users that have been renamed but their block log/etc has not been moved so I can check if it's picking up the right index, etc. In general it looks okay either way as (rc_namespace, rc_title) alone has pretty high selectively.

Some recent renames with recent blocks on enwiki:

LapeerCityRecord
Factchecker_for_ksw
Office_Swissmetal
Toddlinacrossamerica
TulkoffFoods
CaymanIslandsMonetaryAuthority
UCI_School_of_Physical_Sciences
Thewoodspublishing
Dailycalarts
BillboardMagazine
MadMeadyRevenge
4321_team
Previewbot11

Yup, it's pretty quick:

wikiadmin@10.192.16.41(enwiki)> explain select * from recentchanges WHERE rc_type = 3 AND rc_log_type IN ('block', 'newusers', 'rights', 'thanks') AND rc_namespace = 2 AND rc_title = 'Thewoodspublishing'\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: recentchanges
         type: ref
possible_keys: rc_name_type_patrolled_timestamp,rc_ns_actor,rc_namespace_title_timestamp
          key: rc_namespace_title_timestamp
      key_len: 261
          ref: const,const
         rows: 3
        Extra: Using index condition; Using where
1 row in set (0.00 sec)

ERROR: No query specified

wikiadmin@10.192.16.41(enwiki)> explain select * from recentchanges WHERE rc_type = 3 AND rc_log_type IN ('block', 'newusers', 'rights', 'thanks') AND rc_namespace = 2 AND rc_title = 'LapeerCityRecord'\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: recentchanges
         type: ref
possible_keys: rc_name_type_patrolled_timestamp,rc_ns_actor,rc_namespace_title_timestamp
          key: rc_namespace_title_timestamp
      key_len: 261
          ref: const,const
         rows: 5
        Extra: Using index condition; Using where
1 row in set (0.01 sec)

My suggestion would be to do it in a job or later so the user pages are moved so the rows it picks up would be much smaller.

My suggestion would be to do it in a job so the user pages are moved so the rows it picks up would be much smaller.

CentralAuth launches LocalRenameUserJob's for each wiki. But I believe there would be one more row to select if we did this after moving the user page (because of a log entry).

The one more row is nothing, as long it's not thousands of rows to select, it should be just fine.

Change 703656 merged by jenkins-bot:

[mediawiki/extensions/Renameuser@master] Update log entries in recentchanges

https://gerrit.wikimedia.org/r/703656