See for example https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Ratte which is stuck on several wikis. (Nuevo Paso might be another example.)
Most global renames seem to work though.
I found no relevant log entries in logstash. (There are a bunch of unresolvable Hausratte@<wiki> errors, which are probably just a consequence of CentralAuth trying an account migration process some time after the rename errors, and failing on the non-renamed wikis.)
Related Objects
Mentioned In
- T370940: Maintenance_bot removing patch-for-review cause archived projects to be removed from task
- T370388: Large numbers of global renames result in failures of local renaming on enwiki
- T188171: LocalRenameUserJob: escape '$' in replacement title
- T145596: Renames getting stuck on mediawiki.org (Sept 13, 2016)
- T141020: Users with detached local accounts after rename during train deployment of 1.27.0-wmf.11
- T140074: Attach Biplab Anand's local accounts
- T119736: Could not find local user data for {Username}@{wiki}
- T135656: GlobalRename is broken, presumably due to authmanager changes
Mentioned Here
- T73924: Add users_to_rename table to centralauth database
- rECAU42c451c3adae: Add forceRenameUsers.php
- T135656: GlobalRename is broken, presumably due to authmanager changes
Event Timeline
HELLO? Is anyone working on this? It's pretty annoying that no one seems to care to quickly fix this... I apologize if I'm mistaken, but it certainly looks this way.
Nobody is currently assigned to solve this task.
@Aklapper Could you try to find someone with a knowledge of how global rename
code works, so we can devise a script to manually achieve these stuck
renames?
Last I heard, @Tgr and @Legoktm talked about this at Wikimania and Lego had a plan of some sort.
This and T135656 are probably referring to the same issue, at least at the moment. This one is so vague and the other has had several iterations of similar problems under its banner which makes it hard to be more specific.
Can at least the stuck renames be resolved? Those users aren't able to log in until their renames are completed. We've issued warnings to the global renamers and stewards to avoid further renames until this is fixed. Thank you.
As Anomie noted earlier, what probably happens is that (when a user has accounts on many wikis) lots of rename jobs are started at the same time, each job tries to reset the central token, some of them fail, and leave the user in some state where automatically reattempting the rename does not work. (At a guess, there is a CAS error when invalidateSessionsForUser is called, which causes the CentralAuthUser to not be saved, which causes the user save at the very end of RenameUserSQL::rename to not be a no-op for CentralAuthHooks::onUserSaveSettings, and there is a lock wait timeout at that point; by then the user is fully renamed, so the next rename attempt will not find it.) Since the rename is marked as "in process", MediaWiki refuses to log the user in.
We discussed this at the hackathon and the easiest fix is to make the jobs run sequentially, instead of in parallel: instead of scheduling all the jobs at start, just have one rename job schedule the next one. That will make renames slower for users with many accounts, but as I understand that's not considered a big problem.
(The more complex alternative would be to ensure that the user is not logged in on the wiki without global token resets, e.g. by breaking the local session and then having CentralAuthSessionProvider check some sort of blacklist.)
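To illustrate the sequential approach (one rename job schedules the next), here is a minimal Python sketch. It is not the CentralAuth code: the queue, `start_sequential_rename` and `rename_on_wiki` are hypothetical stand-ins for the job queue and the per-wiki rename.

```python
from collections import deque


def start_sequential_rename(wikis, rename_on_wiki):
    """Rename on one wiki at a time; each finished job enqueues the next one.

    `rename_on_wiki` is a hypothetical callable standing in for the local
    rename; the deque stands in for the real job queue.
    """
    queue = deque()          # stand-in for the job queue
    remaining = list(wikis)  # wikis that still need a rename job
    order = []               # order in which jobs actually ran

    def schedule_next():
        # Enqueue at most one job: the next wiki that has not been processed.
        if remaining:
            queue.append(remaining.pop(0))

    schedule_next()  # only the first job is scheduled up front
    while queue:
        wiki = queue.popleft()
        rename_on_wiki(wiki)
        order.append(wiki)
        schedule_next()  # the finished job schedules its successor
    return order
```

Because there is never more than one job in the queue, no two local renames for the same user can reset the central token at the same time, at the cost of a longer total run time.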
Change 297537 had a related patch set uploaded (by Gergő Tisza):
Make LocalRename jobs run sequentially
Five users are stuck:
mysql:wikiadmin@db1079 [centralauth]> select ru_oldname, ru_newname, count(*) from renameuser_status group by ru_oldname, ru_newname;
+------------------------------------+------------------+----------+
| ru_oldname                         | ru_newname       | count(*) |
+------------------------------------+------------------+----------+
| Acee8                              | Nuevo Paso       |      103 |
| Hausratte                          | Ratte            |       58 |
| Markouzki                          | Quijx            |        1 |
| Михаил Марчук                      | Tot Samyj Niekto |       13 |
| बिप्लब आनन्द                       | Biplab Anand     |      258 |
+------------------------------------+------------------+----------+
Markouzki was actually moved but the status got stuck; fixed that. The other four are moved globally and on some wikis but not on many others. Not sure how to deal with that; is that what forceRenameUsers.php is for?
I understand there is a patch for review. Will this patch fix the renames getting stuck problem, or simply get the current renames unstuck?
As far as I can see, https://gerrit.wikimedia.org/r/297537 fixes the rename problem but not the accounts that are already blocked. After the patch is merged, can we start renaming users again?
Change 297697 had a related patch set uploaded (by Legoktm):
Make LocalRename jobs run sequentially
Change 297698 had a related patch set uploaded (by Legoktm):
Make LocalRename jobs run sequentially
Mentioned in SAL [2016-07-07T00:03:29Z] <legoktm@tin> Synchronized php-1.28.0-wmf.8/extensions/CentralAuth/: Make LocalRename jobs run sequentially - T137973 (duration: 00m 34s)
Mentioned in SAL [2016-07-07T00:05:17Z] <legoktm@tin> Synchronized php-1.28.0-wmf.9/extensions/CentralAuth/: Make LocalRename jobs run sequentially - T137973 (duration: 00m 30s)
Mentioned in SAL [2016-07-07T00:06:50Z] <legoktm@tin> Synchronized php-1.28.0-wmf.8/extensions/CentralAuth/: Make LocalRename jobs run sequentially - T137973 (for real this time) (duration: 00m 30s)
Email sent to the global-renamers list:
Hi,
Tgr and Anomie (send cookies and thanks their way!) worked on a patch to fix global rename by making the jobs for each wiki run one at a time instead of in parallel. Some testing on renames today showed that the fix is working and there haven't been any issues yet.
Global rename is now significantly slower - you'll notice that one wiki goes at a time and will be processed in order.
For now, please only start one rename at a time. Keep an eye on the overall Special:GlobalRenameProgress and make sure there aren't more than 10 other renames currently running. I know there is a backlog of rename requests, but let's not clear the entire queue at once. :-)
- Kunal
It is happening again. All of these nine cases got stuck.
Ami Ruse → Jack Dobson (view progress)
Benjaminekman → Benjekman (view progress)
Elayamir → TheGodfather85 (view progress)
Fra150190 → Superpes15 (view progress)
Hausratte → Ratte (view progress)
MahdiEynian → MikeEcho (view progress)
SamLikesPlanes → Substellar (view progress)
Sarybe → Tommy377 (view progress)
Yareth Sarmiento → Khloe S Castro (view progress)
I tried to understand what is wrong. For example, in the case of MikeEcho it finished loginwiki but never got to start mediawikiwiki. I saved logs for this case in logstash.
Yes, but now there are other poor users stuck in limbo. Looks like the problem has gotten worse.
Yes, definitely. So before we proceed with any new rename we have to check the progress first.
Got the problem once again :) My account is not attached on more than 256 accounts :)
https://commons.wikimedia.org/w/index.php?title=Special%3ACentralAuth&target=Biplab+Anand
Looks like the serializing of the jobs isn't quite working. For example,
2016-07-07 04:27:53 [V33Z4QpAEKsAABC3WScAAABY] mw1167 jawiki 1.28.0-wmf.8 runJobs DEBUG: LocalRenameUserJob Global_rename_job from=MahdiEynian to=MikeEcho renamer=Ladsgroup movepages=1 suppressredirects= promotetoglobal= reason=per [[m:Special:GlobalRenameQueue/request/25140|request]] session={...} force= requestId=V33Z4QpAEKsAABC3WScAAABY (uuid=beaf29220639406c8b2bb2bb16bbada5,timestamp=1467865670,QueuePartition=rdb3-6380) STARTING
2016-07-07 04:27:54 [V33Z4QpAEKsAABC3WScAAABY] mw1165 loginwiki 1.28.0-wmf.9 runJobs DEBUG: LocalRenameUserJob Global_rename_job from=MahdiEynian to=MikeEcho renamer=Ladsgroup movepages=1 suppressredirects= promotetoglobal= reason=per [[m:Special:GlobalRenameQueue/request/25140|request]] session={...} force= requestId=V33Z4QpAEKsAABC3WScAAABY (uuid=dabadab2fcfd408687ee498edfdde3ef,timestamp=1467865673,QueuePartition=rdb1-6380) STARTING
2016-07-07 04:27:54 [V33Z4QpAEKsAABC3WScAAABY] mw1167 jawiki 1.28.0-wmf.8 runJobs INFO: LocalRenameUserJob Global_rename_job from=MahdiEynian to=MikeEcho renamer=Ladsgroup movepages=1 suppressredirects= promotetoglobal= reason=per [[m:Special:GlobalRenameQueue/request/25140|request]] session={...} force= requestId=V33Z4QpAEKsAABC3WScAAABY (uuid=beaf29220639406c8b2bb2bb16bbada5,timestamp=1467865670,QueuePartition=rdb3-6380) t=240 good
2016-07-07 04:27:54 [V33Z4QpAEKsAABC3WScAAABY] mw1167 jawiki 1.28.0-wmf.8 runJobs DEBUG: LocalRenameUserJob Global_rename_job from=MahdiEynian to=MikeEcho renamer=Ladsgroup movepages=1 suppressredirects= promotetoglobal= reason=per [[m:Special:GlobalRenameQueue/request/25140|request]] session={...} force= requestId=V33Z4QpAEKsAABC3WScAAABY (uuid=19ea2f7808d74ba0a05b01353cc39fd8,timestamp=1467865674,QueuePartition=rdb1-6381) STARTING
2016-07-07 04:27:54 [V33Z4QpAEKsAABC3WScAAABY] mw1167 jawiki 1.28.0-wmf.8 runJobs INFO: LocalRenameUserJob Global_rename_job from=MahdiEynian to=MikeEcho renamer=Ladsgroup movepages=1 suppressredirects= promotetoglobal= reason=per [[m:Special:GlobalRenameQueue/request/25140|request]] session={...} force= requestId=V33Z4QpAEKsAABC3WScAAABY (uuid=19ea2f7808d74ba0a05b01353cc39fd8,timestamp=1467865674,QueuePartition=rdb1-6381) t=15 good
2016-07-07 04:27:54 [V33Z4QpAEKsAABC3WScAAABY] mw1165 loginwiki 1.28.0-wmf.9 runJobs INFO: LocalRenameUserJob Global_rename_job from=MahdiEynian to=MikeEcho renamer=Ladsgroup movepages=1 suppressredirects= promotetoglobal= reason=per [[m:Special:GlobalRenameQueue/request/25140|request]] session={...} force= requestId=V33Z4QpAEKsAABC3WScAAABY (uuid=dabadab2fcfd408687ee498edfdde3ef,timestamp=1467865673,QueuePartition=rdb1-6380) t=132 good
It looks like what might be happening is this:
- jawiki job auto-starts a DB transaction thanks to DBO_DEFAULT/DBO_TRX.
- jawiki rename finishes.
- jawiki job schedules the next wiki, loginwiki.
- loginwiki job is started.
- loginwiki rename finishes. I note it logged a CAS error, probably because the jawiki job didn't commit its DB writes yet.
- loginwiki job schedules the next wiki. Since the jawiki job hasn't committed its transaction yet, it doesn't see that jawiki is marked as "done" so it schedules for jawiki.
- First jawiki job commits its transaction.
- Second jawiki job runs, finds that jawiki has already been done, and bails out.
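The sequence above can be reproduced with a toy model. This is only a sketch under stated assumptions: `StatusTable`, the job names and the "queued"/"done" values are made up for illustration, and committing before scheduling is just one way to avoid the race, not necessarily what the eventual patch does.

```python
class StatusTable:
    """Toy stand-in for the rename status table with per-job transactions."""

    def __init__(self, wikis):
        self.committed = {wiki: "queued" for wiki in wikis}
        self.pending = {}  # job name -> that job's uncommitted writes

    def write(self, job, wiki, status):
        # The write stays invisible to other jobs until commit().
        self.pending.setdefault(job, {})[wiki] = status

    def commit(self, job):
        self.committed.update(self.pending.pop(job, {}))

    def next_wiki(self, job):
        # A job sees committed rows plus its own uncommitted writes.
        view = {**self.committed, **self.pending.get(job, {})}
        for wiki, status in view.items():
            if status != "done":
                return wiki
        return None


def simulate(commit_before_scheduling):
    """Return what each job schedules next: [by jawiki job, by loginwiki job]."""
    table = StatusTable(["jawiki", "loginwiki"])
    scheduled = []

    # jawiki job finishes its rename and marks itself done.
    table.write("job-ja", "jawiki", "done")
    if commit_before_scheduling:
        table.commit("job-ja")
    scheduled.append(table.next_wiki("job-ja"))

    # loginwiki job runs while job-ja's transaction may still be open.
    table.write("job-login", "loginwiki", "done")
    scheduled.append(table.next_wiki("job-login"))

    table.commit("job-ja")  # first jawiki job finally commits (no-op if done)
    return scheduled
```

With `simulate(False)` the loginwiki job cannot see that jawiki is done and schedules jawiki a second time; with `simulate(True)` it finds nothing left to schedule.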
Change 297817 had a related patch set uploaded (by Anomie):
Fix job serializing (and status display on Special:GlobalRenameProgress)
Change 297817 merged by jenkins-bot:
Fix job serializing (and status display on Special:GlobalRenameProgress)
Change 297941 had a related patch set uploaded (by Legoktm):
Fix job serializing (and status display on Special:GlobalRenameProgress)
Mentioned in SAL [2016-07-08T18:52:01Z] <anomie> Attempting to resubmit LocalRenameUserJobs for T137973
Oh... I didn't pay close enough attention, the backport was submitted as https://gerrit.wikimedia.org/r/#/c/297941/ but not actually merged and backported, so things might still fail as they did yesterday.
Change 297941 merged by jenkins-bot:
Fix job serializing (and status display on Special:GlobalRenameProgress)
Mentioned in SAL [2016-07-08T22:02:49Z] <legoktm@tin> Synchronized php-1.28.0-wmf.9/extensions/CentralAuth/: Fix job serializing (and status display on Special:GlobalRenameProgress) - T137973 (duration: 00m 32s)
Can we start renaming again? Recent renames seem to be going through, but I'm not getting any "official" confirmation that the problem is fixed for sure.
Update from IRC:
legoktm set the topic: (...) | Status: <10 concurrent renames plz
Which means not more than 10 renames at https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress
Apart from that, we did a lot of renames today (~300) and all were successful. (Except one: 25356 is broken, but that is likely unrelated to this bug.)
Did you type the password that you use for your Wikimedia account? If yes, was it successful? If not, then try using password reset for all sites that failed.
@Pokefan95 wrote:
If not, then try using password reset for all sites that failed.
This won't work, the CA seems broken. A tech has to look into it. This is not a standard problem, you won't be able to help him.
Change 299887 had a related patch set uploaded (by Gergő Tisza):
Make LocalRename jobs run sequentially
Change 299899 had a related patch set uploaded (by Gergő Tisza):
Fix job serializing (and status display on Special:GlobalRenameProgress)
Change 299899 merged by jenkins-bot:
Fix job serializing (and status display on Special:GlobalRenameProgress)
To manually fix a blocked rename, one can run:
mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php
It has to be run for each of the affected wikis.
Thanks for the tip @hashar. I guess people can ping me (or lots of other people with access to terbium) to do it in case it happens.
May I suggest creating a script which doesn't need to be run separately on each wiki where the rename gets stuck? I think it might be a bit tedious for you all to have to run the script 10 times if an account becomes stuck on ten sites.
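A wrapper along those lines might look like the following sketch. It only builds the command lines and executes nothing. The `--wiki=<dbname>` flag is assumed from the usual maintenance-script convention, and `extra_args` is a hypothetical placeholder for whatever arguments fixStuckGlobalRename.php actually requires, which this thread does not spell out.

```python
import shlex

# Script path as given earlier in this thread.
SCRIPT = "extensions/CentralAuth/maintenance/fixStuckGlobalRename.php"


def build_commands(wikis, extra_args=()):
    """Return one mwscript command line per stuck wiki (nothing is executed).

    Assumption: the target wiki is selected with --wiki=<dbname>.
    `extra_args` is a hypothetical placeholder for the script's own arguments.
    """
    commands = []
    for wiki in wikis:
        argv = ["mwscript", SCRIPT, f"--wiki={wiki}", *extra_args]
        commands.append(shlex.join(argv))  # quote each argument for the shell
    return commands
```

Someone with access could review the generated lines before running them, for example with `print("\n".join(build_commands(["enwiki", "dewiki"])))`.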