Global rename Gautehuus → Neuraxıs is stuck on Commons
Closed, Resolved (Public)

Description

https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Neurax%C4%B1s is currently stuck on Commons, listed as "In Progress" for more than 24 hours now. All other renames have completed successfully. https://meta.wikimedia.org/wiki/Special:CentralAuth/Neurax%C4%B1s doesn't display any message about a rename being in progress, while https://meta.wikimedia.org/wiki/Special:CentralAuth/Gautehuus shows the unattached account on Commons.

Event Timeline

MarcoAurelio subscribed.

Given that past MW trains have negatively affected the resolution of stuck global renames, I think it's best to resolve this prior to installing new MW versions.

hashar subscribed.

Some documentation on how to fix this is at T145596#2640418

[centralauth]> select * from renameuser_status group by ru_oldname, ru_newname;
+------------+------------+-------------+------------+
| ru_oldname | ru_newname | ru_wiki     | ru_status  |
+------------+------------+-------------+------------+
| Gautehuus  | Neuraxıs   | commonswiki | inprogress |
+------------+------------+-------------+------------+
1 row in set (0.00 sec)
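For anyone without access to the production database, the lookup above can be tried against a toy table. This is only a minimal sketch using Python's sqlite3: the table and column names come from the output above, while the helper name `stuck_renames` and the second row are hypothetical and purely illustrative.

```python
import sqlite3


def stuck_renames(conn):
    """Return (oldname, newname, wiki) for rows still marked inprogress."""
    cur = conn.execute(
        "SELECT ru_oldname, ru_newname, ru_wiki "
        "FROM renameuser_status WHERE ru_status = 'inprogress' "
        "ORDER BY ru_wiki"
    )
    return cur.fetchall()


# In-memory stand-in for the centralauth database (assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE renameuser_status ("
    "ru_oldname TEXT, ru_newname TEXT, ru_wiki TEXT, ru_status TEXT)"
)
conn.executemany(
    "INSERT INTO renameuser_status VALUES (?, ?, ?, ?)",
    [
        ("Gautehuus", "Neuraxıs", "commonswiki", "inprogress"),
        # Hypothetical row, only here to show that other statuses are filtered out.
        ("ExampleOld", "ExampleNew", "examplewiki", "queued"),
    ],
)
print(stuck_renames(conn))
```

Filtering on `ru_status` rather than grouping makes it explicit which rows are considered stuck.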

I could not find any exception/error in the logs so it is a mystery.

I have manually triggered the job:

$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki Gautehuus Neuraxıs
Using Céréales Killer as the renamer.
from: Gautehuus
to: Neuraxıs
renamer: Céréales Killer
movepages: 1
suppressredirects: 
reason: per [[m:Special:GlobalRenameQueue/request/27249|request]]

Starting to run job...
Done!

But the rename is still stuck and I still can't find any log/exception :(

https://quarry.wmflabs.org/query/12886 indeed shows the rename is still stuck. Maybe run the script again on terbium and see what happens?

Is the commonswiki DB loaded with pending jobs? Maybe it's stuck in the job queue?

thcipriani raised the priority of this task from High to Unbreak Now!. Oct 4 2016, 5:11 PM

Moving to UBN since this task is blocking the MW train that's supposed to happen shortly.

Why is this a train blocker? Commons was on the old branch when it happened.

Given that past MW trains have negatively affected the resolution of stuck global renames, I think it's best to resolve this prior to installing new MW versions.

@Tgr: I'll follow your lead on whether this should block or not.

I added the blocker since past MediaWiki trains worsened the situation of
stuck global renames (see previous tickets). If you think that this should
not be a blocker, then feel free to remove the parent task. Thank you for
your understanding.

Tgr lowered the priority of this task from Unbreak Now! to High. Oct 4 2016, 6:17 PM

It's not related to any new code and it affected a single user on a single wiki so far, so IMO not UBN and no reason to halt the train.

The two obvious problems are that LocalRenameJob skips users with an inprogress state and that it tries to log this in the non-existent rename channel. Will fix manually when I'm in the office.

For the rename jobs on the subsequent wikis to have run, the rename on Commons must first have completed successfully; somehow its database changes were rolled back afterwards.

I think these are probably the relevant log entries:

runJobs.log
2016-09-28 17:46:54 [V@wBjApAIDgAAGhMK6EAAACG] mw1301 commonswiki 1.28.0-wmf.20 runJobs DEBUG: LocalRenameUserJob Global_rename_job from=Gautehuus to=Neuraxıs renamer=Céréales Killer reattach=array(107) movepages=1 suppressredirects= promotetoglobal= reason=per [[m:Special:GlobalRenameQueue/request/27249|request]] session={"ip":"REDACTED","headers":"array(...)","sessionId":"","userId":0} force= requestId=V@wBjApAIDgAAGhMK6EAAACG (uuid=202a64657a9640158650458bd21fcf50,timestamp=1475084804,QueuePartition=rdb1-6379) STARTING  
2016-09-28 17:46:54 [V@wBjApAIDgAAGhMK6EAAACG] mw1301 commonswiki 1.28.0-wmf.20 runJobs INFO: LocalRenameUserJob Global_rename_job from=Gautehuus to=Neuraxıs renamer=Céréales Killer reattach=array(107) movepages=1 suppressredirects= promotetoglobal= reason=per [[m:Special:GlobalRenameQueue/request/27249|request]] session={"ip":"REDACTED","headers":"array(...)","sessionId":"","userId":0} force= requestId=V@wBjApAIDgAAGhMK6EAAACG (uuid=202a64657a9640158650458bd21fcf50,timestamp=1475084804,QueuePartition=rdb1-6379) COMMIT ENQUEUED [426ms of writes]  
2016-09-28 17:47:24 [V@wBjApAIDgAAGhMK6EAAACG] mw1301 commonswiki 1.28.0-wmf.20 runJobs ERROR: LocalRenameUserJob Global_rename_job from=Gautehuus to=Neuraxıs renamer=Céréales Killer reattach=array(107) movepages=1 suppressredirects= promotetoglobal= reason=per [[m:Special:GlobalRenameQueue/request/27249|request]] session={"ip":"REDACTED","headers":"array(...)","sessionId":"","userId":0} force= requestId=V@wBjApAIDgAAGhMK6EAAACG (uuid=202a64657a9640158650458bd21fcf50,timestamp=1475084804,QueuePartition=rdb1-6379) t=30607 error=DBError: Timed out waiting on commit queue.

There's code in there that tries to serialize the commits of all jobs that took longer than 0.1s, and apparently just rolls the job back without rescheduling if it can't grab the serialization lock within 30 seconds.
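To make the behaviour described above concrete, here is a minimal sketch in Python. It is not the MediaWiki implementation: `finish_job`, `commit_lock` and the callback parameters are hypothetical names, and only the 0.1s threshold and the 30-second timeout are taken from the comment above.

```python
import threading

COMMIT_THRESHOLD_S = 0.1  # from the comment above: only slower jobs are serialized
LOCK_TIMEOUT_S = 30.0     # from the comment above: give up after 30 seconds

# Hypothetical stand-in for the shared serialization lock.
commit_lock = threading.Lock()


def finish_job(write_time_s, commit, rollback,
               lock=commit_lock, timeout_s=LOCK_TIMEOUT_S):
    """Commit a job's writes, serializing slow jobs behind a shared lock.

    Returns True if the writes were committed, False if they were rolled
    back because the lock could not be acquired in time.
    """
    if write_time_s <= COMMIT_THRESHOLD_S:
        commit()  # fast jobs commit immediately
        return True
    if not lock.acquire(timeout=timeout_s):
        rollback()  # writes are discarded and nothing is rescheduled
        return False
    try:
        commit()
        return True
    finally:
        lock.release()
```

With a lock that is already held and a short timeout, a job with 426ms of writes (as in the log above) takes the rollback path and returns False, which would leave the status row in whatever state it was in before the commit.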

@aaron might be able to tell us more about the "COMMIT ENQUEUED" / "Timed out waiting on commit queue" situation.

After poring over the rename code, I didn't see any way in which re-running a halfway aborted rename could cause problems, so I just changed the status to failed and re-ran the script. The user account should be fixed now.
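Why the first manual re-run printed "Done!" without changing anything, and why flipping the status helped, can be sketched as follows. This is an illustration under stated assumptions, not the CentralAuth code: `run_local_rename` and the dict standing in for renameuser_status are hypothetical, and the removal of the row after a successful rename is an assumption.

```python
def run_local_rename(status_table, key, do_rename):
    """Run do_rename() unless the row for key is already marked inprogress.

    status_table is a hypothetical dict standing in for renameuser_status.
    Returns True if the rename ran, False if it was skipped.
    """
    if status_table.get(key) == "inprogress":
        return False  # skipped silently, as observed with the first re-run
    status_table[key] = "inprogress"
    do_rename()
    del status_table[key]  # assumption: a finished rename leaves no row behind
    return True


key = ("Gautehuus", "commonswiki")
table = {key: "inprogress"}
ran = []

first = run_local_rename(table, key, lambda: ran.append("renamed"))
table[key] = "failed"  # the manual status change described above
second = run_local_rename(table, key, lambda: ran.append("renamed"))
print(first, second, ran)
```

The first call is skipped because of the leftover inprogress row; once the status is set to failed, the same call goes through.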

Possible follow-ups:

  • fix the underlying error
  • fixStuckGlobalRename.php should treat inprogress status as failed
  • the rename log channel should go somewhere
  • "https://meta.wikimedia.org/wiki/Special:CentralAuth/Neurax%C4%B1s doesn't display any message about a rename being in progress" - should that be changed? (there is some exception-handling code in Special:CentralAuth which did that, but that's not called anymore since CentralAuthUser stopped throwing exceptions on unattached accounts)

Looks good; login was successful. Thank you.

Change 314218 had a related patch set uploaded (by Gergő Tisza):
Add ignorestatus option for fixing stuck renames

https://gerrit.wikimedia.org/r/314218

Change 314219 had a related patch set uploaded (by Gergő Tisza):
Set failed rename queue status on late errors

https://gerrit.wikimedia.org/r/314219

Change 314218 merged by jenkins-bot:
Add ignorestatus option for fixing stuck renames

https://gerrit.wikimedia.org/r/314218

Change 314219 merged by jenkins-bot:
Set failed rename queue status on late errors

https://gerrit.wikimedia.org/r/314219

Tgr claimed this task.

Possible follow-ups:

  • fix the underlying error
  • fixStuckGlobalRename.php should treat inprogress status as failed
  • the rename log channel should go somewhere
  • "https://meta.wikimedia.org/wiki/Special:CentralAuth/Neurax%C4%B1s doesn't display any message about a rename being in progress" - should that be changed? (there is some exception-handling code in Special:CentralAuth which did that, but that's not called anymore since CentralAuthUser stopped throwing exceptions on unattached accounts)

First three are done, I'll call this fixed. If someone feels strongly about the last, feel free to file a task and assign it to me.

Change 315364 had a related patch set uploaded (by Gergő Tisza):
Add ignorestatus option for fixing stuck renames

https://gerrit.wikimedia.org/r/315364

Change 315364 merged by jenkins-bot:
Add ignorestatus option for fixing stuck renames

https://gerrit.wikimedia.org/r/315364

Mentioned in SAL (#wikimedia-operations) [2016-10-11T23:41:45Z] <ebernhardson@mira> Synchronized php-1.28.0-wmf.21/extensions/CentralAuth/: SWAT T147029 Add ignorestatus option for fixing stuck renames (duration: 00m 53s)