Old username: StoicNeanderthal
New username: Renamed user 5401aafa5557bf5c36b752af3b938b14
Since: 22:01, 31 July 2025 (Stuck for 10 hours)
The global rename appears to be stuck. Please assist in unblocking or completing the rename.
This one looks different from most of the recent surge of stuck renames, as it is stuck with the 'In progress' status.
This would suggest a bug somewhere in the job code that caused it to crash before completing the work, but I can't find the exception anywhere in the logs.
There is instead this, which I don't understand: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-k8s-1-7.0.0-1-2025.07.31?id=dF2AYpgBsPjmLNTo7R87
That sounds like an error in the job runner rather than the job? The job was scheduled, the status was set to In progress, but then the job runner crashed and never actually executed the job?
Although then I'd expect retries. I don't understand the job infra well, though, so maybe there is some situation where that's not the case.
But the 'In progress' status is set from inside the job code: https://gerrit.wikimedia.org/g/mediawiki/extensions/CentralAuth/+/bf691fb5c7eae374f0b6f02867d6e471bbb71902/includes/GlobalRename/LocalRenameJob/LocalRenameUserJob.php#65 So it must have started, but there is no record of it finishing.
The job runner log is from the thing that makes HTTP requests to RunSingleJob.php which really runs the jobs, if I understand it correctly, and it says "socket hang up". It's almost as if someone unplugged the server running the job, or whatever the Kubernetes equivalent is.
There was in fact a retry, but it skipped itself because the status was already 'In progress': https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2025.07.31?id=oTiBYpgBgiE0yhV92O0l
*shrug* Maybe it's a one-off random thing. But we should keep an eye out for any global renames stuck on 'In progress'.
I remember that before the K8s era, we used to have docs that showed a way to bypass issues like this, where a job shows as running or in progress when it's not. The --ignorestatus option should fix it, even today, I suppose?
But it seems this is a known problem {T147029#2691269}. @Tgr tried to address this in 2016.
See: https://wikitech.wikimedia.org/w/index.php?title=Stuck_global_renames&diff=prev&oldid=2295101
--ignorestatus is a parameter that can be added to fix a global rename that's stuck in running state even though the job is not actually running (so trying to run the script without this switch gives skipping duplicate rename from...). This used to happen a lot in the past due to fragile error handling; shouldn't be the case anymore. Use with care; could make a mess if the job really is running.
Yes, I think we just need to run the script with --ignorestatus. I was concerned that the rename on enwiki could be stuck in some halfway-done bad state, but looking at the data now, it seems that it wasn't started at all (or was rolled back neatly when the script crashed/disappeared).
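To make the failure mode above concrete, here is a minimal sketch in Python of how a rename can end up stuck on 'In progress' and why overriding the status check unblocks it. This is a simplified model, not the actual CentralAuth code: the Status values, run_local_rename, RunnerDied and the ignore_status parameter are all hypothetical names made up for illustration, and the runner disappearing is modelled as an exception even though in production nothing would get a chance to catch it.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status values; the real ones live in CentralAuth.
    QUEUED = "queued"
    IN_PROGRESS = "inprogress"
    DONE = "done"


class RunnerDied(Exception):
    """Stands in for the job runner vanishing mid-job ("socket hang up")."""


def run_local_rename(state, wiki, do_rename, ignore_status=False):
    """Hypothetical model of one job attempt on one wiki."""
    # A retry sees 'In progress' and assumes another runner owns the job.
    if state[wiki] is Status.IN_PROGRESS and not ignore_status:
        return "skipped"
    state[wiki] = Status.IN_PROGRESS
    do_rename()  # if the runner dies here, the status is never updated
    state[wiki] = Status.DONE
    return "done"


def crash():
    raise RunnerDied("socket hang up")


def simulate():
    state = {"enwiki": Status.QUEUED}
    outcomes = []
    # First attempt: status is set, then the runner disappears.
    try:
        run_local_rename(state, "enwiki", crash)
    except RunnerDied:
        outcomes.append("crashed")
    # Automatic retry: skips itself because of the stale status.
    outcomes.append(run_local_rename(state, "enwiki", lambda: None))
    # Manual run that overrides the stale status.
    outcomes.append(
        run_local_rename(state, "enwiki", lambda: None, ignore_status=True)
    )
    return outcomes, state["enwiki"]
```

In this model, simulate() goes through a crash, a skipped retry, and a successful run with the override. It also shows why the override needs care: if the first attempt were still genuinely running, the third call would start a second rename alongside it.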
Ack! That sounds good. Do you want me to take this on (I'm on chores this week), along with T400862: Unblock stuck global rename of Renamed user f74bdbce92f61493475fa5230c4922b0? I see that for T400862, you are currently tracking something in production?
I want to use this opportunity to verify the fix for T398177 in production. I'd appreciate it if you could run the script after we backport the patch for that bug (this should happen today).
Patch was backported, please run the script when you can, and I'll double-check the result later.
Mentioned in SAL (#wikimedia-operations) [2025-08-05T10:36:05Z] <xSavitar> Ran fixStuckGlobalRename.php for T400974