Background
In T372603, we updated UcfirstOverrides.php [0] to maintain title-case consistency between PHP 7.4 (Unicode 11) and 8.1 (Unicode 14).
Once no PHP 7.4 workloads remain, we need to clean these overrides up, taking care to leave the permanent override for Eszett intact.
There will be no more PHP 7.4 workloads when:
- We are no longer building and testing (in mw-debug) 7.4 images during deployments (T391057) (note: this is happening very soon).
- We have completed the mw-cron migration (T341555) and mwscript can no longer be used on 7.4 (mwmaint) hosts (T341553).
See also T292552 for the last cleanup of this type, which combined both the 7.2 - 7.4 consistency cleanup and the transition to title-case.
Schedule
The renames were completed on Tuesday, 1st of July.
Process
This section outlines the specific process used during this cleanup. The six steps below are referred to by number (#1 to #6), in order. See T292552 for the previous cleanup.
- Prepare the character mapping.
Generate a title-case character mapping from the current state (i.e., consistent with the older Unicode version) to the desired state (i.e., consistent with the newer Unicode version).
This is the same process used to generate UcfirstOverrides.php with generateUcfirstOverrides.php (and indeed uses the same title-casing character tables), but with the --override and --with options reversed: in our case, override 7.4 with 8.1, not vice versa. An example character mapping from this task can be found in P76371. Compare it with, e.g., the state of UcfirstOverrides.php at the time, which was enforcing 7.4 - 8.1 consistency.
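As a rough illustration of why the direction of the two options matters, the comparison can be thought of as a diff between two title-casing tables. The sketch below is written in Python rather than PHP, uses toy tables with invented entries, and the name build_overrides is hypothetical. It is not the implementation of generateUcfirstOverrides.php.

```python
from __future__ import annotations


def build_overrides(current: dict[str, str], desired: dict[str, str]) -> dict[str, str]:
    """Map each character whose title-cased form differs to the desired form."""
    return {
        ch: desired[ch]
        for ch, cased in current.items()
        if ch in desired and desired[ch] != cased
    }


# Toy tables (character -> title-cased form). These entries are invented
# for illustration and are NOT real Unicode 11 vs. 14 differences.
older = {"a": "A", "b": "B", "x": "x"}
newer = {"a": "A", "b": "B", "x": "X"}

# Direction used for the rename charmap: current (older) -> desired (newer).
rename_charmap = build_overrides(older, newer)

# Opposite direction: make the newer runtime behave like the older one.
compat_overrides = build_overrides(newer, older)
```

Swapping the arguments yields the same set of keys with the opposite target values, which is the sense in which the rename charmap is the reverse of the compatibility overrides.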
- Collect users, pages, images, etc. to be renamed.
This can be done by running uppercaseTitlesForUnicodeTransition.php over all wikis in its (default) dry-run mode. Renamed pages, images, etc. will be logged by the script (look for "Would ..." in dry-run mode).
As of this writing, mwscript-k8s does not support persistent output files, which complicates collection of the renamed user list (--userlist). There are several ways around this; two simple options are:
- Use mwscript via foreachwiki to run the script locally on the active deployment host (similar to T292552).
- Use mwscript-k8s with, e.g., --userlist 'php://stdout' to instead emit the renamed user list to stdout, which can then be processed out of the container logs.
We went with the second option. Some care is needed when extracting the tab-separated (wiki, user ID, new name) tuples from the logs, since each log line carries a foreachwiki-like wiki name prefix that must be stripped first.
mwscript-k8s --follow --dblist=all --file=override-74-to-81.php -- uppercaseTitlesForUnicodeTransition.php --charmap override-74-to-81.php --suffix ' (technical rename)' --userlist 'php://stdout'
Review the resulting renames with subject-matter experts.
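Extracting the user list from the container logs might look like the following Python sketch. It assumes that each log line starts with a prefix of the form "<dbname>: " (the exact prefix is not specified here and should be checked against real output) and that a user-list line has exactly three tab-separated fields with a numeric user ID. PREFIX and parse_userlist are hypothetical names, and the sample log lines are made up.

```python
import re

# Assumed prefix shape: "<dbname>: " at the start of each log line.
PREFIX = re.compile(r"^[a-z0-9_]+:\s*")


def parse_userlist(lines):
    """Return (wiki, user_id, new_name) string tuples found in log lines."""
    tuples = []
    for line in lines:
        # Drop the trailing newline and, if present, the wiki name prefix.
        stripped = PREFIX.sub("", line.rstrip("\n"), count=1)
        parts = stripped.split("\t")
        # Keep only lines shaped like a user-list tuple; skip other output.
        if len(parts) == 3 and parts[1].isdigit():
            tuples.append((parts[0], parts[1], parts[2]))
    return tuples


# Hypothetical sample input, mixing dry-run output with user-list tuples.
sample = [
    "enwiki: Would ... (hypothetical dry-run line)\n",
    "enwiki: enwiki\t42\tFoo (technical rename)\n",
    "dewiki: dewiki\t7\tBar (technical rename)\n",
]
parsed = parse_userlist(sample)
```

Filtering on the three-field shape, rather than on the prefix alone, keeps the dry-run "Would ..." lines out of the user list.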
- Communicate the planned renames and notify affected users.
For non-user renames, we opened a subtask providing context and the list of renames (T396903), and then announced the plan, with anticipated timing, via Tech News. Since only one user was affected, we notified them directly via Special:EmailUser on metawiki.
- Prepare a patch reverting UcfirstOverrides.php to the desired state.
This should clear the title-case overrides introduced for the migration, leaving only the permanent override for Eszett (see, e.g., https://gerrit.wikimedia.org/r/1152295). Do not deploy the patch until step #6.
- Perform renames.
Re-run #2 (again in dry-run mode) to collect a fresh set of renames, particularly the renamed user list.
Since the renameInvalidUsernames.php script performs a global rename, the user list only needs to contain a single (wiki, user ID, new name) tuple per global user to be renamed. To avoid redundant global renames, deduplicate the user list down to one tuple per global user.
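A minimal Python sketch of the deduplication follows. It assumes that the new name identifies the global user (global account names are shared across wikis), so keeping the first tuple seen per new name is enough. That assumption should be checked against the actual list. The function names are hypothetical, and the output simply mirrors the tab-separated tuples described above.

```python
def dedupe_by_new_name(tuples):
    """Keep the first (wiki, user_id, new_name) tuple for each new name."""
    seen = set()
    unique = []
    for wiki, user_id, new_name in tuples:
        # Assumption: one new name corresponds to one global user.
        if new_name not in seen:
            seen.add(new_name)
            unique.append((wiki, user_id, new_name))
    return unique


def to_userlist(tuples):
    """Serialize tuples as tab-separated lines, one tuple per line."""
    return "".join(f"{wiki}\t{user_id}\t{name}\n" for wiki, user_id, name in tuples)


# Hypothetical input: the same global user appears on two wikis.
collected = [
    ("enwiki", "42", "Foo (technical rename)"),
    ("dewiki", "99", "Foo (technical rename)"),
    ("dewiki", "7", "Bar (technical rename)"),
]
deduplicated = dedupe_by_new_name(collected)
userlist_text = to_userlist(deduplicated)
```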
mwscript-k8s --follow --file=deduplicated.userlist.txt -- extensions/WikimediaMaintenance/renameInvalidUsernames.php --wiki metawiki --list deduplicated.userlist.txt --reason 'Technical rename for Unicode transition'
The renames themselves run asynchronously after the script exits. Wait for them to complete, which can take tens of minutes depending on the number of renames and the state of the job queue. To see representative backlog times, check LocalRenameUserJob in the JobQueue Job dashboard, remembering to select the current primary DC. If there are only a few renames, you can check on their progress individually via https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress.
Once the user renames are complete, continue with the renames of pages, images, etc. If there are a large number of affected wikis, your best option may be to again use mwscript-k8s --dblist=all for this, similar to step #2. If there are a small number of affected wikis (as was the case here), it will be much faster to run mwscript-k8s for each affected wiki, one at a time (see T394556#10925709).
In either case, the only difference in invocation vs. step #2 is adding the --run option to take the script out of dry-run.
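To judge which of the two approaches fits, it may help to count the dry-run "Would ..." lines per wiki. The Python sketch below assumes each log line starts with a "<dbname>: " prefix followed directly by the script's "Would ..." message. This may not match the actual log format. affected_wikis is a hypothetical helper, and the sample lines are made up.

```python
import re
from collections import Counter

# Assumed line shape: "<dbname>: Would ..." for each planned rename.
WOULD_LINE = re.compile(r"^([a-z0-9_]+):\s*Would\b")


def affected_wikis(lines):
    """Return {dbname: number of 'Would ...' lines} for wikis with renames."""
    counts = Counter()
    for line in lines:
        match = WOULD_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


# Hypothetical dry-run log excerpt.
log_lines = [
    "enwiki: Would ... (hypothetical)",
    "enwiki: Would ... (hypothetical)",
    "dewiki: Would ... (hypothetical)",
    "frwiki: done",
]
per_wiki = affected_wikis(log_lines)
```

A short result suggests running the script per wiki. A long one suggests the --dblist=all route.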
One issue we ran into at this stage was a failed rename due to an AbuseFilter rule (T394556#10964880). The rule was temporarily disabled as a workaround. See also T398384.
- Merge UcfirstOverrides.php cleanup.
Before starting, take note of the last-deployed mediawiki-multiversion-cli image. The simplest way to find this is to consult /etc/helmfile-defaults/mediawiki/release/mw-script-main.yaml on the deployment host.
Merge and deploy your change with scap backport.
Once the backport is deployed, entities following the older title-casing behavior can no longer be created. To verify that nothing sneaked through between the rename script running and the backport deployment, run step #2 yet again, this time with the mwscript-k8s --mediawiki_image flag set to the previous CLI image version you noted above.
If anything did sneak through, you can again use this same technique to re-run the script with --run under the previous image.