PHP's Unicode capitalization functions changed between PHP 7.2 and PHP 7.4 because PHP migrated to Unicode 11.
Using PHP 7.4's mb_strtoupper() to capitalize the first letter of an article title turns out to be inadvisable, most notably because it would map Georgian characters to their Mtavruli equivalents. Mtavruli is not used for the first character of a word in Georgian; rather, it is used for emphasis, like italics.
Unicode's concept of "title case" is much closer to what we want. It doesn't map the Georgian characters, and it maps ligatures in an appropriate way for first-letter capitalization, for example dž becomes Dž instead of DŽ. So, we will use that instead.
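As a rough illustration of the difference, here is a sketch in Python rather than PHP. It is not MediaWiki's code. It assumes Python 3.7 or later, whose Unicode data is version 11 or newer, and the helper name case_forms is made up for this example. It prints the uppercase and title-case forms of a Georgian letter and of the dž digraph:

```python
import unicodedata

# Sample characters from the text, written as escapes to avoid ambiguity:
# U+10D0 GEORGIAN LETTER AN and U+01C6 LATIN SMALL LETTER DZ WITH CARON.
SAMPLES = ["\u10d0", "\u01c6"]


def case_forms(ch):
    """Return the (uppercase, title-case) forms of a single character."""
    return ch.upper(), ch.title()


for ch in SAMPLES:
    upper, title = case_forms(ch)
    print(unicodedata.name(ch))
    print("  upper:", " + ".join(unicodedata.name(c) for c in upper))
    print("  title:", " + ".join(unicodedata.name(c) for c in title))
```

Printing character names rather than glyphs makes the Mtavruli mapping visible even in fonts that lack those letters.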
Title case would map ß to Ss, which breaks some existing Wikipedia articles and user names without any apparent benefit, so we will permanently override that mapping. This lets ß continue to be used as the first character of a page title.
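The combined rule (title-case the first character, except where an override applies) can be sketched as follows. This is a Python illustration under assumptions, not the actual MediaWiki implementation: the names OVERRIDES and ucfirst_title are hypothetical, and the override table contains only the ß entry discussed above.

```python
# Hypothetical override table: characters whose first-letter mapping
# should differ from Unicode title case. Only the eszett case is shown.
OVERRIDES = {"\u00df": "\u00df"}  # keep U+00DF (sharp s) unchanged


def ucfirst_title(title):
    """Title-case the first character of `title`, honoring OVERRIDES."""
    if not title:
        return title
    first = title[0]
    mapped = OVERRIDES.get(first)
    if mapped is None:
        # str.title() on a single character applies the Unicode
        # title-case mapping (which may expand to several characters).
        mapped = first.title()
    return mapped + title[1:]


print(ucfirst_title("\u00dfeta"))     # first character is left as-is
print(ucfirst_title("\u01c6ungla"))   # digraph gets its title-case form
print(ucfirst_title("paris"))
```

Keeping the exceptions in a lookup table, rather than in branching code, means further overrides can be added without touching the function.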
Migration plan:
- Deploy a backwards-compatible override, so that PHP 7.2 capitalization is used despite PHP 7.4 being fully deployed.
- Run foreachwiki uppercaseTitlesForUnicodeTransition.php --charmap ucfirst-72-to-title.php --userlist /tmp/user_renames.txt --suffix ' (technical rename)' where ucfirst-72-to-title.php is P35451.
- Provide the community with a list of pages that will be renamed. Most affected pages should be deleted rather than automatically renamed.
- Notify users who will be renamed.
- Wait for a week.
- Rerun uppercaseTitlesForUnicodeTransition.php, then rename users: mwscript extensions/WikimediaMaintenance/renameInvalidUsernames.php --wiki metawiki --list /tmp/user_renames.txt
- Wait a while for global renames to take effect.
- Rerun uppercaseTitlesForUnicodeTransition.php with the --run option.
- Deploy the new override map (Gerrit change 842243). This will prevent further creation of pages or users with initial lowercase letters.