Configure title-case consistency mapping for PHP 8.1 -> 8.3 transition
Closed, Resolved (Public)

Description

Process:

  • Use maintenance/language/generateUpperCharTable.php to generate title-case tables for both 8.1 and 8.3. -- These are identical (T401252#11066941).
  • [SKIPPED] Use maintenance/language/generateUcfirstOverrides.php to generate mappings that override 8.3 title-casing (--override) with 8.1 title-casing (--with).
  • [SKIPPED] Merge the resulting overrides into wmf-config/UcfirstOverrides.php while leaving the permanent override for Eszett intact.
NOTE: The mbstring extension in both 8.1 and 8.3 supports Unicode 14, resulting in identical title-casing behavior. Thus, for the purposes of the 8.1 -> 8.3 migration, no consistency overrides should be necessary.
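
To illustrate what the two skipped steps would amount to had the tables differed, here is a minimal Python sketch. It assumes each table is a flat mapping from a character to its title-cased form. `build_overrides` and `merge_overrides` are hypothetical helpers written for this illustration, not the actual logic of generateUcfirstOverrides.php, and the toy tables contain made-up values rather than real PHP output.

```python
def build_overrides(override_table, with_table):
    """Return the entries where with_table disagrees with override_table.

    override_table: char -> title-cased char under the new runtime
    with_table:     char -> title-cased char under the old runtime
    """
    return {
        char: old_value
        for char, old_value in with_table.items()
        if override_table.get(char) != old_value
    }


def merge_overrides(generated, permanent):
    """Combine generated overrides with permanent ones; permanent entries win."""
    merged = dict(generated)
    merged.update(permanent)
    return merged


# Toy tables (hypothetical values, not real PHP output).
new_table = {"a": "A", "b": "B", "q": "Q"}
old_table = {"a": "A", "b": "B", "q": "q"}

print(build_overrides(new_table, old_table))  # {'q': 'q'}
print(build_overrides(old_table, old_table))  # {}
```

When the two tables are identical, as they are for 8.1 and 8.3, the generated override set is empty, so only the permanent entries would remain after merging.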

See T372603: Regenerate UcfirstOverrides.php for PHP 7.4 -> 8.1 transition for prior art during the last PHP migration.

Event Timeline

Scott_French changed the task status from Open to In Progress. Aug 6 2025, 7:32 PM

PHP 8.3 images now exist, with the first-available cli image being docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-08-06-191212-publish-83.

With 8.3 images now built (but otherwise unused), I was able to pick this up today.

Since durable storage for output files is not supported under mwscript-k8s, I've used the workaround documented on Wikitech in what follows: opening a shell.php job in --attach mode, executing the scripts within that container, and then using kubectl cp to retrieve the output files.

Not pictured: Creation of two shell.php jobs, one targeting the then-latest PHP 8.1 image (i.e., the default behavior) and another passing --mediawiki_image restricted/mediawiki-multiversion-cli:2025-08-06-191212-publish-83 to target the new 8.3 image. These correspond to the mw-script.eqiad.hc0dqjj3-gljr7 and mw-script.eqiad.y7cwin33-2fgkl pods, respectively.

swfrench@deploy1003:~/uctables_8.1_8.3$ kube_env mw-script-deploy eqiad  # Note: -deploy credentials are needed for exec and cp
swfrench@deploy1003:~/uctables_8.1_8.3$ kubectl exec pod/mw-script.eqiad.hc0dqjj3-gljr7 -c mediawiki-hc0dqjj3-app -it -- /bin/bash
www-data@mw-script:/data$ php -v
PHP 8.1.33 (cli) (built: Jul 24 2025 21:07:29) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.33, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.33, Copyright (c), by Zend Technologies
www-data@mw-script:/data$ mwscript maintenance/language/generateUpperCharTable.php testwiki --titlecase --outfile /tmp/uctable_8.1.json
www-data@mw-script:/data$ 
exit
swfrench@deploy1003:~/uctables_8.1_8.3$ kubectl cp mw-script.eqiad.hc0dqjj3-gljr7:/tmp/uctable_8.1.json uctable_8.1.json
swfrench@deploy1003:~/uctables_8.1_8.3$ kubectl exec pod/mw-script.eqiad.y7cwin33-2fgkl -c mediawiki-y7cwin33-app -it -- /bin/bash
www-data@mw-script:/data$ php -v
PHP 8.3.24 (cli) (built: Aug  4 2025 18:25:53) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.3.24, Copyright (c) Zend Technologies
    with Zend OPcache v8.3.24, Copyright (c), by Zend Technologies
www-data@mw-script:/data$ mwscript maintenance/language/generateUpperCharTable.php testwiki --titlecase --outfile /tmp/uctable_8.3.json
www-data@mw-script:/data$ 
exit
swfrench@deploy1003:~/uctables_8.1_8.3$ kubectl cp mw-script.eqiad.y7cwin33-2fgkl:/tmp/uctable_8.3.json uctable_8.3.json
swfrench@deploy1003:~/uctables_8.1_8.3$ ls -l
total 63672
-rw-rw-r-- 1 swfrench wikidev 32599320 Aug  6 20:31 uctable_8.1.json
-rw-rw-r-- 1 swfrench wikidev 32599320 Aug  6 20:38 uctable_8.3.json
swfrench@deploy1003:~/uctables_8.1_8.3$ md5sum *
ce210aaf11c6d32695f6478a19e2fd2f  uctable_8.1.json
ce210aaf11c6d32695f6478a19e2fd2f  uctable_8.3.json
swfrench@deploy1003:~/uctables_8.1_8.3$

Indeed, this is consistent with inspection of the mbstring Unicode tables in the PHP source code as noted in the task description: both PHP versions support Unicode 14 and thus produce identical title-casing behavior.

In other words, for the purposes of the 8.1 -> 8.3 upgrade, it seems we will not need to introduce title-case consistency overrides.
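
For reference, had the checksums differed, the mismatching entries could have been listed with something like the following Python sketch. It assumes each output file is a single JSON object mapping a character to its title-cased form, which I have not verified against the script's output format here. `load_table` and `diff_tables` are hypothetical helper names.

```python
import json


def load_table(path):
    """Load a JSON object mapping each character to its title-cased form."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def diff_tables(path_a, path_b):
    """Return (code point label, value in A, value in B) for each differing key."""
    table_a = load_table(path_a)
    table_b = load_table(path_b)
    differences = []
    # Walk the union of keys so entries missing from one table are reported too.
    for char in sorted(set(table_a) | set(table_b)):
        value_a = table_a.get(char)
        value_b = table_b.get(char)
        if value_a != value_b:
            label = "+".join(f"U+{ord(c):04X}" for c in char)
            differences.append((label, value_a, value_b))
    return differences
```

Under that assumption, `diff_tables("uctable_8.1.json", "uctable_8.3.json")` would return an empty list for the two byte-identical files above.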