
Expose phpCharToUpper map for title normalization via the API
Open, Needs Triage, Public


PHP's mb_strtoupper behaves oddly for characters whose uppercase form is supposed to have a different length than the original character (for example "ß", whose full Unicode uppercase mapping is the two-character "SS"). This is covered in T141723#5057472 and T219279.
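To see which characters fall into this category, here is a minimal sketch that scans every code point and keeps those whose full uppercase mapping is not exactly one code point. It uses Python's own Unicode database (its version is reported by unicodedata.unidata_version) as a stand-in for the server's ICU/Unicode data, so it only lists candidates. It does not reproduce PHP's mb_strtoupper behaviour and is not how MediaWiki builds its map; the function name is hypothetical.

```python
import sys
import unicodedata
from typing import Dict


def length_changing_uppercase(max_code_point: int = sys.maxunicode) -> Dict[str, str]:
    """Map each code point whose full uppercase mapping is not exactly one
    code point long to that uppercase string.

    Uses Python's built-in Unicode database, not PHP's or the server's ICU.
    """
    table: Dict[str, str] = {}
    for cp in range(max_code_point + 1):
        ch = chr(cp)
        upper = ch.upper()  # full case mapping, e.g. "ß" -> "SS"
        if len(upper) != 1:
            table[ch] = upper
    return table


if __name__ == "__main__":
    table = length_changing_uppercase()
    print("Unicode", unicodedata.unidata_version, ":", len(table), "characters")
    print(table["ß"])  # SS
```

The exact set returned changes with the Unicode version Python was built against, which is the same reason a server-side map can drift from a copied one.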

MediaWiki ships a map of these characters (phpCharToUpper) for use in JavaScript. It has also been copied into the mediawiki-title JavaScript library, and I have ported it to Rust as well.
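A client library that has copied the map (or, eventually, fetched it from the API) would consult it when uppercasing the first character of a title. Below is a minimal Python sketch of that lookup. The function name, the plain character-to-string dictionary shape, and giving $wgOverrideUcfirstCharacters precedence are all assumptions for illustration; the real phpCharToUpper data may be encoded differently.

```python
from typing import Mapping, Optional


def ucfirst_like_server(
    title: str,
    char_to_upper: Mapping[str, str],
    override_ucfirst: Optional[Mapping[str, str]] = None,
) -> str:
    """Uppercase the first code point of `title` using server-provided maps.

    ASSUMPTIONS (hypothetical data shape, not MediaWiki's actual format):
    * `char_to_upper` maps one character to what the server's PHP would
      produce for it; anything missing falls back to Python's str.upper().
    * `override_ucfirst` mirrors $wgOverrideUcfirstCharacters and is
      consulted before `char_to_upper`.
    """
    if not title:
        return title
    first, rest = title[0], title[1:]
    if override_ucfirst and first in override_ucfirst:
        return override_ucfirst[first] + rest
    if first in char_to_upper:
        return char_to_upper[first] + rest
    return first.upper() + rest


if __name__ == "__main__":
    server_map = {"ß": "ß"}  # hypothetical entry: leave "ß" unchanged
    print(ucfirst_like_server("ßa", server_map))  # ßa
    print(ucfirst_like_server("ßa", {}))  # SSa
    print(ucfirst_like_server("foo", server_map))  # Foo
```

Without the map, the client falls back to its own full case mapping and produces "SSa", which is exactly the kind of divergence the copied map exists to prevent.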

This map should be exposed via the API so that external libraries don't have to copy it. The preferred place is action=query&meta=siteinfo, since that already provides all the other information needed to normalize and validate titles. The API should also dump $wgOverrideUcfirstCharacters if it is set.
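As a rough illustration of what consuming this could look like, the sketch below builds a siteinfo request and pulls the two maps out of a decoded JSON response. action, meta, siprop (with general, namespaces, namespacealiases), format and formatversion are existing API parameters; the phpchartoupper siprop value and the phpchartoupper / overrideucfirstcharacters response keys are hypothetical names for the proposed data, since no such property exists yet.

```python
from typing import Any, Dict, Mapping, Tuple
from urllib.parse import urlencode

# HYPOTHETICAL names for the proposed siteinfo property and response keys.
PROPOSED_SIPROP = "phpchartoupper"
PROPOSED_MAP_KEY = "phpchartoupper"
PROPOSED_OVERRIDE_KEY = "overrideucfirstcharacters"


def siteinfo_url(api_endpoint: str) -> str:
    """Build a siteinfo query that would also request the proposed map."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "|".join(["general", "namespaces", "namespacealiases", PROPOSED_SIPROP]),
        "format": "json",
        "formatversion": "2",
    }
    return api_endpoint + "?" + urlencode(params)


def extract_case_maps(response: Mapping[str, Any]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (char-to-upper map, ucfirst overrides) from a decoded response.

    Missing keys yield empty dicts, e.g. for wikis without overrides.
    """
    query = response.get("query", {})
    return dict(query.get(PROPOSED_MAP_KEY, {})), dict(query.get(PROPOSED_OVERRIDE_KEY, {}))


if __name__ == "__main__":
    print(siteinfo_url("https://example.org/w/api.php"))
    fake = {"query": {PROPOSED_MAP_KEY: {"ß": "ß"}}}  # made-up response
    print(extract_case_maps(fake))  # ({'ß': 'ß'}, {})
```

Returning empty dicts for absent keys lets the same client code run against wikis that don't set $wgOverrideUcfirstCharacters, or older servers that don't expose the map at all.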

Ideally, this map would be generated from the ICU/Unicode version the server is actually using, rather than reflecting whatever Wikimedia production was using when the map was checked into core, but that's a separate issue.