
mw.ustring.lower doesn't affect hypogegrammene
Closed, Duplicate · Public

Description

I don't know what other characters or text functions this affects, but, well:

mw.ustring.lower("ΑΆἈἌἎἉἍἏᾼᾈᾌᾎᾉᾍᾏ")

αάἀἄἆἁἅἇᾼᾈᾌᾎᾉᾍᾏ

mw.ustring.upper("αάἀἄἆἁἅἇᾳᾀᾄᾆᾁᾅᾇ")

ΑΆἈἌἎἉἍἏᾼᾈᾌᾎᾉᾍᾏ
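For comparison, the following sketch computes the standard Unicode lowercase of the capital string from the report and lists the characters where the reported mw.ustring.lower output differs. It uses Python only because `str.lower()` applies the Unicode case mappings directly. The strings are written as explicit code points; treating the tonos capital as U+0386 (rather than U+1FBB) is an assumption, since the report does not say which one was used. `mismatches` is a hypothetical helper, not part of Scribunto.

```python
# Capital string from the report, as explicit code points.
# Assumption: the tonos capital is U+0386 (not U+1FBB).
CAPITALS = (
    "\u0391\u0386"                                # Α Ά
    "\u1F08\u1F0C\u1F0E\u1F09\u1F0D\u1F0F"        # Ἀ Ἄ Ἆ Ἁ Ἅ Ἇ
    "\u1FBC\u1F88\u1F8C\u1F8E\u1F89\u1F8D\u1F8F"  # ᾼ ᾈ ᾌ ᾎ ᾉ ᾍ ᾏ
)

# Output reported for mw.ustring.lower: the last seven characters are unchanged.
REPORTED = (
    "\u03B1\u03AC"
    "\u1F00\u1F04\u1F06\u1F01\u1F05\u1F07"
    "\u1FBC\u1F88\u1F8C\u1F8E\u1F89\u1F8D\u1F8F"
)

def mismatches(expected: str, actual: str) -> list[str]:
    """Return U+XXXX labels for characters in `actual` that differ from `expected`."""
    return [f"U+{ord(a):04X}" for e, a in zip(expected, actual) if e != a]

# Python's lowercase uses the Unicode lowercase mappings (none of these expand).
expected = CAPITALS.lower()
print(mismatches(expected, REPORTED))
```

Under these assumptions the mismatch list contains exactly the seven capitals with prosgegrammeni (U+1FBC and U+1F88 through U+1F8F), each of which Unicode maps to the matching small letter with ypogegrammeni.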

Event Timeline

ObsequiousNewt raised the priority of this task to Needs Triage.
ObsequiousNewt updated the task description.
ObsequiousNewt subscribed.
Anomie subscribed.

Scribunto's mw.ustring.upper and mw.ustring.lower depend on PHP's mb_strtoupper() and mb_strtolower(), which show the same behavior.

The problem seems to be that PHP's mb_strtolower() ignores any character that doesn't have the "uppercase" Unicode property, and these characters are flagged as "titlecase". Whether that's the correct behavior for mb_strtolower() or whether it should be checking for "uppercase or titlecase", I have no idea. But even if it is incorrect, it's not something that we're going to be able to fix here. You'll need to take it to https://bugs.php.net/; please comment here with the upstream bug number once you find/create it.
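The difference between the two rules can be checked with a small model. The Python sketch below is not PHP's mb_strtolower() code. It approximates the "uppercase" test by the general category Lu (the Unicode Uppercase property also includes Other_Uppercase, which does not affect these characters) and the "titlecase" flag by the category Lt. The hypothetical `lower_uppercase_only` reproduces the reported output, while the hypothetical `lower_upper_or_title`, which checks "uppercase or titlecase", lowercases every character.

```python
import unicodedata

# Capital string from the report, as explicit code points.
# Assumption: the tonos capital is U+0386 (not U+1FBB).
CAPITALS = (
    "\u0391\u0386"
    "\u1F08\u1F0C\u1F0E\u1F09\u1F0D\u1F0F"
    "\u1FBC\u1F88\u1F8C\u1F8E\u1F89\u1F8D\u1F8F"
)

def lower_uppercase_only(text: str) -> str:
    """Hypothetical model: lowercase only characters in category Lu."""
    return "".join(
        ch.lower() if unicodedata.category(ch) == "Lu" else ch for ch in text
    )

def lower_upper_or_title(text: str) -> str:
    """Hypothetical model: lowercase characters in category Lu or Lt."""
    return "".join(
        ch.lower() if unicodedata.category(ch) in ("Lu", "Lt") else ch
        for ch in text
    )

# Which of the reported characters are titlecase letters (category Lt)?
titlecase = [f"U+{ord(ch):04X}" for ch in CAPITALS if unicodedata.category(ch) == "Lt"]

print(titlecase)
print(lower_uppercase_only(CAPITALS))
print(lower_upper_or_title(CAPITALS))
```

Because every character in the sample is either Lu or Lt, the second rule gives the same result as a full Unicode lowercase of the string.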

Anomie changed the task status from Open to Stalled. Mar 18 2015, 4:18 PM
Aklapper lowered the priority of this task from Low to Lowest. Mar 20 2015, 11:30 AM

This is in the process of being fixed by T176370: Migrate to PHP 7 in WMF production. I'm going to close this as a duplicate of that task.