
Some letters (initial characters of page titles) not being correctly capitalised
Closed, Declined · Public

Description

See this example provided by Gorobay@enwiki: http://3v4l.org/WkbBn

The capitalisation data for the character ɱ is missing under HHVM.

For more info, see the discussion: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Automatic_capitalization_of_title-initial_Unicode_characters

Event Timeline

TTO created this task. Feb 27 2015, 11:22 AM
TTO set the priority of this task to Needs Triage.
TTO updated the task description.
TTO added subscribers: TTO, Redrose64, Gadget850, Whatamidoing-WMF.
Restricted Application added a subscriber: Aklapper. Feb 27 2015, 11:22 AM
TTO added a subscriber: I18n. Feb 27 2015, 11:22 AM

From https://php.net/ChangeLog-5.php#5.3.4:

Mbstring extension: [...] Fixed bug #52981 (Unicode casing table was out-of-date. Updated with UnicodeData-6.0.0d7.txt and included the source of the generator program with the distribution) (Gustavo).

It's likely that HHVM did not get this fix.

Note that there are 47 characters which the most recent versions of PHP and HHVM do not handle.

MZMcBride triaged this task as High priority. Feb 28 2015, 10:37 PM
MZMcBride added subscribers: tstarling, ori.
MZMcBride added a subscriber: MZMcBride.

We ran into this issue when working on title normalization & redirects in JS. In contrast to mbstring, the JS .toUpperCase() function uppercases these characters correctly. As a consequence, redirects between differently-cased versions of a title become self-redirects once normalized in JS (e.g. https://fr.wikipedia.org/wiki/%EA%9E%80?oldid=125284517).

When fixing the mbstring issue, we'll need to keep in mind that lowercase articles permitted through this bug will become inaccessible. We might need to rename those articles, and move existing redirects out of the way.
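
The self-redirect effect described above can be sketched in a few lines of TypeScript. This is only an illustration under assumptions: `capitaliseFirst` and `isSelfRedirect` are hypothetical helpers, not MediaWiki or mbstring functions, and the sketch relies on the JS engine shipping reasonably current Unicode case data.

```typescript
// Hypothetical helper: length in UTF-16 units of the first code point,
// so that a leading surrogate pair is uppercased as one character.
function firstCodePointLength(s: string): number {
  const hi = s.charCodeAt(0);
  const isHighSurrogate = hi >= 0xd800 && hi <= 0xdbff;
  return isHighSurrogate && s.length > 1 ? 2 : 1;
}

// Hypothetical helper: capitalise only the title-initial character,
// using the engine's own Unicode case mapping via toUpperCase().
function capitaliseFirst(title: string): string {
  if (title.length === 0) return title;
  const n = firstCodePointLength(title);
  return title.slice(0, n).toUpperCase() + title.slice(n);
}

// Hypothetical helper: two titles that normalise to the same string
// make a redirect between them point at itself.
function isSelfRedirect(source: string, target: string): boolean {
  return capitaliseFirst(source) === capitaliseFirst(target);
}

// U+0271 (ɱ) is the character from the task description; U+2C6E (Ɱ) is its uppercase form.
console.log(capitaliseFirst("\u0271") === "\u2C6E"); // true

// U+A781 (ꞁ) and U+A780 (Ꞁ): the pair behind the fr.wikipedia example URL.
console.log(isSelfRedirect("\uA781", "\uA780")); // true
```

A server-side capitaliser with an older casing table would leave U+0271 and U+A781 unchanged, which is how the lowercase titles could exist as separate pages in the first place.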

Danny_B removed a subscriber: I18n.
Danny_B moved this task from Backlog to Defect on the HHVM board. May 29 2016, 11:25 PM
Amire80 moved this task from Untriaged to Capitalization on the I18n board. Mar 25 2018, 6:28 AM
Krinkle closed this task as Declined. Oct 3 2019, 3:27 AM
Krinkle added a subscriber: Krinkle.

Declining per T192166.

Note that while this task was about PHP5-to-HHVM, a similar issue arose during HHVM-to-PHP72 as well. That issue was eventually tackled at T219279. If we had remembered this report, we could've known it earlier, but oh well.

Restricted Application removed a subscriber: Liuxinyu970226. Oct 3 2019, 3:27 AM