Rename articles and users to prepare for PHP 7.3 unicode changes
Open, High, Public

Description

The behavior of mb_strtoupper changes between PHP 7.2 and PHP 7.3, and doesn't change going to 7.4.
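
A quick way to see which behavior a given PHP build has is to uppercase a few of the affected characters and print the results. This is only a sketch: it assumes the mbstring extension is loaded, the helper name probeUppercase is made up for illustration, and the sample characters are taken from the map in this task.

```php
<?php
// Hypothetical helper (not part of MediaWiki): uppercases each sample
// character with mb_strtoupper and returns the results keyed by input.
function probeUppercase( array $chars ): array {
	$result = [];
	foreach ( $chars as $char ) {
		$result[$char] = mb_strtoupper( $char, 'UTF-8' );
	}
	return $result;
}

// Sample characters taken from the migration map in this task.
$samples = [ 'ß', 'ა', 'ⅰ' ];

echo 'PHP ', PHP_VERSION, "\n";
foreach ( probeUppercase( $samples ) as $lower => $upper ) {
	// What gets printed here depends on the PHP version running the script.
	echo $lower, ' => ', $upper, "\n";
}
```

Running the same file under both PHP versions and comparing the output shows which characters are affected on a particular installation.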

To keep articles from becoming unreachable, the following migration map needs to be applied:

<?php
// File created by generateUcfirstOverrides.php
return [
    'ß' => 'SS',
    'ʼn' => 'ʼN',
    'ǰ' => 'J̌',
    'ͅ' => 'Ι',
    'ΐ' => 'Ϊ́',
    'ΰ' => 'Ϋ́',
    'և' => 'ԵՒ',
    'ა' => 'Ა',
    'ბ' => 'Ბ',
    'გ' => 'Გ',
    'დ' => 'Დ',
    'ე' => 'Ე',
    'ვ' => 'Ვ',
    'ზ' => 'Ზ',
    'თ' => 'Თ',
    'ი' => 'Ი',
    'კ' => 'Კ',
    'ლ' => 'Ლ',
    'მ' => 'Მ',
    'ნ' => 'Ნ',
    'ო' => 'Ო',
    'პ' => 'Პ',
    'ჟ' => 'Ჟ',
    'რ' => 'Რ',
    'ს' => 'Ს',
    'ტ' => 'Ტ',
    'უ' => 'Უ',
    'ფ' => 'Ფ',
    'ქ' => 'Ქ',
    'ღ' => 'Ღ',
    'ყ' => 'Ყ',
    'შ' => 'Შ',
    'ჩ' => 'Ჩ',
    'ც' => 'Ც',
    'ძ' => 'Ძ',
    'წ' => 'Წ',
    'ჭ' => 'Ჭ',
    'ხ' => 'Ხ',
    'ჯ' => 'Ჯ',
    'ჰ' => 'Ჰ',
    'ჱ' => 'Ჱ',
    'ჲ' => 'Ჲ',
    'ჳ' => 'Ჳ',
    'ჴ' => 'Ჴ',
    'ჵ' => 'Ჵ',
    'ჶ' => 'Ჶ',
    'ჷ' => 'Ჷ',
    'ჸ' => 'Ჸ',
    'ჹ' => 'Ჹ',
    'ჺ' => 'Ჺ',
    'ჽ' => 'Ჽ',
    'ჾ' => 'Ჾ',
    'ჿ' => 'Ჿ',
    'ẖ' => 'H̱',
    'ẗ' => 'T̈',
    'ẘ' => 'W̊',
    'ẙ' => 'Y̊',
    'ẚ' => 'Aʾ',
    'ὐ' => 'Υ̓',
    'ὒ' => 'Υ̓̀',
    'ὔ' => 'Υ̓́',
    'ὖ' => 'Υ̓͂',
    'ᾀ' => 'ἈΙ',
    'ᾁ' => 'ἉΙ',
    'ᾂ' => 'ἊΙ',
    'ᾃ' => 'ἋΙ',
    'ᾄ' => 'ἌΙ',
    'ᾅ' => 'ἍΙ',
    'ᾆ' => 'ἎΙ',
    'ᾇ' => 'ἏΙ',
    'ᾈ' => 'ἈΙ',
    'ᾉ' => 'ἉΙ',
    'ᾊ' => 'ἊΙ',
    'ᾋ' => 'ἋΙ',
    'ᾌ' => 'ἌΙ',
    'ᾍ' => 'ἍΙ',
    'ᾎ' => 'ἎΙ',
    'ᾏ' => 'ἏΙ',
    'ᾐ' => 'ἨΙ',
    'ᾑ' => 'ἩΙ',
    'ᾒ' => 'ἪΙ',
    'ᾓ' => 'ἫΙ',
    'ᾔ' => 'ἬΙ',
    'ᾕ' => 'ἭΙ',
    'ᾖ' => 'ἮΙ',
    'ᾗ' => 'ἯΙ',
    'ᾘ' => 'ἨΙ',
    'ᾙ' => 'ἩΙ',
    'ᾚ' => 'ἪΙ',
    'ᾛ' => 'ἫΙ',
    'ᾜ' => 'ἬΙ',
    'ᾝ' => 'ἭΙ',
    'ᾞ' => 'ἮΙ',
    'ᾟ' => 'ἯΙ',
    'ᾠ' => 'ὨΙ',
    'ᾡ' => 'ὩΙ',
    'ᾢ' => 'ὪΙ',
    'ᾣ' => 'ὫΙ',
    'ᾤ' => 'ὬΙ',
    'ᾥ' => 'ὭΙ',
    'ᾦ' => 'ὮΙ',
    'ᾧ' => 'ὯΙ',
    'ᾨ' => 'ὨΙ',
    'ᾩ' => 'ὩΙ',
    'ᾪ' => 'ὪΙ',
    'ᾫ' => 'ὫΙ',
    'ᾬ' => 'ὬΙ',
    'ᾭ' => 'ὭΙ',
    'ᾮ' => 'ὮΙ',
    'ᾯ' => 'ὯΙ',
    'ᾲ' => 'ᾺΙ',
    'ᾳ' => 'ΑΙ',
    'ᾴ' => 'ΆΙ',
    'ᾶ' => 'Α͂',
    'ᾷ' => 'Α͂Ι',
    'ᾼ' => 'ΑΙ',
    'ῂ' => 'ῊΙ',
    'ῃ' => 'ΗΙ',
    'ῄ' => 'ΉΙ',
    'ῆ' => 'Η͂',
    'ῇ' => 'Η͂Ι',
    'ῌ' => 'ΗΙ',
    'ῒ' => 'Ϊ̀',
    'ΐ' => 'Ϊ́',
    'ῖ' => 'Ι͂',
    'ῗ' => 'Ϊ͂',
    'ῢ' => 'Ϋ̀',
    'ΰ' => 'Ϋ́',
    'ῤ' => 'Ρ̓',
    'ῦ' => 'Υ͂',
    'ῧ' => 'Ϋ͂',
    'ῲ' => 'ῺΙ',
    'ῳ' => 'ΩΙ',
    'ῴ' => 'ΏΙ',
    'ῶ' => 'Ω͂',
    'ῷ' => 'Ω͂Ι',
    'ῼ' => 'ΩΙ',
    'ⅰ' => 'Ⅰ',
    'ⅱ' => 'Ⅱ',
    'ⅲ' => 'Ⅲ',
    'ⅳ' => 'Ⅳ',
    'ⅴ' => 'Ⅴ',
    'ⅵ' => 'Ⅵ',
    'ⅶ' => 'Ⅶ',
    'ⅷ' => 'Ⅷ',
    'ⅸ' => 'Ⅸ',
    'ⅹ' => 'Ⅹ',
    'ⅺ' => 'Ⅺ',
    'ⅻ' => 'Ⅻ',
    'ⅼ' => 'Ⅼ',
    'ⅽ' => 'Ⅽ',
    'ⅾ' => 'Ⅾ',
    'ⅿ' => 'Ⅿ',
    'ⓐ' => 'Ⓐ',
    'ⓑ' => 'Ⓑ',
    'ⓒ' => 'Ⓒ',
    'ⓓ' => 'Ⓓ',
    'ⓔ' => 'Ⓔ',
    'ⓕ' => 'Ⓕ',
    'ⓖ' => 'Ⓖ',
    'ⓗ' => 'Ⓗ',
    'ⓘ' => 'Ⓘ',
    'ⓙ' => 'Ⓙ',
    'ⓚ' => 'Ⓚ',
    'ⓛ' => 'Ⓛ',
    'ⓜ' => 'Ⓜ',
    'ⓝ' => 'Ⓝ',
    'ⓞ' => 'Ⓞ',
    'ⓟ' => 'Ⓟ',
    'ⓠ' => 'Ⓠ',
    'ⓡ' => 'Ⓡ',
    'ⓢ' => 'Ⓢ',
    'ⓣ' => 'Ⓣ',
    'ⓤ' => 'Ⓤ',
    'ⓥ' => 'Ⓥ',
    'ⓦ' => 'Ⓦ',
    'ⓧ' => 'Ⓧ',
    'ⓨ' => 'Ⓨ',
    'ⓩ' => 'Ⓩ',
    'ꞹ' => 'Ꞹ',
    'ﬀ' => 'FF',
    'ﬁ' => 'FI',
    'ﬂ' => 'FL',
    'ﬃ' => 'FFI',
    'ﬄ' => 'FFL',
    'ﬅ' => 'ST',
    'ﬆ' => 'ST',
    'ﬓ' => 'ՄՆ',
    'ﬔ' => 'ՄԵ',
    'ﬕ' => 'ՄԻ',
    'ﬖ' => 'ՎՆ',
    'ﬗ' => 'ՄԽ',
    '𖹠' => '𖹀',
    '𖹡' => '𖹁',
    '𖹢' => '𖹂',
    '𖹣' => '𖹃',
    '𖹤' => '𖹄',
    '𖹥' => '𖹅',
    '𖹦' => '𖹆',
    '𖹧' => '𖹇',
    '𖹨' => '𖹈',
    '𖹩' => '𖹉',
    '𖹪' => '𖹊',
    '𖹫' => '𖹋',
    '𖹬' => '𖹌',
    '𖹭' => '𖹍',
    '𖹮' => '𖹎',
    '𖹯' => '𖹏',
    '𖹰' => '𖹐',
    '𖹱' => '𖹑',
    '𖹲' => '𖹒',
    '𖹳' => '𖹓',
    '𖹴' => '𖹔',
    '𖹵' => '𖹕',
    '𖹶' => '𖹖',
    '𖹷' => '𖹗',
    '𖹸' => '𖹘',
    '𖹹' => '𖹙',
    '𖹺' => '𖹚',
    '𖹻' => '𖹛',
    '𖹼' => '𖹜',
    '𖹽' => '𖹝',
    '𖹾' => '𖹞',
    '𖹿' => '𖹟',
];
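
To make the effect of the map concrete, here is a minimal sketch of how a first-character map like this one could be applied to a title. It is not the code MediaWiki uses: the function name applyCharmapToTitle and the example titles are hypothetical, and the three-entry map is only an excerpt for illustration.

```php
<?php
// Hypothetical helper: returns the title with its first character replaced
// by the mapped value, or the title unchanged if the map has no entry.
function applyCharmapToTitle( string $title, array $charmap ): string {
	if ( $title === '' ) {
		return $title;
	}
	$first = mb_substr( $title, 0, 1, 'UTF-8' );
	if ( !isset( $charmap[$first] ) ) {
		return $title;
	}
	// Only the first character is replaced; the rest is kept verbatim.
	return $charmap[$first] . mb_substr( $title, 1, null, 'UTF-8' );
}

// A three-entry excerpt of the map, for illustration only.
$charmap = [
	'ß' => 'SS',
	'ა' => 'Ა',
	'ⅰ' => 'Ⅰ',
];

echo applyCharmapToTitle( 'ßeta', $charmap ), "\n";  // SSeta
echo applyCharmapToTitle( 'Alpha', $charmap ), "\n"; // Alpha
```

Note that some entries map one character to two or more, so the renamed title can be longer than the original.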

The process:

  1. Run foreachwiki uppercaseTitlesForUnicodeTransition.php --charmap ucfirst_overrides_7_2_7_3.php --userlist /tmp/user_renames.txt --suffix ' (former Unicode character)'
  2. Notify users whose usernames will be changed.
  3. Rename users: mwscript extensions/WikimediaMaintenance/renameInvalidUsernames.php --wiki metawiki --list /tmp/user_renames.txt
  4. Wait a while for the global renames to take effect.
  5. Rerun the uppercaseTitlesForUnicodeTransition.php script with the --run option.

This renames every user and page title whose first letter would be uppercased differently after the PHP upgrade, moving each one to the new uppercased form. It should be done relatively soon before the upgrade, because until the upgrade the lowercased versions of the articles can still be created. After the PHP upgrade is complete, rerun uppercaseTitlesForUnicodeTransition.php to make sure that any articles with the wrong capitalization created between the first run and the upgrade are also migrated.
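
The two-pass idea can be sketched as a small planning function. This is not the maintenance script: planRenames is a hypothetical name, the titles are invented, and the task does not say how the --suffix value is used, so the sketch simply assumes it is appended when the uppercased name is already taken.

```php
<?php
// Hypothetical planner, not the maintenance script itself.
// $titles: existing titles; $charmap: first-character overrides;
// $suffix: appended when the uppercased name is already taken (assumption).
function planRenames( array $titles, array $charmap, string $suffix ): array {
	$taken = array_fill_keys( $titles, true );
	$plan = [];
	foreach ( $titles as $old ) {
		$first = mb_substr( $old, 0, 1, 'UTF-8' );
		if ( $old === '' || !isset( $charmap[$first] ) ) {
			continue; // first character is not in the map
		}
		$new = $charmap[$first] . mb_substr( $old, 1, null, 'UTF-8' );
		if ( isset( $taken[$new] ) ) {
			$new .= $suffix; // assumed conflict handling
		}
		$taken[$new] = true;
		$plan[$old] = $new;
	}
	return $plan;
}

$charmap = [ 'ß' => 'SS', 'ⅰ' => 'Ⅰ' ];
$suffix = ' (former Unicode character)';

// First pass: 'SSeta' already exists, so 'ßeta' gets the suffix.
$firstPass = planRenames( [ 'ßeta', 'SSeta', 'Main' ], $charmap, $suffix );
print_r( $firstPass );

// Second pass over the renamed titles: nothing is left to do unless new
// lowercase titles were created in the meantime.
$secondPass = planRenames( array_values( $firstPass ), $charmap, $suffix );
print_r( $secondPass );
```

The second pass returning an empty plan is the point of the rerun after the upgrade: it is a no-op when nothing new slipped in, and it catches stragglers when something did.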

Event Timeline

I've just responded to a rename request arising from this.
It doesn't appear that any explanation was given to the user affected, just an edit summary left at the site of his former username.
This is poor handling of the users affected.

If, as stated, the former userpage will become inaccessible, the users affected won't even have that poor level of explanation.

This particular series of renames has not been done yet and will not be done for a while.

The one that's being done is related to T219279. In an unfortunate series of events the user notifications were sent some time ago, and the actual renames only happened now. Within a few hours the affected usernames will start getting capitalized automatically, so this will stop being user-visible.

To make sure such a surprise does not happen again, I've edited the task description to indicate that users who will be renamed need to be notified before the renames commence.

Joe triaged this task as High priority. Mon, Jun 27, 8:55 AM

I think it's reasonable to run the rename only once we've fully migrated to PHP 7.4. We will need to install the conversion map anyway, so that those pages/usernames are still reachable before we actually start sending traffic to PHP 7.4.

Also, I seem to remember that we need a similar JS -> PHP conversion table for VE. @Esanders, can you confirm the details?

Not just VE: mw.Title.js in general has a phpCharToUpper method, which is generated by a maintenance script (GeneratePhpCharToUpperMappings) that outputs the result of Language::ucfirst.