Some pages will become completely unreachable after PHP7 update due to Unicode changes
Closed, Resolved (Public)

Description

As detailed in T141723#5057472, mb_strtoupper, which we use to normalise titles, changes slightly in PHP7 with the Unicode update. As a result, certain titles will have their normalised forms changed and will therefore become unreachable unless something is done.

For example, https://en.wikipedia.org/w/index.php?title=%C7%85&redirect=no takes you to article ID 7074938 under PHP5/HHVM, but if you enable the PHP7 beta feature it takes you to 7074928, and the old article becomes completely inaccessible.

Here are the changes (a removed line means the right-hand side is no longer the result of mb_strtoupper; an added line means the right-hand side is a new result of mb_strtoupper):

--- a/resources/src/mediawiki.Title/phpCharToUpper.js
+++ b/resources/src/mediawiki.Title/phpCharToUpper.js
@@ -6,15 +6,8 @@
 	var toUpperMapping = {
 		'ß': 'ß',
 		'ʼn': 'ʼn',
-		'Dž': 'Dž',
-		'dž': 'Dž',
-		'Lj': 'Lj',
-		'lj': 'Lj',
-		'Nj': 'Nj',
-		'nj': 'Nj',
 		'ǰ': 'ǰ',
-		'Dz': 'Dz',
-		'dz': 'Dz',
+		'ɪ': 'Ɪ',
 		'ʝ': 'Ʝ',
 		'ͅ': 'ͅ',
 		'ΐ': 'ΐ',
@@ -26,6 +19,15 @@
 		'ᏻ': 'Ᏻ',
 		'ᏼ': 'Ᏼ',
 		'ᏽ': 'Ᏽ',
+		'ᲀ': 'В',
+		'ᲁ': 'Д',
+		'ᲂ': 'О',
+		'ᲃ': 'С',
+		'ᲄ': 'Т',
+		'ᲅ': 'Т',
+		'ᲆ': 'Ъ',
+		'ᲇ': 'Ѣ',
+		'ᲈ': 'Ꙋ',
 		'ẖ': 'ẖ',
 		'ẗ': 'ẗ',
 		'ẘ': 'ẘ',
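
The effect on the example above can be sketched as follows. This is a minimal illustration, not MediaWiki's actual title code: the `pages` store, `normaliseTitle`, and `lookup` are hypothetical names, and the two small maps only encode the Dž/dž behaviour shown in the diff (the old runtime keeps U+01C5, PHP7 returns U+01C4). The page IDs are the ones from the example URL.

```javascript
// Hypothetical page store keyed by normalised title.
const pages = new Map([
    ['\u01C5', 7074938], // "Dž", stored under the old normal form
    ['\u01C4', 7074928], // "DŽ"
]);

// First-character uppercasing for the two runtimes, limited to the
// characters from the diff (assumed sample, not the full mapping).
const hhvmUpper = { '\u01C5': '\u01C5', '\u01C6': '\u01C5' }; // Dž→Dž, dž→Dž
const php7Upper = { '\u01C5': '\u01C4', '\u01C6': '\u01C4' }; // Dž→DŽ, dž→DŽ

function normaliseTitle(title, upperMap) {
    // Take the first code point, not the first UTF-16 unit.
    const first = Array.from(title)[0];
    if (first === undefined) {
        return title;
    }
    const rest = title.slice(first.length);
    const mapped = Object.prototype.hasOwnProperty.call(upperMap, first)
        ? upperMap[first]
        : first.toUpperCase();
    return mapped + rest;
}

function lookup(title, upperMap) {
    return pages.get(normaliseTitle(title, upperMap));
}

console.log(lookup('\u01C5', hhvmUpper)); // old behaviour
console.log(lookup('\u01C5', php7Upper)); // new behaviour
```

Under the PHP7 map, none of the inputs DŽ, Dž, or dž resolves to the page stored under Dž any more, which is why that page becomes unreachable rather than merely moved.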

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2020-04-27T20:28:40Z] <hknust> holger@mwmaint1002 Restarting uppercaseTitlesForUnicodeTransition.php as part of T219279 for 2 wikis

Mentioned in SAL (#wikimedia-operations) [2020-04-27T20:55:56Z] <hknust> holger@mwmaint1002 END (enwiki=success, frwiki=fail) uppercaseTitlesForUnicodeTransition.php as part of T219279

@tstarling enwiki worked. frwiki failed. I don't have permissions to view the 332 filter rule.

Hi, this frwiki filter is a throttle for renames. I've made it temporarily public and added an exception for this account, so it should not block the script again.

[unrelated] @Framawiki the condition & !('Page automatiquement déplacée lors du renommage de l’utilisateur' in summary) is probably no longer needed, since filters are skipped automatically when users are being renamed

Mentioned in SAL (#wikimedia-operations) [2020-04-28T13:30:58Z] <hknust> Restarting uppercaseTitlesForUnicodeTransition.php as part of T219279 for frwiki

Mentioned in SAL (#wikimedia-operations) [2020-05-01T13:06:27Z] <hknust> holger@mwmaint1002 Starting renameInvalidUsernames.php as part of T219279

Mentioned in SAL (#wikimedia-operations) [2020-05-01T14:18:13Z] <hknust> holger@mwmaint1002 finished renameInvalidUsernames.php (fail) as part of T219279

Where is this now? Did the maintenance script actually run?

@holger.knust : Could you please answer the last comment(s)? Thanks in advance!

Gentle nudge, this really needs to be completed.

@WDoranWMF @AMooney do you have a timeline for completion of this task?

Script execution failed with

holger@mwmaint1002:~/T219279_Unicode$ mwscript extensions/WikimediaMaintenance/renameInvalidUsernames.php loginwiki --list=userlist-2020-04-01.txt --reason="Unicode update. See T219279 for details" 
Reading from userlist-2020-04-01.txt
ɋ       Ɋ
Wikimedia\Rdbms\DBConnectionError from line 1419 of /srv/mediawiki/php-1.36.0-wmf.16/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Cannot access the database: Unknown error (10.64.48.35)
#0 /srv/mediawiki/php-1.36.0-wmf.16/includes/libs/rdbms/loadbalancer/LoadBalancer.php(932): Wikimedia\Rdbms\LoadBalancer->reportConnectionError()
#1 /srv/mediawiki/php-1.36.0-wmf.16/includes/libs/rdbms/loadbalancer/LoadBalancer.php(899): Wikimedia\Rdbms\LoadBalancer->getServerConnection(0, '\xC9\x8B', 4)
#2 /srv/mediawiki/php-1.36.0-wmf.16/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1045): Wikimedia\Rdbms\LoadBalancer->getConnection(-2, Array, '\xC9\x8B', 4)
#3 /srv/mediawiki/php-1.36.0-wmf.16/includes/GlobalFunctions.php(2460): Wikimedia\Rdbms\LoadBalancer->getMaintenanceConnectionRef(-2, Array, '\xC9\x8B')
#4 /srv/mediawiki/php-1.36.0-wmf.16/extensions/WikimediaMaintenance/renameInvalidUsernames.php(75): wfGetDB(-2, Array, '\xC9\x8B')
#5 /srv/mediawiki/php-1.36.0-wmf.16/extensions/WikimediaMaintenance/renameInvalidUsernames.php(50): RenameInvalidUsernames->rename('\xC9\x8A', '\xC9\x8B', NULL)
#6 /srv/mediawiki/php-1.36.0-wmf.16/maintenance/doMaintenance.php(106): RenameInvalidUsernames->execute()
#7 /srv/mediawiki/php-1.36.0-wmf.16/extensions/WikimediaMaintenance/renameInvalidUsernames.php(182): require_once('/srv/mediawiki/...')
#8 /srv/mediawiki/multiversion/MWScript.php(101): require_once('/srv/mediawiki/...')
#9 {main}

pc2010 seems to be lagging behind. This is a non-issue for production, given that it is in codfw, but I'm noting it here because it may alert if the write trend continues. This maintenance run is the most likely explanation, as it started at the exact time the log indicates (though I am not 100% sure about it).

From looking at the code, it seems like the user list ought to have three fields, the first one being the name of the wiki. That appears to be missing. Someone can correct me on that later if they know better.

Was this ever finished? This has come up again on https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Should_I_have_been_able_to_create_these_lowercase_Cyrillic_redirects? where someone was able to create a lowercase page because the letter is still in wmf-config/Php72ToUpper.php.

No, I believe the final step post-upgrade hasn't yet been done.

Production configuration still runs with the "HHVM-like" shim applied, suggesting this migration indeed has not been completed.

wmf-config/CommonSettings.php
# Part of the HHVM => PHP7.2 migration. Adds an array of unicode chars
# that have broken uppercasing in HHVM. In this phase, we want php7 to behave
# like HHVM. See T219279 for details.
$wgOverrideUcfirstCharacters = include __DIR__ . '/Php72ToUpper.php';

Gosh, it's hard to parse what's going on here, and the folks most involved are not working here anymore. I've tried to work out what exactly was and wasn't done from the logs, so I'll assume the checkboxes in T219279#5362460 are more or less correct.

  • Determine the schedule to do these next steps.
  • Generate a list of affected titles, and announce the upcoming change on User-notice.
  • Run uppercaseTitlesForUnicodeTransition.php and renameInvalidUsernames.php on all wikis. My current plan would be to use --suffix ' (former Unicode lowercase)' for handling of collisions.
  • Reverse https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/505487 so HHVM uses PHP 7.2's mappings. Remove the override from https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/505487.
  • Run uppercaseTitlesForUnicodeTransition.php and renameInvalidUsernames.php on all wikis again, in case someone created new problem titles while the scripts were running the first time.
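
The collision handling in the plan above could look roughly like this. It is a sketch under assumptions: `renameTarget` and its parameters are hypothetical names, and it assumes the suffix is appended to the uppercased title when that title is already taken. The real maintenance script may behave differently.

```javascript
// Value proposed for --suffix in the plan above.
const SUFFIX = ' (former Unicode lowercase)';

// Returns the title a page should be moved to, or null if it is unaffected.
function renameTarget(title, charmap, existingTitles, suffix) {
    const first = Array.from(title)[0]; // first code point
    if (first === undefined || !charmap.has(first)) {
        return null;
    }
    const target = charmap.get(first) + title.slice(first.length);
    // Assumption: on collision, disambiguate the uppercased title.
    return existingTitles.has(target) ? target + suffix : target;
}

const charmap = new Map([['\u024B', '\u024A']]); // ɋ → Ɋ
const existing = new Set(['\u024Aux']);          // "Ɋux" already exists

console.log(renameTarget('\u024Bux', charmap, existing, SUFFIX));
console.log(renameTarget('\u024Bab', charmap, existing, SUFFIX));
```

Appending the suffix to the lowercase title instead would not help, because that title's first character would still be uppercased on lookup.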

I'll first dry-run uppercaseTitlesForUnicodeTransition.php; in the worst-case scenario I just run a script that's already been run before.

Reedy added a subscriber: holger.knust.

So, I've dry-run uppercaseTitlesForUnicodeTransition.php with the current Php72ToUpper.php override map in production, and it reports nothing to rename, which means someone has probably already done it. Unfortunately, I think that's actually not what needs to be done here.

Currently, we're running PHP 7.2 in production with the strtoupper overrides enabled, forcing it to behave just like HHVM (for example, ƀ is still uppercased as ƀ). When we remove the overrides, we will start using native uppercasing (for example, ƀ will be uppercased as Ƀ). So any pages whose titles start with letters that HHVM did not uppercase need to be renamed to the uppercased form before we can remove the overrides.
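
As a rough sketch of that check: given a map from the current first-character form to the native PHP 7.2 uppercase, any title whose first character is a key in the map has to be moved. `titlesNeedingRename` is a hypothetical helper for illustration only, and the map here holds just two pairs that appear in this task.

```javascript
// Small sample of a "current → PHP 7.2 native" uppercase map.
const currentToNative = new Map([
    ['\u0180', '\u0243'], // ƀ → Ƀ
    ['\u019A', '\u023D'], // ƚ → Ƚ
]);

// Lists the moves required before the overrides can be dropped.
function titlesNeedingRename(titles, charmap) {
    const renames = [];
    for (const title of titles) {
        const first = Array.from(title)[0]; // first code point
        if (first !== undefined && charmap.has(first)) {
            renames.push({
                from: title,
                to: charmap.get(first) + title.slice(first.length),
            });
        }
    }
    return renames;
}

console.log(titlesNeedingRename(['\u0180eta', 'Beta', '\u0243eta'], currentToNative));
```

Titles that already start with the native uppercase form, or with an unaffected letter, are left alone.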

I've generated a new 'fake' overrides map, from current MW uppercasing to PHP 7.2 native uppercasing:

<?php
// File created by generateUcfirstOverrides.php
return [
    'ƀ' => 'Ƀ',
    'ƚ' => 'Ƚ',
    'Dž' => 'DŽ',
    'dž' => 'DŽ',
    'Lj' => 'LJ',
    'lj' => 'LJ',
    'Nj' => 'NJ',
    'nj' => 'NJ',
    'Dz' => 'DZ',
    'dz' => 'DZ',
    'ȼ' => 'Ȼ',
    'ȿ' => 'Ȿ',
    'ɀ' => 'Ɀ',
    'ɂ' => 'Ɂ',
    'ɇ' => 'Ɇ',
    'ɉ' => 'Ɉ',
    'ɋ' => 'Ɋ',
    'ɍ' => 'Ɍ',
    'ɏ' => 'Ɏ',
    'ɐ' => 'Ɐ',
    'ɑ' => 'Ɑ',
    'ɒ' => 'Ɒ',
    'ɜ' => 'Ɜ',
    'ɡ' => 'Ɡ',
    'ɥ' => 'Ɥ',
    'ɦ' => 'Ɦ',
    'ɪ' => 'Ɪ',
    'ɫ' => 'Ɫ',
    'ɬ' => 'Ɬ',
    'ɱ' => 'Ɱ',
    'ɽ' => 'Ɽ',
    'ʇ' => 'Ʇ',
    'ʉ' => 'Ʉ',
    'ʌ' => 'Ʌ',
    'ʝ' => 'Ʝ',
    'ʞ' => 'Ʞ',
    'ͱ' => 'Ͱ',
    'ͳ' => 'Ͳ',
    'ͷ' => 'Ͷ',
    'ͻ' => 'Ͻ',
    'ͼ' => 'Ͼ',
    'ͽ' => 'Ͽ',
    'ϗ' => 'Ϗ',
    'ϲ' => 'Ϲ',
    'ϳ' => 'Ϳ',
    'ϸ' => 'Ϸ',
    'ϻ' => 'Ϻ',
    'ӏ' => 'Ӏ',
    'ӷ' => 'Ӷ',
    'ӻ' => 'Ӻ',
    'ӽ' => 'Ӽ',
    'ӿ' => 'Ӿ',
    'ԑ' => 'Ԑ',
    'ԓ' => 'Ԓ',
    'ԕ' => 'Ԕ',
    'ԗ' => 'Ԗ',
    'ԙ' => 'Ԙ',
    'ԛ' => 'Ԛ',
    'ԝ' => 'Ԝ',
    'ԟ' => 'Ԟ',
    'ԡ' => 'Ԡ',
    'ԣ' => 'Ԣ',
    'ԥ' => 'Ԥ',
    'ԧ' => 'Ԧ',
    'ԩ' => 'Ԩ',
    'ԫ' => 'Ԫ',
    'ԭ' => 'Ԭ',
    'ԯ' => 'Ԯ',
    'ᏸ' => 'Ᏸ',
    'ᏹ' => 'Ᏹ',
    'ᏺ' => 'Ᏺ',
    'ᏻ' => 'Ᏻ',
    'ᏼ' => 'Ᏼ',
    'ᏽ' => 'Ᏽ',
    'ᲀ' => 'В',
    'ᲁ' => 'Д',
    'ᲂ' => 'О',
    'ᲃ' => 'С',
    'ᲄ' => 'Т',
    'ᲅ' => 'Т',
    'ᲆ' => 'Ъ',
    'ᲇ' => 'Ѣ',
    'ᲈ' => 'Ꙋ',
    'ᵹ' => 'Ᵹ',
    'ᵽ' => 'Ᵽ',
    'ỻ' => 'Ỻ',
    'ỽ' => 'Ỽ',
    'ỿ' => 'Ỿ',
    'ⅎ' => 'Ⅎ',
    'ↄ' => 'Ↄ',
    'ⰰ' => 'Ⰰ',
    'ⰱ' => 'Ⰱ',
    'ⰲ' => 'Ⰲ',
    'ⰳ' => 'Ⰳ',
    'ⰴ' => 'Ⰴ',
    'ⰵ' => 'Ⰵ',
    'ⰶ' => 'Ⰶ',
    'ⰷ' => 'Ⰷ',
    'ⰸ' => 'Ⰸ',
    'ⰹ' => 'Ⰹ',
    'ⰺ' => 'Ⰺ',
    'ⰻ' => 'Ⰻ',
    'ⰼ' => 'Ⰼ',
    'ⰽ' => 'Ⰽ',
    'ⰾ' => 'Ⰾ',
    'ⰿ' => 'Ⰿ',
    'ⱀ' => 'Ⱀ',
    'ⱁ' => 'Ⱁ',
    'ⱂ' => 'Ⱂ',
    'ⱃ' => 'Ⱃ',
    'ⱄ' => 'Ⱄ',
    'ⱅ' => 'Ⱅ',
    'ⱆ' => 'Ⱆ',
    'ⱇ' => 'Ⱇ',
    'ⱈ' => 'Ⱈ',
    'ⱉ' => 'Ⱉ',
    'ⱊ' => 'Ⱊ',
    'ⱋ' => 'Ⱋ',
    'ⱌ' => 'Ⱌ',
    'ⱍ' => 'Ⱍ',
    'ⱎ' => 'Ⱎ',
    'ⱏ' => 'Ⱏ',
    'ⱐ' => 'Ⱐ',
    'ⱑ' => 'Ⱑ',
    'ⱒ' => 'Ⱒ',
    'ⱓ' => 'Ⱓ',
    'ⱔ' => 'Ⱔ',
    'ⱕ' => 'Ⱕ',
    'ⱖ' => 'Ⱖ',
    'ⱗ' => 'Ⱗ',
    'ⱘ' => 'Ⱘ',
    'ⱙ' => 'Ⱙ',
    'ⱚ' => 'Ⱚ',
    'ⱛ' => 'Ⱛ',
    'ⱜ' => 'Ⱜ',
    'ⱝ' => 'Ⱝ',
    'ⱞ' => 'Ⱞ',
    'ⱡ' => 'Ⱡ',
    'ⱥ' => 'Ⱥ',
    'ⱦ' => 'Ⱦ',
    'ⱨ' => 'Ⱨ',
    'ⱪ' => 'Ⱪ',
    'ⱬ' => 'Ⱬ',
    'ⱳ' => 'Ⱳ',
    'ⱶ' => 'Ⱶ',
    'ⲁ' => 'Ⲁ',
    'ⲃ' => 'Ⲃ',
    'ⲅ' => 'Ⲅ',
    'ⲇ' => 'Ⲇ',
    'ⲉ' => 'Ⲉ',
    'ⲋ' => 'Ⲋ',
    'ⲍ' => 'Ⲍ',
    'ⲏ' => 'Ⲏ',
    'ⲑ' => 'Ⲑ',
    'ⲓ' => 'Ⲓ',
    'ⲕ' => 'Ⲕ',
    'ⲗ' => 'Ⲗ',
    'ⲙ' => 'Ⲙ',
    'ⲛ' => 'Ⲛ',
    'ⲝ' => 'Ⲝ',
    'ⲟ' => 'Ⲟ',
    'ⲡ' => 'Ⲡ',
    'ⲣ' => 'Ⲣ',
    'ⲥ' => 'Ⲥ',
    'ⲧ' => 'Ⲧ',
    'ⲩ' => 'Ⲩ',
    'ⲫ' => 'Ⲫ',
    'ⲭ' => 'Ⲭ',
    'ⲯ' => 'Ⲯ',
    'ⲱ' => 'Ⲱ',
    'ⲳ' => 'Ⲳ',
    'ⲵ' => 'Ⲵ',
    'ⲷ' => 'Ⲷ',
    'ⲹ' => 'Ⲹ',
    'ⲻ' => 'Ⲻ',
    'ⲽ' => 'Ⲽ',
    'ⲿ' => 'Ⲿ',
    'ⳁ' => 'Ⳁ',
    'ⳃ' => 'Ⳃ',
    'ⳅ' => 'Ⳅ',
    'ⳇ' => 'Ⳇ',
    'ⳉ' => 'Ⳉ',
    'ⳋ' => 'Ⳋ',
    'ⳍ' => 'Ⳍ',
    'ⳏ' => 'Ⳏ',
    'ⳑ' => 'Ⳑ',
    'ⳓ' => 'Ⳓ',
    'ⳕ' => 'Ⳕ',
    'ⳗ' => 'Ⳗ',
    'ⳙ' => 'Ⳙ',
    'ⳛ' => 'Ⳛ',
    'ⳝ' => 'Ⳝ',
    'ⳟ' => 'Ⳟ',
    'ⳡ' => 'Ⳡ',
    'ⳣ' => 'Ⳣ',
    'ⳬ' => 'Ⳬ',
    'ⳮ' => 'Ⳮ',
    'ⳳ' => 'Ⳳ',
    'ⴀ' => 'Ⴀ',
    'ⴁ' => 'Ⴁ',
    'ⴂ' => 'Ⴂ',
    'ⴃ' => 'Ⴃ',
    'ⴄ' => 'Ⴄ',
    'ⴅ' => 'Ⴅ',
    'ⴆ' => 'Ⴆ',
    'ⴇ' => 'Ⴇ',
    'ⴈ' => 'Ⴈ',
    'ⴉ' => 'Ⴉ',
    'ⴊ' => 'Ⴊ',
    'ⴋ' => 'Ⴋ',
    'ⴌ' => 'Ⴌ',
    'ⴍ' => 'Ⴍ',
    'ⴎ' => 'Ⴎ',
    'ⴏ' => 'Ⴏ',
    'ⴐ' => 'Ⴐ',
    'ⴑ' => 'Ⴑ',
    'ⴒ' => 'Ⴒ',
    'ⴓ' => 'Ⴓ',
    'ⴔ' => 'Ⴔ',
    'ⴕ' => 'Ⴕ',
    'ⴖ' => 'Ⴖ',
    'ⴗ' => 'Ⴗ',
    'ⴘ' => 'Ⴘ',
    'ⴙ' => 'Ⴙ',
    'ⴚ' => 'Ⴚ',
    'ⴛ' => 'Ⴛ',
    'ⴜ' => 'Ⴜ',
    'ⴝ' => 'Ⴝ',
    'ⴞ' => 'Ⴞ',
    'ⴟ' => 'Ⴟ',
    'ⴠ' => 'Ⴠ',
    'ⴡ' => 'Ⴡ',
    'ⴢ' => 'Ⴢ',
    'ⴣ' => 'Ⴣ',
    'ⴤ' => 'Ⴤ',
    'ⴥ' => 'Ⴥ',
    'ⴧ' => 'Ⴧ',
    'ⴭ' => 'Ⴭ',
    'ꙁ' => 'Ꙁ',
    'ꙃ' => 'Ꙃ',
    'ꙅ' => 'Ꙅ',
    'ꙇ' => 'Ꙇ',
    'ꙉ' => 'Ꙉ',
    'ꙋ' => 'Ꙋ',
    'ꙍ' => 'Ꙍ',
    'ꙏ' => 'Ꙏ',
    'ꙑ' => 'Ꙑ',
    'ꙓ' => 'Ꙓ',
    'ꙕ' => 'Ꙕ',
    'ꙗ' => 'Ꙗ',
    'ꙙ' => 'Ꙙ',
    'ꙛ' => 'Ꙛ',
    'ꙝ' => 'Ꙝ',
    'ꙟ' => 'Ꙟ',
    'ꙡ' => 'Ꙡ',
    'ꙣ' => 'Ꙣ',
    'ꙥ' => 'Ꙥ',
    'ꙧ' => 'Ꙧ',
    'ꙩ' => 'Ꙩ',
    'ꙫ' => 'Ꙫ',
    'ꙭ' => 'Ꙭ',
    'ꚁ' => 'Ꚁ',
    'ꚃ' => 'Ꚃ',
    'ꚅ' => 'Ꚅ',
    'ꚇ' => 'Ꚇ',
    'ꚉ' => 'Ꚉ',
    'ꚋ' => 'Ꚋ',
    'ꚍ' => 'Ꚍ',
    'ꚏ' => 'Ꚏ',
    'ꚑ' => 'Ꚑ',
    'ꚓ' => 'Ꚓ',
    'ꚕ' => 'Ꚕ',
    'ꚗ' => 'Ꚗ',
    'ꚙ' => 'Ꚙ',
    'ꚛ' => 'Ꚛ',
    'ꜣ' => 'Ꜣ',
    'ꜥ' => 'Ꜥ',
    'ꜧ' => 'Ꜧ',
    'ꜩ' => 'Ꜩ',
    'ꜫ' => 'Ꜫ',
    'ꜭ' => 'Ꜭ',
    'ꜯ' => 'Ꜯ',
    'ꜳ' => 'Ꜳ',
    'ꜵ' => 'Ꜵ',
    'ꜷ' => 'Ꜷ',
    'ꜹ' => 'Ꜹ',
    'ꜻ' => 'Ꜻ',
    'ꜽ' => 'Ꜽ',
    'ꜿ' => 'Ꜿ',
    'ꝁ' => 'Ꝁ',
    'ꝃ' => 'Ꝃ',
    'ꝅ' => 'Ꝅ',
    'ꝇ' => 'Ꝇ',
    'ꝉ' => 'Ꝉ',
    'ꝋ' => 'Ꝋ',
    'ꝍ' => 'Ꝍ',
    'ꝏ' => 'Ꝏ',
    'ꝑ' => 'Ꝑ',
    'ꝓ' => 'Ꝓ',
    'ꝕ' => 'Ꝕ',
    'ꝗ' => 'Ꝗ',
    'ꝙ' => 'Ꝙ',
    'ꝛ' => 'Ꝛ',
    'ꝝ' => 'Ꝝ',
    'ꝟ' => 'Ꝟ',
    'ꝡ' => 'Ꝡ',
    'ꝣ' => 'Ꝣ',
    'ꝥ' => 'Ꝥ',
    'ꝧ' => 'Ꝧ',
    'ꝩ' => 'Ꝩ',
    'ꝫ' => 'Ꝫ',
    'ꝭ' => 'Ꝭ',
    'ꝯ' => 'Ꝯ',
    'ꝺ' => 'Ꝺ',
    'ꝼ' => 'Ꝼ',
    'ꝿ' => 'Ꝿ',
    'ꞁ' => 'Ꞁ',
    'ꞃ' => 'Ꞃ',
    'ꞅ' => 'Ꞅ',
    'ꞇ' => 'Ꞇ',
    'ꞌ' => 'Ꞌ',
    'ꞑ' => 'Ꞑ',
    'ꞓ' => 'Ꞓ',
    'ꞗ' => 'Ꞗ',
    'ꞙ' => 'Ꞙ',
    'ꞛ' => 'Ꞛ',
    'ꞝ' => 'Ꞝ',
    'ꞟ' => 'Ꞟ',
    'ꞡ' => 'Ꞡ',
    'ꞣ' => 'Ꞣ',
    'ꞥ' => 'Ꞥ',
    'ꞧ' => 'Ꞧ',
    'ꞩ' => 'Ꞩ',
    'ꞵ' => 'Ꞵ',
    'ꞷ' => 'Ꞷ',
    'ꭓ' => 'Ꭓ',
    'ꭰ' => 'Ꭰ',
    'ꭱ' => 'Ꭱ',
    'ꭲ' => 'Ꭲ',
    'ꭳ' => 'Ꭳ',
    'ꭴ' => 'Ꭴ',
    'ꭵ' => 'Ꭵ',
    'ꭶ' => 'Ꭶ',
    'ꭷ' => 'Ꭷ',
    'ꭸ' => 'Ꭸ',
    'ꭹ' => 'Ꭹ',
    'ꭺ' => 'Ꭺ',
    'ꭻ' => 'Ꭻ',
    'ꭼ' => 'Ꭼ',
    'ꭽ' => 'Ꭽ',
    'ꭾ' => 'Ꭾ',
    'ꭿ' => 'Ꭿ',
    'ꮀ' => 'Ꮀ',
    'ꮁ' => 'Ꮁ',
    'ꮂ' => 'Ꮂ',
    'ꮃ' => 'Ꮃ',
    'ꮄ' => 'Ꮄ',
    'ꮅ' => 'Ꮅ',
    'ꮆ' => 'Ꮆ',
    'ꮇ' => 'Ꮇ',
    'ꮈ' => 'Ꮈ',
    'ꮉ' => 'Ꮉ',
    'ꮊ' => 'Ꮊ',
    'ꮋ' => 'Ꮋ',
    'ꮌ' => 'Ꮌ',
    'ꮍ' => 'Ꮍ',
    'ꮎ' => 'Ꮎ',
    'ꮏ' => 'Ꮏ',
    'ꮐ' => 'Ꮐ',
    'ꮑ' => 'Ꮑ',
    'ꮒ' => 'Ꮒ',
    'ꮓ' => 'Ꮓ',
    'ꮔ' => 'Ꮔ',
    'ꮕ' => 'Ꮕ',
    'ꮖ' => 'Ꮖ',
    'ꮗ' => 'Ꮗ',
    'ꮘ' => 'Ꮘ',
    'ꮙ' => 'Ꮙ',
    'ꮚ' => 'Ꮚ',
    'ꮛ' => 'Ꮛ',
    'ꮜ' => 'Ꮜ',
    'ꮝ' => 'Ꮝ',
    'ꮞ' => 'Ꮞ',
    'ꮟ' => 'Ꮟ',
    'ꮠ' => 'Ꮠ',
    'ꮡ' => 'Ꮡ',
    'ꮢ' => 'Ꮢ',
    'ꮣ' => 'Ꮣ',
    'ꮤ' => 'Ꮤ',
    'ꮥ' => 'Ꮥ',
    'ꮦ' => 'Ꮦ',
    'ꮧ' => 'Ꮧ',
    'ꮨ' => 'Ꮨ',
    'ꮩ' => 'Ꮩ',
    'ꮪ' => 'Ꮪ',
    'ꮫ' => 'Ꮫ',
    'ꮬ' => 'Ꮬ',
    'ꮭ' => 'Ꮭ',
    'ꮮ' => 'Ꮮ',
    'ꮯ' => 'Ꮯ',
    'ꮰ' => 'Ꮰ',
    'ꮱ' => 'Ꮱ',
    'ꮲ' => 'Ꮲ',
    'ꮳ' => 'Ꮳ',
    'ꮴ' => 'Ꮴ',
    'ꮵ' => 'Ꮵ',
    'ꮶ' => 'Ꮶ',
    'ꮷ' => 'Ꮷ',
    'ꮸ' => 'Ꮸ',
    'ꮹ' => 'Ꮹ',
    'ꮺ' => 'Ꮺ',
    'ꮻ' => 'Ꮻ',
    'ꮼ' => 'Ꮼ',
    'ꮽ' => 'Ꮽ',
    'ꮾ' => 'Ꮾ',
    'ꮿ' => 'Ꮿ',
    '𐑎' => '𐐦',
    '𐑏' => '𐐧',
    '𐓘' => '𐒰',
    '𐓙' => '𐒱',
    '𐓚' => '𐒲',
    '𐓛' => '𐒳',
    '𐓜' => '𐒴',
    '𐓝' => '𐒵',
    '𐓞' => '𐒶',
    '𐓟' => '𐒷',
    '𐓠' => '𐒸',
    '𐓡' => '𐒹',
    '𐓢' => '𐒺',
    '𐓣' => '𐒻',
    '𐓤' => '𐒼',
    '𐓥' => '𐒽',
    '𐓦' => '𐒾',
    '𐓧' => '𐒿',
    '𐓨' => '𐓀',
    '𐓩' => '𐓁',
    '𐓪' => '𐓂',
    '𐓫' => '𐓃',
    '𐓬' => '𐓄',
    '𐓭' => '𐓅',
    '𐓮' => '𐓆',
    '𐓯' => '𐓇',
    '𐓰' => '𐓈',
    '𐓱' => '𐓉',
    '𐓲' => '𐓊',
    '𐓳' => '𐓋',
    '𐓴' => '𐓌',
    '𐓵' => '𐓍',
    '𐓶' => '𐓎',
    '𐓷' => '𐓏',
    '𐓸' => '𐓐',
    '𐓹' => '𐓑',
    '𐓺' => '𐓒',
    '𐓻' => '𐓓',
    '𐳀' => '𐲀',
    '𐳁' => '𐲁',
    '𐳂' => '𐲂',
    '𐳃' => '𐲃',
    '𐳄' => '𐲄',
    '𐳅' => '𐲅',
    '𐳆' => '𐲆',
    '𐳇' => '𐲇',
    '𐳈' => '𐲈',
    '𐳉' => '𐲉',
    '𐳊' => '𐲊',
    '𐳋' => '𐲋',
    '𐳌' => '𐲌',
    '𐳍' => '𐲍',
    '𐳎' => '𐲎',
    '𐳏' => '𐲏',
    '𐳐' => '𐲐',
    '𐳑' => '𐲑',
    '𐳒' => '𐲒',
    '𐳓' => '𐲓',
    '𐳔' => '𐲔',
    '𐳕' => '𐲕',
    '𐳖' => '𐲖',
    '𐳗' => '𐲗',
    '𐳘' => '𐲘',
    '𐳙' => '𐲙',
    '𐳚' => '𐲚',
    '𐳛' => '𐲛',
    '𐳜' => '𐲜',
    '𐳝' => '𐲝',
    '𐳞' => '𐲞',
    '𐳟' => '𐲟',
    '𐳠' => '𐲠',
    '𐳡' => '𐲡',
    '𐳢' => '𐲢',
    '𐳣' => '𐲣',
    '𐳤' => '𐲤',
    '𐳥' => '𐲥',
    '𐳦' => '𐲦',
    '𐳧' => '𐲧',
    '𐳨' => '𐲨',
    '𐳩' => '𐲩',
    '𐳪' => '𐲪',
    '𐳫' => '𐲫',
    '𐳬' => '𐲬',
    '𐳭' => '𐲭',
    '𐳮' => '𐲮',
    '𐳯' => '𐲯',
    '𐳰' => '𐲰',
    '𐳱' => '𐲱',
    '𐳲' => '𐲲',
    '𑣀' => '𑢠',
    '𑣁' => '𑢡',
    '𑣂' => '𑢢',
    '𑣃' => '𑢣',
    '𑣄' => '𑢤',
    '𑣅' => '𑢥',
    '𑣆' => '𑢦',
    '𑣇' => '𑢧',
    '𑣈' => '𑢨',
    '𑣉' => '𑢩',
    '𑣊' => '𑢪',
    '𑣋' => '𑢫',
    '𑣌' => '𑢬',
    '𑣍' => '𑢭',
    '𑣎' => '𑢮',
    '𑣏' => '𑢯',
    '𑣐' => '𑢰',
    '𑣑' => '𑢱',
    '𑣒' => '𑢲',
    '𑣓' => '𑢳',
    '𑣔' => '𑢴',
    '𑣕' => '𑢵',
    '𑣖' => '𑢶',
    '𑣗' => '𑢷',
    '𑣘' => '𑢸',
    '𑣙' => '𑢹',
    '𑣚' => '𑢺',
    '𑣛' => '𑢻',
    '𑣜' => '𑢼',
    '𑣝' => '𑢽',
    '𑣞' => '𑢾',
    '𑣟' => '𑢿',
    '𞤢' => '𞤀',
    '𞤣' => '𞤁',
    '𞤤' => '𞤂',
    '𞤥' => '𞤃',
    '𞤦' => '𞤄',
    '𞤧' => '𞤅',
    '𞤨' => '𞤆',
    '𞤩' => '𞤇',
    '𞤪' => '𞤈',
    '𞤫' => '𞤉',
    '𞤬' => '𞤊',
    '𞤭' => '𞤋',
    '𞤮' => '𞤌',
    '𞤯' => '𞤍',
    '𞤰' => '𞤎',
    '𞤱' => '𞤏',
    '𞤲' => '𞤐',
    '𞤳' => '𞤑',
    '𞤴' => '𞤒',
    '𞤵' => '𞤓',
    '𞤶' => '𞤔',
    '𞤷' => '𞤕',
    '𞤸' => '𞤖',
    '𞤹' => '𞤗',
    '𞤺' => '𞤘',
    '𞤻' => '𞤙',
    '𞤼' => '𞤚',
    '𞤽' => '𞤛',
    '𞤾' => '𞤜',
    '𞤿' => '𞤝',
    '𞥀' => '𞤞',
    '𞥁' => '𞤟',
    '𞥂' => '𞤠',
    '𞥃' => '𞤡',
];
I will now dry-run the script with this character map.

So, first I've dry-run the script

foreachwiki uppercaseTitlesForUnicodeTransition.php --charmap current_to_php7_overrides.php --userlist /tmp/rename_users_for_uppercase.txt --suffix ' (former Unicode lowercase)' > results.txt

to find that there are 809 pages overall that need renaming, plus quite a few usernames. The first problem is that uppercaseTitlesForUnicodeTransition.php generates a list of usernames for us, but in a format that's not compatible with renameInvalidUsernames.php. Let's make them compatible first.

Change 726083 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[mediawiki/core@master] uppercaseTitlesForUnicodeTransition: improve userlist format

https://gerrit.wikimedia.org/r/726083

Change 726083 merged by jenkins-bot:

[mediawiki/core@master] uppercaseTitlesForUnicodeTransition: improve userlist format

https://gerrit.wikimedia.org/r/726083

Change 726589 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[mediawiki/core@wmf/1.38.0-wmf.2] uppercaseTitlesForUnicodeTransition: improve userlist format

https://gerrit.wikimedia.org/r/726589

Change 726589 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.2] uppercaseTitlesForUnicodeTransition: improve userlist format

https://gerrit.wikimedia.org/r/726589

Mentioned in SAL (#wikimedia-operations) [2021-10-05T13:23:54Z] <ppchelko@deploy1002> Synchronized php-1.38.0-wmf.2/maintenance/uppercaseTitlesForUnicodeTransition.php: Backport uppercaseTitlesForUnicodeTransition.php maintenance script improvements T219279 (duration: 00m 58s)

Change 726623 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[operations/mediawiki-config@master] Remove mb_strtoupper overrides for HHVM

https://gerrit.wikimedia.org/r/726623

Mentioned in SAL (#wikimedia-operations) [2021-10-05T13:46:27Z] <Pchelolo> run renameInvalidUsernames.php --wiki loginwiki --list /tmp/rename_users_for_uppercase_all.txt T219279

Renaming users completed. There are 4 accounts across all sites that we can't capitalize because the capitalized account already exists. None of those 4 accounts appears to be an active editor, so nothing can really be done here; the 4 accounts will become unreachable.

Mentioned in SAL (#wikimedia-operations) [2021-10-05T14:30:07Z] <Pchelolo> run foreachwiki uppercaseTitlesForUnicodeTransition.php --charmap current_to_php7_overrides.php T219279

Script run finished. We've had some AbuseFilter violations:

If someone could be so kind as to disable these filters, or exclude the Maintenance script user from them, I'll rerun the renaming script for these wikis.

In any case, it seems like the pages we've failed to rename are just redirects, so we can proceed with removing the overrides from production, i.e. https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/726623

I'll rerun the cleanup script again after it's deployed, just for good measure.

Should they be renamed to something else, like was done with SUL finalization?

Yeah, can do. Like '<capitalized letter>_T219279'?

Was just about to ask this. While rare, there are cases of accounts making their first edits years after creation. Maybe systematically rename them to something like <capitalized username> (technical rename), with usertalk notes pointing them to Special:GlobalRenameRequest?

Edit: Whoops, didn't get the notification for your post till after I'd posted this, Pchelolo.

with usertalk notes pointing them to Special:GlobalRenameRequest?

That's the issue: I can only leave a note on the User_Talk:<capitalized username> (technical rename) page, not on User_Talk:<unreachable_lowercase_username>, since that user talk page will also be unreachable, and the capitalized version belongs to a different user. Without a notice, if the owner of the account decides to come back, how would they know where to look for the note?

If someone could be so kind to disable these filters, or exclude Maintenance script user from them, I'll rerun the renaming script for these wikis.

This is a regression in AbuseFilter, see T212082#4826766. Years ago I had refactored MovePage to specifically allow exempting maintenance scripts and server-initiated page renames from AbuseFilters and other anti-spam checks.

If you globally rename the users now, and leave the notes on their home-wiki talkpages, won't they still get the "You have new messages" banner (etc.), same as any other rename?

Renaming users logs them out, and by default a rename does not issue any (email) notifications (web notifications are inaccessible: you'd have to log in first, and for that you have to know about the rename). Only approved queue renames do so (and AFAIK an email is mandatory for them). If I rename someone manually (for technical reasons, or because of a request on a community request page), they only notice it after they log in. That's fairly easy to spot if you expect the rename, but very unlikely if you don't (and aren't particularly aware of how MW works).

In other words, there's very little chance of contacting them in any way, I'm afraid.

Ah. That's a shame. You'd think after two renames I'd know that. Oh well. If anyone does try to awaken a dormant account, they'll probably just think they forgot the password.

Change 726623 merged by jenkins-bot:

[operations/mediawiki-config@master] Remove mb_strtoupper overrides for HHVM

https://gerrit.wikimedia.org/r/726623

Mentioned in SAL (#wikimedia-operations) [2021-10-05T18:04:24Z] <ppchelko@deploy1002> Synchronized wmf-config/CommonSettings.php: Remove mb_strtoupper overrides for HHVM T219279 CS.php (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2021-10-05T18:11:06Z] <ppchelko@deploy1002> Synchronized wmf-config: Remove mb_strtoupper overrides for HHVM T219279 Php72ToUpper.php removal (duration: 01m 06s)

Ok, we're done with removal of overrides. Now production uppercasing is just regular PHP 7.2 uppercasing.

For example, https://test.wikipedia.org/wiki/ɋ is now https://test.wikipedia.org/wiki/Ɋ, and so on.

Just to be completely sure nothing was left behind I'll run the script one more time in case some pages were re-created from the last script run, and we're done here.

Change 726672 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[mediawiki/core@master] Remove phpCharMappings that are no longer nesessary

https://gerrit.wikimedia.org/r/726672

Pchelolo claimed this task.

OK, I re-ran the script. All good, no leftovers.

Change 726672 merged by jenkins-bot:

[mediawiki/core@master] mediawiki.Title: Regenerate phpCharMappings against plain PHP 7.2

https://gerrit.wikimedia.org/r/726672

@Pchelolo Ping? Can you please review the renames that failed? See https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress for a list of them.

Ok, fixed all the stuck renames. Done.

This task (or possibly a prior similar one) was noticed at https://en.wikipedia.org/wiki/WP:VPT#Automatically_renamed_users; just checking in to see whether the behavior of Special:Contribs is expected.

https://en.wikipedia.org/wiki/WP:VPT#Automatically_renamed_users

Hello! See User:Ʝ and User:DZoo — two users who seem to have disappeared during a rename from lowercase to capital letters. Their user pages' history show them being created by users with the lowercase name, but clicking their user link or contribs link brings you to the nonexistent username with the capital first character.

This is probably a problem with the user rename procedure.

What I see in the database is that the revision that created the page for user DZoo is tied to user Dzoo in the enwiki database, which seems to be still present in the user table (id 7026957)

(I was the one at the Village Pump.) It looks like the renames didn't entirely work — has this happened with other users?