
Run cleanupTitles.php across Wikimedia wikis
Closed, Resolved (Public)

Description

As a follow-up to bug 22939, I'm seeing some strange behavior on the English Wikipedia:

MariaDB [enwiki_p]> select * from page where page_namespace = 2 and page_title = 'Ɑʀʇʉʀɵ/SmallCaps.charset'\G

*************************** 1. row ***************************
          page_id: 40422349
   page_namespace: 2
       page_title: Ɑʀʇʉʀɵ/SmallCaps.charset
page_restrictions:
     page_counter: 0
 page_is_redirect: 0
      page_is_new: 1
      page_random: 0.582265889095
     page_touched: 20130902000030
      page_latest: 571150285
         page_len: 1107

1 row in set (0.09 sec)

MariaDB [enwiki_p]> select * from page where page_namespace = 2 and page_title = 'ɑʀʇʉʀɵ/SmallCaps.charset'\G

*************************** 1. row ***************************
          page_id: 8610689
   page_namespace: 2
       page_title: ɑʀʇʉʀɵ/SmallCaps.charset
page_restrictions:
     page_counter: 0
 page_is_redirect: 0
      page_is_new: 0
      page_random: 0.6887380353870001
     page_touched: 20061226044311
      page_latest: 96503668
         page_len: 1107

1 row in set (0.10 sec)

"ɑʀʇʉʀɵ/SmallCaps.charset" is an inaccessible page title. It gets normalized to "Ɑʀʇʉʀɵ/SmallCaps.charset". Presumably the previous cleanupTitles.php run would have caught this, so... I'm not sure what's up.


Version: wmf-deployment
Severity: normal
URL: https://bugs.php.net/bug.php?id=52981

Details

Reference
bz53670

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:12 AM
bzimport set Reference to bz53670.

The majuscule form of 'ɑ' is U+2C6D Ɑ (LATIN CAPITAL LETTER ALPHA). It was added in version 5.1 of the Unicode standard, released in 2008. Until version 5.3.3, PHP used Unicode tables based on version 3.2 of the standard, released in 2002. When we last ran cleanupTitles.php (May 2012), we were still on PHP 5.3.2, which did not include the update.

See https://bugs.php.net/bug.php?id=52981 for more details.
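To illustrate the case-mapping point, here is a minimal Python sketch. It is not MediaWiki's or PHP's code; `ucfirst` is a hypothetical helper that only approximates first-letter title normalization, and the result depends on the Unicode tables bundled with the interpreter that runs it.

```python
import unicodedata


def ucfirst(title: str) -> str:
    """Uppercase only the first character (a rough stand-in for first-letter
    title normalization; hypothetical helper, not MediaWiki code)."""
    return title[:1].upper() + title[1:]


small_alpha = "\u0251"    # LATIN SMALL LETTER ALPHA
capital_alpha = "\u2c6d"  # LATIN CAPITAL LETTER ALPHA, added in Unicode 5.1

# Unicode version of the tables this interpreter ships with.
print(unicodedata.unidata_version)
print(unicodedata.name(small_alpha), "->", unicodedata.name(capital_alpha))

# With tables that know the 5.1 mapping, the lowercase-first title
# normalizes to the capitalized one, so it cannot be reached as typed.
old_title = small_alpha + "/SmallCaps.charset"
print(ucfirst(old_title) == capital_alpha + "/SmallCaps.charset")
```

With tables older than Unicode 5.1, the uppercase mapping for U+0251 would be missing and the title would be left unchanged, which matches the stale row seen in the page table.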

We should re-run cleanupTitles.php.

Just for the record, these pages should now exist under "Broken/". The relevant results were:

$ grep "rows updated" 53670.log | grep -v "page... 0 of "
arwiki: Finished page... 1 of 1416284 rows updated
bewikisource: Finished page... 57 of 5972 rows updated
bgwiki: Finished page... 3 of 343456 rows updated
brwiki: Finished page... 3 of 96511 rows updated
bswiki: Finished page... 34 of 222943 rows updated
bxrwiki: Finished page... 2 of 4060 rows updated
cawiki: Finished page... 1 of 1012795 rows updated
cewiki: Finished page... 2 of 10910 rows updated
ckbwiki: Finished page... 5 of 69631 rows updated
commonswiki: Finished page... 4 of 25223432 rows updated
cswiki: Finished page... 3 of 706918 rows updated
cuwiki: Finished page... 1 of 4008 rows updated
cywikisource: Finished page... 19 of 1104 rows updated
dawiki: Finished page... 1 of 593711 rows updated
dewiki: Finished page... 41 of 4537363 rows updated
dewikivoyage: Finished page... 7 of 39145 rows updated
diqwiki: Finished page... 17 of 18456 rows updated
dvwiktionary: Finished page... 2 of 960 rows updated
elwiki: Finished page... 2 of 241860 rows updated
enwiki: Finished page... 157 of 31116095 rows updated
enwikinews: Finished page... 1 of 731634 rows updated
enwikisource: Finished page... 18 of 1447625 rows updated
eowiki: Finished page... 1 of 401302 rows updated
eowikisource: Finished page... 3 of 5680 rows updated
eowiktionary: Finished page... 1 of 38034 rows updated
eswiki: Finished page... 8 of 4325732 rows updated
etwiki: Finished page... 1 of 293643 rows updated
fawiki: Finished page... 3 of 1801439 rows updated
fiwiki: Finished page... 3 of 886219 rows updated
fiwikisource: Finished page... 310 of 12150 rows updated
frwiki: Finished page... 44 of 5975233 rows updated
frwikibooks: Finished page... 2 of 39266 rows updated
gdwiki: Finished page... 1 of 19077 rows updated
glwiki: Finished page... 1 of 230557 rows updated
guwiki: Finished page... 1 of 42638 rows updated
hewiki: Finished page... 1 of 629806 rows updated
hsbwiktionary: Finished page... 4 of 5331 rows updated
huwiki: Finished page... 3 of 835123 rows updated
hywiki: Finished page... 1 of 280600 rows updated
idwiki: Finished page... 3 of 1037501 rows updated
idwiktionary: Finished page... 2 of 194297 rows updated
incubatorwiki: Finished page... 1 of 563575 rows updated
itwiki: Finished page... 4 of 3444211 rows updated
jawiki: Finished page... 3 of 2418512 rows updated
kbdwiki: Finished page... 3 of 3095 rows updated
kowiki: Finished page... 2 of 809683 rows updated
kuwiki: Finished page... 4 of 47462 rows updated
kuwikibooks: Finished page... 2 of 531 rows updated
kuwikiquote: Finished page... 5 of 1050 rows updated
kywiki: Finished page... 1 of 36304 rows updated
lawiki: Finished page... 4 of 179872 rows updated
metawiki: Finished page... 3 of 2251097 rows updated
mhrwiki: Finished page... 1 of 12486 rows updated
minwiki: Finished page... 1 of 14535 rows updated
mlwikiquote: Finished page... 1 of 3176 rows updated
nowiki: Finished page... 10 of 943043 rows updated
plwiki: Finished page... 6 of 1957700 rows updated
ptwiki: Finished page... 5 of 3298730 rows updated
ruwiki: Finished page... 13 of 3495752 rows updated
sawikisource: Finished page... 1 of 11121 rows updated
skwiki: Finished page... 1 of 395761 rows updated
sourceswiki: Finished page... 1 of 37720 rows updated
srwiki: Finished page... 2 of 689781 rows updated
svwiki: Finished page... 3 of 3454807 rows updated
tawikisource: Finished page... 11 of 4720 rows updated
test2wiki: Finished page... 2 of 9689 rows updated
tewiktionary: Finished page... 2 of 100105 rows updated
thwiki: Finished page... 2 of 430332 rows updated
trwiki: Finished page... 1 of 1085798 rows updated
ttwiki: Finished page... 1 of 101891 rows updated
ukwiki: Finished page... 6 of 1386835 rows updated
ukwikisource: Finished page... 1 of 11063 rows updated
urwiki: Finished page... 26 of 102498 rows updated
uzwiki: Finished page... 14 of 635677 rows updated
zh_yuewiki: Finished page... 1 of 78606 rows updated
zhwiki: Finished page... 12 of 3086902 rows updated
zhwikibooks: Finished page... 1 of 7431 rows updated
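For anyone who wants totals rather than a grep, a small Python sketch along these lines could summarize the log. It assumes the line format shown above; the function names and the `examplewiki` zero-row line are made up for illustration.

```python
import re

# Matches lines such as "enwiki: Finished page... 157 of 31116095 rows updated".
LINE_RE = re.compile(
    r"^(?P<wiki>\w+): Finished page\.\.\. (?P<updated>\d+) of (?P<total>\d+) rows updated$"
)


def parse_log(lines):
    """Return {wiki: (rows_updated, rows_total)} for every matching line."""
    results = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            results[match["wiki"]] = (int(match["updated"]), int(match["total"]))
    return results


def nonzero(results):
    """Keep only wikis where at least one row was updated."""
    return {wiki: updated for wiki, (updated, _total) in results.items() if updated > 0}


sample = [
    "enwiki: Finished page... 157 of 31116095 rows updated",
    "fiwikisource: Finished page... 310 of 12150 rows updated",
    "examplewiki: Finished page... 0 of 100 rows updated",  # hypothetical zero-row line
]

changed = nonzero(parse_log(sample))
# List the affected wikis, most updated rows first, then the grand total.
for wiki, updated in sorted(changed.items(), key=lambda item: -item[1]):
    print(wiki, updated)
print("total rows updated:", sum(changed.values()))
```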

(In reply to comment #3)

> Just for the record, these pages should now exist under "Broken/".

Note: pages retain their namespace. For example:

  • (0,'ӷ') to (0,'Broken/Ӷ')
  • (3,'ɑʀʇʉʀɵ') to (3,'Ɑʀʇʉʀɵ')

So the pages will exist under "Broken/", but finding them with Special:PrefixIndex requires checking every namespace.
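A single query over the page table avoids the per-namespace check. The sketch below demonstrates the idea on an in-memory SQLite table rather than a real replica; the two-column schema is a simplification, `broken_pages` is a hypothetical helper, and the `Broken/Example` row is invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_namespace INTEGER, page_title TEXT)")
conn.executemany(
    "INSERT INTO page VALUES (?, ?)",
    [
        (0, "Broken/Ӷ"),                  # moved under "Broken/" (from the example above)
        (3, "Ɑʀʇʉʀɵ"),                    # moved to its normalized title
        (2, "Ɑʀʇʉʀɵ/SmallCaps.charset"),  # already at its normalized title
        (2, "Broken/Example"),            # invented row for the demo
    ],
)


def broken_pages(connection):
    """List (namespace, title) for every title starting with 'Broken/',
    across all namespaces at once."""
    query = (
        "SELECT page_namespace, page_title FROM page "
        "WHERE substr(page_title, 1, 7) = 'Broken/' "
        "ORDER BY page_namespace, page_title"
    )
    return connection.execute(query).fetchall()


for namespace, title in broken_pages(conn):
    print(namespace, title)
```

The prefix comparison uses `substr` rather than `LIKE` because SQLite's `LIKE` is case-insensitive for ASCII by default; on a real database the equivalent filter would need checking against that server's collation rules.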