
Run `php maintenance/updateCollation.php --force` on all Russian-language projects using uca-ru collation
Closed, Resolved · Public · Estimated Story Points: 1

Description

Please run `php maintenance/updateCollation.php --force` on all Russian-language projects using uca-ru collation. For some reason, sort keys generated before January 2015 use a different format from those generated later, which results in incorrect ordering (see T88088: Incorrect sorting in categories on Russian-language projects).

  • ruwiki
  • ruwikibooks
  • ruwikinews
  • ruwikiquote
  • ruwikisource
  • ruwikiversity
  • ruwikivoyage
  • ruwiktionary

I think we're planning to upgrade our version of ICU soon, which will require running this script on these wikis anyway (T86096). If that happens within the next few weeks, we could probably run it just once, after the upgrade.
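Why mixing sort-key formats breaks ordering can be shown with a toy Python example. The two key functions below are hypothetical stand-ins, not MediaWiki's actual uca-ru key formats. The point is only that keys written under different formats are still compared byte for byte, as if they were compatible, so a listing ends up grouped by key format rather than alphabetically. Recomputing every key with a single format, which is the effect this task expects from `--force`, restores the order.

```python
# Toy illustration: sort keys produced by two incompatible (hypothetical)
# formats cannot be meaningfully compared with each other.

def old_key(title: str) -> bytes:
    # Hypothetical "pre-2015" format: a 0x00 marker byte, then UTF-8.
    return b"\x00" + title.encode("utf-8")


def new_key(title: str) -> bytes:
    # Hypothetical "post-2015" format: a 0x01 marker byte, then UTF-8.
    return b"\x01" + title.encode("utf-8")


def listing(rows):
    """Return titles ordered by their stored sort key, as a category page would."""
    return [title for title, key in sorted(rows, key=lambda row: row[1])]


# Rows whose keys were written at different times, in different formats.
mixed = [
    ("Арбуз", old_key("Арбуз")),
    ("Банан", new_key("Банан")),
    ("Вишня", old_key("Вишня")),
]

# Recompute every key with the current format.
recomputed = [(title, new_key(title)) for title, _ in mixed]

print(listing(mixed))       # ['Арбуз', 'Вишня', 'Банан']  (grouped by format)
print(listing(recomputed))  # ['Арбуз', 'Банан', 'Вишня']  (alphabetical)
```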


Event Timeline

matmarex added subscribers: kaldari, faidon.

@kaldari Mentioning you since you folks at Community Tech are currently doing category collation stuff.

@faidon Do you know if T86096 is happening soon?

Wiki            select count(*) from categorylinks
ruwiki          12,100,754
ruwikibooks         11,322
ruwikinews         250,614
ruwikiquote         64,954
ruwikisource     2,138,959
ruwikiversity       21,140
ruwikivoyage        15,483
ruwiktionary     9,302,371

I scheduled ruwikibooks, ruwikivoyage, ruwikiversity, ruwikinews, ruwikiquote script runs for today's evening SWAT (they should take less than 15 minutes collectively).

ruwiki, ruwikisource and ruwiktionary should probably wait for T58041 at least.
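The split between wikis run right away and wikis deferred roughly tracks the size of each categorylinks table. A minimal sketch, using the `select count(*) from categorylinks` figures posted in this task and an assumed 1,000,000-row cut-off (no explicit threshold was stated; `THRESHOLD` and `split_by_size` are hypothetical names):

```python
# Row counts from the `select count(*) from categorylinks` figures in this task.
ROW_COUNTS = {
    "ruwiki": 12100754,
    "ruwikibooks": 11322,
    "ruwikinews": 250614,
    "ruwikiquote": 64954,
    "ruwikisource": 2138959,
    "ruwikiversity": 21140,
    "ruwikivoyage": 15483,
    "ruwiktionary": 9302371,
}

# Hypothetical cut-off; the task does not state an explicit threshold.
THRESHOLD = 1_000_000


def split_by_size(counts, threshold=THRESHOLD):
    """Split wikis into (run now, defer) lists by categorylinks row count."""
    small = sorted(w for w, n in counts.items() if n < threshold)
    large = sorted(w for w, n in counts.items() if n >= threshold)
    return small, large


small, large = split_by_size(ROW_COUNTS)
print("run now:", small, sum(ROW_COUNTS[w] for w in small))
print("defer:  ", large, sum(ROW_COUNTS[w] for w in large))
```

With this cut-off the "run now" group is exactly the five wikis scheduled for the SWAT window, and together they hold only a few hundred thousand rows, compared with over 23 million for the three deferred wikis.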

matmarex triaged this task as Medium priority. (Mar 14 2016, 7:21 PM)
[00:46] <MaxSem> !log ran mwscript maintenance/updateCollation.php --wiki=ruwikibooks --force
[00:51] <MaxSem> !log ran mwscript maintenance/updateCollation.php --wiki=ruwikivoyage --force
[00:52] <MaxSem> !log ran mwscript maintenance/updateCollation.php --wiki=ruwikiversity --force
[00:55] <MaxSem> !log ran mwscript maintenance/updateCollation.php --wiki=ruwikiquote --force
[01:01] <MaxSem> !log ran mwscript maintenance/updateCollation.php --wiki=ruwikinews --force

The new index is in place on ruwiktionary now. Let's try running the updateCollation.php script there and see how fast it goes.

kaldari set the point value for this task to 1.

@jcrespo, @matmarex: I'm currently running the updateCollation.php script for ruwiktionary, but it seems extremely slow (although it's supposed to have the new index). It's been running for 20 minutes and it's only processed 4200 rows out of 9+ million:

kaldari@terbium:/srv/mediawiki/php-1.27.0-wmf.22$ mwscript maintenance/updateCollation.php --wiki=ruwiktionary --force
...
Selecting next 100 rows... processing...4200 done.

At this rate it will take 30 days to finish running :( Any suggestions?
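The 30-day figure is a straight-line extrapolation from the observed throughput. A minimal sketch, assuming the rate stays constant (any further slowdown makes the real figure worse); `eta_days` is a hypothetical helper name:

```python
def eta_days(total_rows, rows_done, minutes_elapsed):
    """Linear extrapolation of remaining run time, assuming a constant rate."""
    rate = rows_done / minutes_elapsed      # rows per minute
    remaining = total_rows - rows_done
    return remaining / rate / (60 * 24)     # minutes -> days


# 4200 of 9,302,371 ruwiktionary rows after about 20 minutes.
days = eta_days(9_302_371, 4_200, 20)
print(f"{days:.1f} days")  # prints "30.7 days"
```

At roughly 210 rows per minute, the remaining ~9.3 million rows need about 44,000 minutes, which is where "30 days" comes from.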

The batch size should probably be configurable via a parameter.
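The suggestion to make the batch size a parameter can be sketched as follows. This is a toy Python model, not updateCollation.php: the `links` table, the `--batch-size` option, and `upper(title)` standing in for "compute the new sort key" are all hypothetical. It uses keyset pagination (fetch the next N rows after the last id handled), which keeps each batch cheap only when the paging column is indexed.

```python
import argparse
import sqlite3


def make_demo_db(n_rows):
    """In-memory stand-in for a category-links table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links (id INTEGER PRIMARY KEY, title TEXT, sortkey TEXT)"
    )
    conn.executemany(
        "INSERT INTO links (title) VALUES (?)",
        [(f"page{i}",) for i in range(n_rows)],
    )
    return conn


def recompute_in_batches(conn, batch_size):
    """Rewrite every row's sort key, fetching `batch_size` rows per SELECT.

    Keyset pagination: each batch starts after the last id already handled,
    so no batch rescans earlier rows, provided the paging column is indexed
    (here `id` is the primary key). Returns the number of batches processed.
    """
    last_id, batches = 0, 0
    while True:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM links WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            return batches
        # upper(title) is a placeholder for "compute the new collation key".
        conn.executemany(
            "UPDATE links SET sortkey = upper(title) WHERE id = ?",
            [(i,) for i in ids],
        )
        conn.commit()
        last_id, batches = ids[-1], batches + 1


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy collation updater")
    # Hypothetical option for this sketch; not a documented updateCollation.php flag.
    parser.add_argument(
        "--batch-size", type=int, default=100, help="rows fetched and updated per batch"
    )
    return parser.parse_args(argv)
```

For example, `recompute_in_batches(make_demo_db(250), parse_args(["--batch-size", "100"]).batch_size)` processes three batches (100, 100, then 50 rows). Larger batches mean fewer round trips but longer individual queries and transactions, which is the trade-off a parameter would let operators tune per wiki.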

I stopped the script after 60,000 rows had been processed. It had slowed down even further; at that rate, it would have taken 66 days to finish. I guess for some reason our new index didn't actually help in this case.

This is going to be done very soon as part of T86096. According to T86096#2317611, the current plan is May 26th.

This was completed for the remaining wikis as part of T86096.