ICU 57 migration for wikis using non-default collation
Closed, Resolved (Public)

Description

I've temporarily upgraded mwdebug2001 to an HHVM build with an ICU 57 backport and ran the updateCollation script in dry-run mode to check which wikis with any form of UCA collation are affected. It turns out that almost all of them need the migration; of the ten biggest Wikipedias, only cebwiki and dewiki do not.

Here's the rundown, sorted by shard:

s1:

  • enwiki

s2:

  • cswiki
  • fiwiki
  • itwiki
  • nlwiki
  • nowiki
  • plwiki
  • ptwiki
  • svwiki
  • thwiki

s3:

  • be_x_oldwiki
  • bewiki
  • bewikisource
  • bswiki
  • ckbwiki
  • cswiktionary
  • cywiki
  • cywikibooks
  • cywikiquote
  • cywikisource
  • cywiktionary
  • dewikisource
  • eswikiversity
  • etwiki
  • etwikibooks
  • etwikimedia
  • etwikiquote
  • etwikisource
  • etwiktionary
  • fawikibooks
  • fawikinews
  • fawikiquote
  • fawikisource
  • fawiktionary
  • fiwikibooks
  • fiwikimedia
  • fiwikinews
  • fiwikiquote
  • fiwikisource
  • fiwikiversity
  • fiwikivoyage
  • frwikibooks
  • frwikinews
  • frwikiversity
  • gdwiki
  • glwiki
  • hrwiki
  • hsbwiki
  • ilowiki
  • iswiki
  • ltwiki
  • lvwiki
  • mediawikiwiki
  • mkwiki
  • nowikimedia
  • olowiki
  • plwikisource
  • plwikivoyage
  • plwiktionary
  • ptwikibooks
  • rowikibooks
  • rowikinews
  • rowikiquote
  • rowikisource
  • rowikivoyage
  • rowiktionary
  • rswikimedia
  • ruwikibooks
  • ruwikinews
  • ruwikiquote
  • ruwikisource
  • ruwikiversity
  • ruwikivoyage
  • ruwiktionary
  • shwiki
  • skwiki
  • srwiki
  • srwikibooks
  • srwikinews
  • srwikiquote
  • srwikisource
  • srwiktionary
  • svwikisource
  • tawiki
  • tawikibooks
  • tawikinews
  • tawikiquote
  • tawikisource
  • tawiktionary
  • testwiki
  • thwikibooks
  • thwikinews
  • thwikiquote
  • thwikisource
  • thwiktionary
  • uawikimedia
  • ukwikibooks
  • ukwikinews
  • ukwikiquote
  • ukwikisource
  • ukwikivoyage
  • ukwiktionary
  • viwikibooks
  • viwikiquote
  • viwikisource
  • viwikivoyage
  • viwiktionary

s6:

  • frwiki
  • ruwiki

s7:

  • eswiki
  • fawiki
  • frwiktionary
  • huwiki
  • rowiki
  • ukwiki
  • viwiki

The migration causes some unavoidable user-visible impact: the sorting of some category pages will be distorted, because pages which have been updated under the new HHVM build with ICU 57 use the new sorting while untouched pages still use the old sorting. As such, this change needs to be coordinated with Community Liaisons.
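
To illustrate why a partially migrated category looks scrambled, here is a minimal Python sketch. The two key functions are stand-ins chosen for this example (raw code point order versus case-insensitive order) and are not the actual ICU 52 or ICU 57 collations; `category_listing` and `touched` are hypothetical names.

```python
# Stand-in collations for illustration only; these are NOT the ICU rules.
def old_sortkey(title: str) -> str:
    # "Old" behaviour: plain code point order (uppercase sorts before lowercase).
    return title


def new_sortkey(title: str) -> str:
    # "New" behaviour: case-insensitive order.
    return title.casefold()


def category_listing(titles, touched):
    """Order titles by their stored sort key, as a category page would.

    Titles in `touched` were re-saved after the upgrade and carry a new-style
    key; all other titles still carry the key computed by the old collation.
    """
    stored = {
        t: (new_sortkey(t) if t in touched else old_sortkey(t)) for t in titles
    }
    return sorted(titles, key=lambda t: stored[t])


titles = ["apple", "Banana", "cherry", "Date"]
before = category_listing(titles, touched=set())        # all keys old
during = category_listing(titles, touched={"Banana"})   # mixed keys
after = category_listing(titles, touched=set(titles))   # all keys new
```

In this toy setup the `during` ordering matches neither `before` nor `after`, which is the kind of distortion readers would see until every stored key has been regenerated.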

These sorting problems will only be fixed once the updateCollation maintenance script [1] has finished running for an affected wiki. For past collation changes, that took e.g. four hours in 2016 for Swedish Wikipedia and six days for English Wikipedia [2].

Current open questions:

  • Discuss the possible concurrency of updateCollation runs per shard with DBAs, in particular for wikis in s2, to minimise user-visible impact
  • Check how much of a head start Community Liaisons need to prepare user notifications
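
One possible shape for the per-shard concurrency question is sketched below in Python: wikis on the same shard run one after another, while a limited number of shards run in parallel. This is only an assumption about what the DBAs might agree to, not a decided policy. `run_update` is a hypothetical placeholder and does not invoke the real maintenance script; `max_parallel_shards` is likewise an invented parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def run_update(wiki: str) -> str:
    """Hypothetical placeholder for launching updateCollation on one wiki."""
    return wiki


def run_shard(wikis):
    # Within a shard, process one wiki at a time so the shard's database
    # hosts only ever see a single run.
    return [run_update(w) for w in wikis]


def run_all(shards, max_parallel_shards=2):
    # Across shards, allow a bounded number of runs in parallel.
    with ThreadPoolExecutor(max_workers=max_parallel_shards) as pool:
        results = pool.map(run_shard, shards.values())
        return dict(zip(shards.keys(), results))


example_shards = {"s1": ["enwiki"], "s6": ["frwiki", "ruwiki"]}
completed = run_all(example_shards)
```

Running several wikis of one shard concurrently (for example in s2) would shorten the window of distorted sorting at the cost of more write load on that shard, which is the trade-off to discuss.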

The plan for the actual migration would look like this:

  • Merge a patch to enable component/icu57 for our jessie-based mediawiki app servers
  • Migrate the canaries to the new HHVM build and keep an eye on logs/metrics for an hour (the new packages have been running in beta for about a month already, but nothing beats production traffic)
  • If all is well, upgrade HHVM on the remaining app servers in eqiad and terbium
  • Initiate updateCollation runs on terbium
  • Upgrade app servers in codfw

Once we've migrated to ICU 57, this unblocks the migration of the app servers to Debian stretch (T174431; ICU 57 is backported from stretch) and allows us to use a more recent version of Unicode in MediaWiki (T188480).

Footnotes:
[1] https://www.mediawiki.org/wiki/Manual:UpdateCollation.php
[2] https://phabricator.wikimedia.org/T146675#2668367

MoritzMuehlenhoff triaged this task as High priority.
Restricted Application removed a project: Patch-For-Review. Mar 9 2018, 12:41 PM
Restricted Application added subscribers: jeblad, Base.
Reedy updated the task description. Mar 9 2018, 5:52 PM
Joe added a comment. Mar 16 2018, 11:05 PM

See https://phabricator.wikimedia.org/T86096#2329554 and https://phabricator.wikimedia.org/T86096#2326032 as methods to evaluate run times. We should also test the speed of the script beforehand on a selected wiki on each section by running it with --force.

We might find we need to use php, not hhvm, to run it. Long-running scripts are a notorious performance hole of hhvm, so better to check beforehand.

That might be an issue as php5 has not been recompiled AIUI.

But the numbers in dry-run mode would only be a rough estimate since all the DB writes are skipped.

That's not really possible, as the dependency chain of libraries using ICU used by PHP is too deep; we'll have to run the maintenance script with HHVM.

Bawolff added a comment. Mar 20 2018, 12:49 PM (edited)

There's a Gerrit patch somewhere (T146341) that would allow the script to be stopped/restarted with no ill effect, but it stalled due to disagreement among reviewers.

Another thing to watch out for is that the Farsi wikis are using a hack to work around a bug in the old version of libicu. They should probably be moved back to the standard uca-fa collation after the upgrade (assuming the bug is fixed). See T139110 for details.

I came here to say just that, and I see @Bawolff beat me to it. Yes, please change the UCA setting of all Persian wikis back to the default when this happens.

elukey moved this task from Backlog to Keep an eye on it on the User-Elukey board. Mar 23 2018, 3:54 PM

Let's not entangle this with the already big migration task. Can someone from the Persian wiki community please

  • run a test with ICU 57 [1] to assess whether the bug which required a hack in ICU 52 is fixed upstream?
  • If it's fixed, prepare patches to switch to the uca-fa collation and open a separate task to deploy the change with the respective updateCollation run
  • If it's not fixed, I guess the hack on the MediaWiki side will continue to work and no additional action is needed?

The packages can be added to mediawiki-vagrant or a test wiki by adding the following apt source and running "apt-get dist-upgrade":
deb http://apt.wikimedia.org/wikimedia jessie-wikimedia component/icu57

Beta/deployment-prep has also been upgraded to that version, if that helps.

It's safe to say I'm a Persian Wikipedia community member (with 56K edits). I will test it and let you know ASAP.

Joe added a comment. Apr 9 2018, 6:45 AM (edited)

Total number of rows to sort through per shard:

s1 - 114M rows
s2 - 73M rows
s3 - 41M rows
s6 - 54M rows
s7 - 50M rows
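
As a rough aid for planning, the Python sketch below totals the row counts above and converts them into run-time estimates. The throughput passed as `rows_per_second` is a hypothetical input that would have to be measured on a real run; no measured rate is given in this task, and the function names are invented for this example.

```python
# Row counts per shard as listed above (rounded to millions in the source).
ROWS_PER_SHARD = {
    "s1": 114_000_000,
    "s2": 73_000_000,
    "s3": 41_000_000,
    "s6": 54_000_000,
    "s7": 50_000_000,
}

TOTAL_ROWS = sum(ROWS_PER_SHARD.values())


def estimate_hours(rows: int, rows_per_second: float) -> float:
    """Hours needed to process `rows` at an assumed constant throughput."""
    return rows / rows_per_second / 3600


def wall_clock_hours(rows_per_shard, rows_per_second, parallel):
    """Total duration if shards run in parallel (slowest shard dominates)
    or strictly one after another (durations add up)."""
    per_shard = [
        estimate_hours(rows, rows_per_second) for rows in rows_per_shard.values()
    ]
    return max(per_shard) if parallel else sum(per_shard)


LARGEST_SHARD = max(ROWS_PER_SHARD, key=ROWS_PER_SHARD.get)
```

With shards processed in parallel, s1 sets the overall duration, since it holds roughly a third of the 332M rows.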

Mentioned in SAL (#wikimedia-operations) [2018-04-09T07:17:46Z] <moritzm> upgrading mw1261 to ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T07:24:29Z] <moritzm> repooling mw1261 after upgrade to ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T07:32:32Z] <moritzm> upgrading mw1262-1265 to ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T07:38:10Z] <_joe_> upgrading mw1300 to ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T07:42:30Z] <_joe_> repooling mw1300 now with ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T07:48:04Z] <moritzm> upgrading mw1276-1279 (API canaries) to ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T08:32:51Z] <_joe_> upgrading eqiad jobrunners to ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T08:32:57Z] <moritzm> upgrading remaining app servers in eqiad to to ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T08:45:05Z] <elukey> upgrading eqiad api appservers to ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T09:33:09Z] <_joe_> all eqiad jobrunners migrated to ICU 57 (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T09:54:36Z] <moritzm> upgrading mwdebug servers in eqiad to to ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T10:15:07Z] <moritzm> upgrading tin/deploy1001 to a ICU 57-enabled HHVM build (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T10:31:00Z] <moritzm> upgrading Boost libraries on app server canaries with a ICU 57-enabled HHVM build and restart HHVM (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T10:41:39Z] <moritzm> upgrading Boost libraries on mw1300 with a ICU 57-enabled HHVM build and restart HHVM (T189295)

Change 425027 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::mediawiki::hhvm: default php to php7 on stretch

https://gerrit.wikimedia.org/r/425027

Mentioned in SAL (#wikimedia-operations) [2018-04-09T11:04:47Z] <moritzm> upgrading Boost libraries on API server canaries with a ICU 57-enabled HHVM build and restart HHVM (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T11:50:45Z] <moritzm> upgrading Boost libraries on remaining app servers with a ICU 57-enabled HHVM build and restart HHVM (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T12:01:30Z] <elukey> upgrading Boost libraries on all mediawiki eqiad API server with a ICU 57-enabled HHVM build and restart HHVM (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T12:05:02Z] <_joe_> upgrading boost, hhvm on terbium for ICU 57 upgrade (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T12:23:31Z] <_joe_> preparing to run updateCollation from mw1338: stop videoscaler, disable puppet (T189295)

Joe added a comment. Apr 9 2018, 12:26 PM

Since I've noticed a 45% speed increase when running the updateCollation.php script with php 7.0 versus HHVM, I'm temporarily setting up mw1338 to run the scripts; I will stop the videoscaler and puppet there for the time being.

Note that php 7.0 cannot read memcached values set by HHVM (although the reverse is possible...), but that doesn't seem to slow down php enough to make HHVM comparably fast.
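
What a "45% speed increase" means for total run time depends on how it was measured, which the comment above does not specify. The Python sketch below works through both readings; the function names are invented for this example, and any baseline duration fed in is an assumption.

```python
def runtime_if_throughput_gain(baseline: float, gain: float) -> float:
    """Reading 1: php processes (1 + gain) times as many rows per second.

    The run time shrinks to baseline / (1 + gain).
    """
    return baseline / (1 + gain)


def runtime_if_time_saving(baseline: float, saving: float) -> float:
    """Reading 2: php needs `saving` less wall-clock time for the same work.

    The run time shrinks to baseline * (1 - saving).
    """
    return baseline * (1 - saving)


# Fraction of the HHVM run time that remains under each reading.
remaining_reading_1 = runtime_if_throughput_gain(1.0, 0.45)
remaining_reading_2 = runtime_if_time_saving(1.0, 0.45)
```

Under the first reading about 69% of the HHVM run time remains; under the second, 55%. Either way the saving is large enough to matter for a run that may take days.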

Mentioned in SAL (#wikimedia-operations) [2018-04-09T12:39:36Z] <moritzm> upgrading Boost libraries on job runners with a ICU 57-enabled HHVM build and restart HHVM (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T12:54:53Z] <moritzm> upgrading Boost libraries on mwdebug with a ICU 57-enabled HHVM build and restart HHVM (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T13:03:10Z] <_joe_> upgrading HHVM / libboost for ICU 57 upgrade (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T13:14:09Z] <_joe_> started updateCollation.php maintenance script for the ICU 57 migration (T189295)

Mentioned in SAL (#wikimedia-operations) [2018-04-09T13:14:32Z] <moritzm> upgrading Boost libraries on mwdebug with a ICU 57-enabled HHVM build and restart HHVM (T189295)

Joe updated the task description. Apr 9 2018, 1:31 PM
Joe updated the task description. Apr 9 2018, 1:47 PM
Joe updated the task description. Apr 9 2018, 3:00 PM
Krinkle updated the task description. Apr 9 2018, 3:49 PM
Joe updated the task description. Apr 9 2018, 3:50 PM
Krinkle added a subscriber: Krinkle. Apr 9 2018, 3:57 PM (edited)

Change 425027 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::mediawiki::hhvm: default php to php7 on stretch

https://gerrit.wikimedia.org/r/425027

This is a very significant deviation from previous consensus (e.g. at T176370 and T174431). Can you clarify exactly how and where in production we now have MediaWiki code running under PHP 7, and why we can't migrate those to HHVM first?

This seems rather concerning given the unresolved tasks at PHP 7.0 support, including that neither Beta nor CI runs PHP 7 yet.

Joe added a comment. Apr 10 2018, 4:20 AM (edited)

Since it doesn't seem to me I merged that change (which was created in a rush when we discovered mwscript was rolled back to php5, which is again against the previously reached consensus), I don't understand why we should discuss that here and not on the patch itself. That patch, by the way, was trying to reproduce the behaviour that was introduced in mwscript but is too broad; also, it's blocked on us solving the HHVM/PHP 7 memcached incompatibilities.

After a day of debates, I think the best solution for now is to check if scap needs to use Zend php and not hhvm, and if that's the case, we need to specialize that rather than migrating everything.

Joe added a comment. Apr 10 2018, 4:22 AM

Also a note on beta not running php7: when we migrated to HHVM it was made very clear to me and to Ori that we could not use beta for testing the migration; so I'm assuming we'll have to do the same this time as well and do the tests in production.

Joe updated the task description. Apr 10 2018, 4:26 AM
Joe updated the task description. Apr 10 2018, 5:02 AM
Joe updated the task description. Apr 10 2018, 8:59 AM

codfw has now also been upgraded to the ICU 57-enabled HHVM build (and the related Boost libraries).

Joe updated the task description. Apr 10 2018, 12:06 PM
Joe updated the task description. Apr 10 2018, 2:19 PM
Joe updated the task description. Apr 11 2018, 6:11 AM
Joe closed this task as Resolved. Apr 16 2018, 6:39 AM
Joe updated the task description.

Enwiki finished its run at 14:40 UTC on Saturday, April 14th.