
Provide a forward port of ICU 52 for stretch / Investigate best ICU update strategy
Closed, Resolved, Public

Description

When the app servers were migrated from precise to trusty, the ICU version from precise (48) was built for trusty-wikimedia and HHVM was built against it. (trusty and jessie both use ICU 52, so that wasn't necessary for the trusty-to-jessie migration.) stretch ships ICU 57. To untangle the update of the app server base OS from the collation update, I created a co-installable icu52 source package to investigate further. However, this is a lot more complicated in stretch than it was for the precise-to-trusty migration:

The problem is that there are two additional libraries used by HHVM which themselves use ICU, so they would also need to be rebuilt against ICU 52: libxml2-dev and libboost-regex-dev. Both libraries (libxml2 in particular) are also used outside of HHVM. If we proceed with this upgrade path, I think it would involve:

  • create a separate repo section like hhvm
  • rebuild src:icu52 in the icu52 hhvm section
  • rebuild libxml2 and boost-regex against libicu52-dev and add to hhvm section
  • rebuild HHVM in hhvm section against libicu52

This means that on all app servers (and the related roles like deployment servers), we'd use different builds of libxml and boost-regex than on the rest of the production cluster. That's still preferable to using these modified libs fleet-wide (especially since those deviations from the standard Debian package set are temporary).

But there's also a second potential upgrade path:
Build a backport of libicu57 for jessie and perform the collation migration while the app servers are still running jessie. In jessie, libxml2 doesn't link against ICU yet (that change was made in 2.9.2+dfsg1-3, i.e. during the jessie->stretch development time frame), so there are fewer inter-dependencies. Once the migration to ICU 57 is complete, we could upgrade the app servers to stretch and an HHVM build which uses the pristine ICU as shipped in stretch.

I'll prepare a test backport of ICU 57 from stretch for jessie and will test an HHVM 3.18 build on jessie against that build (boost-regex might still be an issue, needs to be tested).

I can't really estimate the collation changes between 52 and 57. According to http://site.icu-project.org/download, they include a bump from Unicode 6.3 to 8.0, among other changes.

Event Timeline

I investigated the upgrade procedure for "provide icu57 in jessie and migrate before moving to stretch": This allows for a much less invasive transition (mostly because libxml2 in jessie doesn't link against ICU yet):

  • I created a backport of icu from stretch as a separate source package icu57. It can reside on apt.wikimedia.org in parallel to icu:
      • libicu-dev has been renamed to libicu57-dev (it conflicts with libicu-dev since the .a lib has the same name)
      • icu-devtools has been renamed to icu57-devtools
      • icu-doc was dropped
  • boost1.55 from jessie was rebuilt against libicu57-dev (since the regex module used by HHVM uses ICU)
  • HHVM was rebuilt against libicu57-dev

If we build these three packages in a separate archive component jessie-wikimedia/icu57, on a typical application server setup there would be no other package except hhvm which depends on the modified ICU/Boost. (The boost1.55 source package also builds libboost-iostream which is used by aptitude, but it's unaffected by the ICU linking in boost-regex and aptitude isn't critical anyway).

In contrast the "build icu52 for stretch and migrate later on" migration path would be far more complex, since reverse dependencies of libxml2 would need to be rebuilt as well. This includes packages like apache, nginx and even php5-cli.
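
To make the comparison between the two paths more concrete, here is a minimal Python sketch that computes which packages end up linked, directly or transitively, against ICU in each release. The dependency edges are a hand-written illustration based only on the packages named in this task (an assumption, not the output of a real archive query), and the helper name `rebuild_set` is made up for this example.

```python
from collections import deque

def rebuild_set(links_against, root):
    """Return every package that directly or transitively links against root."""
    # Invert the "package -> libraries it links against" mapping.
    reverse = {}
    for pkg, libs in links_against.items():
        for lib in libs:
            reverse.setdefault(lib, set()).add(pkg)

    # Breadth-first walk over the reverse dependencies.
    seen = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Illustrative edges only (assumption): jessie, where libxml2 does not link against ICU.
JESSIE = {
    "boost-regex": {"icu"},
    "hhvm": {"icu", "boost-regex", "libxml2"},
    "libxml2": set(),
    "apache": {"libxml2"},
    "nginx": {"libxml2"},
    "php5-cli": {"libxml2"},
}

# Same edges for stretch, except that libxml2 now links against ICU.
STRETCH = dict(JESSIE, libxml2={"icu"})

print(sorted(rebuild_set(JESSIE, "icu")))
print(sorted(rebuild_set(STRETCH, "icu")))
```

With these toy edges, swapping ICU on jessie only touches boost-regex and hhvm, while on stretch the set also pulls in libxml2 and everything listed as linking against it.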

I investigated the upgrade procedure for "provide icu57 in jessie and migrate before moving to stretch": This allows for a much less invasive transition (mostly because libxml2 in jessie doesn't link against ICU yet):
(snip)

That makes a lot of sense to me. Thanks for all the background work to support this :)

(The boost1.55 source package also builds libboost-iostream which is used by aptitude, but it's unaffected by the ICU linking in boost-regex and aptitude isn't critical anyway).

I don't think anyone is using aptitude and I'd consider aptitude deprecated in favor of "apt" these days anyway. Let's just purge aptitude from across the fleet and be done with it?

MoritzMuehlenhoff triaged this task as High priority.

[I was asked to post this here]

We need to do the libicu transition because the sort order, and in particular the binary sort keys, are not stable between ICU versions, and in the past they have been very incompatible between versions.
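
As a rough illustration of why unstable sort keys matter for stored data, here is a toy Python sketch. It does not use ICU: the two weight tables are invented stand-ins for two collation versions (an assumption made purely for the example), and `sort_key` is a hypothetical helper. Keys written under one table and compared with keys written under the other give an order that neither version would produce on its own.

```python
AE = "\u00e4"  # "ä" as a single precomposed code point

# Toy stand-ins for two collation versions (assumption: invented weights, not ICU data).
WEIGHTS_OLD = {"a": 1, "b": 2, AE: 3}  # "ä" sorts after "b"
WEIGHTS_NEW = {"a": 1, AE: 2, "b": 3}  # "ä" sorts between "a" and "b"

def sort_key(text, weights):
    """Build a binary sort key by mapping each character to its weight."""
    return bytes(weights[ch] for ch in text)

titles = ["b", AE, "a"]

# Keys stored while the old version was in use.
stored = {t: sort_key(t, WEIGHTS_OLD) for t in titles}

# After the upgrade, a new entry is stored with a key from the new version.
stored[AE + "b"] = sort_key(AE + "b", WEIGHTS_NEW)

# Sorting by the stored (mixed) keys versus recomputing every key with the new version.
mixed_order = sorted(stored, key=stored.get)
clean_order = sorted(stored, key=lambda t: sort_key(t, WEIGHTS_NEW))

print(mixed_order)
print(clean_order)
```

In this toy setup the mixed keys place "äb" between "b" and "ä", which matches neither table, so stored keys would presumably have to be recomputed whenever the collation version changes.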


Beta/deployment-prep has been upgraded to an HHVM build using ICU 57.

Change 410229 had a related patch set uploaded (by Anomie; owner: Anomie):
[utfnormal@master] Update to Unicode 8.0.0

https://gerrit.wikimedia.org/r/410229

Change 410230 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Scribunto@master] Update ustring data tables

https://gerrit.wikimedia.org/r/410230

Change 410229 merged by jenkins-bot:
[utfnormal@master] Update to Unicode 8.0.0

https://gerrit.wikimedia.org/r/410229

The packages have been built and tested, so I'm closing this. The task for the actual migration is T189295.

Change 410230 merged by jenkins-bot:
[mediawiki/extensions/Scribunto@master] Update ustring data tables

https://gerrit.wikimedia.org/r/410230