Right now, upgrading the libicu version we link to in production is a long process that goes as follows:
- Warn all communities that they will see some form of sorting weirdness in categories for some time
- Change the version of ICU we link to everywhere (this is very tricky without changing Linux distribution versions, as we need to rebuild quite a lot of packages)
- Run updateCollation.php on all wikis with a collation defined. Last time (see T189295) it took a week to run on enwiki.
There are two problems with the current approach:
- It's a lot of toil for the SRE teams, who need to do specialized rebuilds of a lot of packages, and to run and monitor scripts on hundreds of wikis
- More importantly, it's a disservice to users who will see badly-sorted categories for a week
We need a smarter way to do this.
A couple ideas I had:
- One simple approach that would reduce the disservice to users would be to add additional columns to the categorylinks table, precompute the new values before we perform the switch, and then just switch which columns we read when we start using the new ICU version
- To improve on the preceding idea, we could spawn a job, whenever we have to recalculate the collation for categorylinks, that will asynchronously fill in the values in the additional columns by running a small executable we can prepare.