
CommRel support for ICU 63 upgrade
Closed, Resolved (Public)

Description

Ahoy CommRel -- my accomplices and I are planning to upgrade the ICU library on MediaWiki app servers from version 57 to version 63. This is a prerequisite to upgrading the OS and other software on those machines, and allows developers to use new internationalization features including support for Unicode 11.

There will be one user-visible effect: Category lists will be displayed out of order during the transition. This will last up to a few hours on smaller wikis, and longer for bigger ones, up to the neighborhood of a week for enwiki. (The reason: ICU functionality is involved in multi-lingual sorting keys, among other things, and those sorting keys will change with the new version. Categories sorted with the old ICU, but displayed with the new one, will be shown out of order until they're updated. After bumping the ICU version we'll run a maintenance script to update each page; last time, it took about a week to get all the way through enwiki.)
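To illustrate why a half-migrated category shows up out of order, here is a minimal sketch. It does not use ICU at all: the two alphabets, the weight tables, and the function names are hypothetical stand-ins for "old" and "new" collation versions, chosen only to show that sort keys produced by different versions are not comparable with each other.

```python
# Toy stand-ins for two collation versions (NOT real ICU behaviour).
# The "new" version knows about accented letters, which shifts later weights.
OLD_ALPHABET = "abcdefghijklmnopqrstuvwxyz"
NEW_ALPHABET = "a\u00e0bcde\u00e9fghijklmnopqrstuvwxyz"


def make_key_function(alphabet):
    """Build a sort-key function: each letter maps to a one-byte weight."""
    weights = {ch: i + 1 for i, ch in enumerate(alphabet)}

    def sort_key(title):
        # Unknown characters get the highest weight and sort last.
        return bytes(weights.get(ch, 255) for ch in title.lower())

    return sort_key


old_key = make_key_function(OLD_ALPHABET)
new_key = make_key_function(NEW_ALPHABET)


def listing(titles, migrated):
    """Order titles by their *stored* key: new if migrated, otherwise old."""
    stored = [(new_key(t) if t in migrated else old_key(t), t) for t in titles]
    return [title for _, title in sorted(stored)]


titles = ["Apple", "Egg", "Fig", "Zebra", "\u00c9clair"]

before = listing(titles, migrated=set())        # every key from the old version
during = listing(titles, migrated={"Egg"})      # only one page updated so far
after = listing(titles, migrated=set(titles))   # every key from the new version

print(before)  # ['Apple', 'Egg', 'Fig', 'Zebra', 'Éclair']
print(during)  # ['Apple', 'Fig', 'Egg', 'Zebra', 'Éclair']
print(after)   # ['Apple', 'Egg', 'Éclair', 'Fig', 'Zebra']
```

In the `during` listing, "Fig" (old key) lands ahead of "Egg" (new key) even though both versions agree that "Egg" comes first. That is the kind of temporary disorder described above, and it goes away once every page's key has been recomputed.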

We don't yet know our exact timing for this. We expect to start the upgrade no earlier than Monday Nov 16, and finish fixing sorting keys no later than Friday Nov 27. We'll update you here as soon as we have something more precise.

For your reference: More technical details in the parent task T264991. CommRel task from last time is T189486.


  • Tech News 46
  • Tech News 47

Message for communities:

  • Upgrade and check it
  • Request translations
  • Distribute it (week 46)
  • T264991#6609917 lists impacted wikis

Event Timeline

Trizek-WMF triaged this task as High priority.
Trizek-WMF added a project: User-notice.

I can handle it. :)

> We expect to start the upgrade no earlier than Monday Nov 16, and finish fixing sorting keys no later than Friday Nov 27.

@RLazarus, the earlier you know the date, the better. I plan to start announcing it in Tech News. I don't know how familiar you are with Tech News, but to have something announced there on a given Monday, the update has to be included by the Thursday of the week before.

I think it would be worth publishing a first announcement tomorrow, as an early warning. The most important information is when the disturbance will start and when it will end.

When was the last time? I vaguely remember something about a change to category sorting, but I can't find it.

We just met this morning to sort out our timeline -- current plan is to do the upgrade on Nov 16. That means the disturbance will start then, and should finish on or about Nov 23. (I'll ask around, but I think there's no way to get a better estimate of how long the maintenance script will take to chew through enwiki, besides "well, probably about the same as last time." We might be able to get more precise once it's running, but that won't help you for Tech News.)

It's not impossible that start date will slip by a day or two, but I think it's very likely we'll hit it, and it should be safe to announce.

The last time was T189486 in 2018:

Announcements went out in Tech/News (13 and 15) and to the Village Pumps of the largest affected communities, using this message https://www.mediawiki.org/wiki/Wikimedia_Technical_Operations/ICU_announcement

I'm going to add something to Tech News, and we could refine it next week.

Thank you for finding the old task and the message! <3

I've updated the message with the version, the start date and the Phab ticket. Based on this task description, these seem to be the only changes required. However, could you double-check? Thank you. :)

Yep, that text looks good to me -- the "eight of the ten biggest Wikipedias" language is the last thing I wanted to verify, and it's still correct (jawiki and zhwiki are the exceptions). If you get any questions like "is my wiki affected?" you can find the full list at T264991#6609917. Any wiki not listed there should experience no disruption with the upgrade.

Thank you for the review and this list. I asked volunteers for translations and I will message the most impacted wikis on Friday.

I posted an advance notice on English Wikipedia, since they will be the most impacted.

I was blocked by a last-minute change to the translations, which required me to manually fix the date formats in them. Also, creating a distribution list matching each targeted wiki to its village pump was much more time-consuming than I anticipated.

The message has now been distributed.

We've started upgrading the canary appservers to ICU 63, so the window of category sorting disruption has officially started.

All appservers are now running ICU 63, and the collation update script is running. Earlier today should have been the moment of peak incorrectly-sorted categories, and everything should improve from here. I'll update as the script finishes.

Trizek-WMF lowered the priority of this task from High to Medium. (Nov 17 2020, 2:35 PM)
Trizek-WMF updated the task description.

The s5 script (shwiki, srwiki) is finished, the rest are still chugging along.

s2, s6, and s7 have also finished. The s3 worker has completed wikis up through ruwikibooks (in alphabetical order).

The good news: s3 is also finished, so only enwiki is left, taking the longest as expected.

The bad news: The enwiki script needed to be restarted due to an unexpected hiccup, so our estimated "a week or so" completion time is also restarted from today. It'll likely still be running into the Thanksgiving holiday.

Out of interest, have you been getting any inquiries about this? I don't have a great sense of whether the out-of-order category listings are actually bothering anybody.

I regularly check on the wikis, but I've been offline between Wednesday evening and now (I'll check the wikis after posting this message). Anyway, as of the middle of last week, I hadn't seen many discussions. None of them were worth reporting, since other users pointed to the announcement about this maintenance operation, which settled the matter. Overall, and fortunately, our users are patient and understanding. :)

I'll leave a message at English Wikipedia. @RLazarus, what was this hiccup you mention? People may like to know about the details.

> @RLazarus, what was this hiccup you mention? People may like to know about the details.

The Data Persistence team needed to restart a database replica for routine work. Most of our production software can just retry any particular query on a different DB host, so that isn't usually disruptive. But updateCollation.php (the script that's migrating categories from the old collation to the new one) is only rarely used for this particular maneuver and hasn't had all that much engineering effort invested to make it resilient to that sort of event - it just connects to one replica and uses it for the entire run. When we got unlucky and that replica was restarted, instead of just reconnecting to a different host, the script exited (with no resume capability built in).
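For readers curious what "reconnect to a different host and resume" could look like, here is a minimal sketch. It is not how updateCollation.php works and not a proposal for it. The host names, `ReplicaDown`, `fake_fetch`, and `run_with_failover` are all hypothetical, and the database is simulated with a plain list. The idea is just to track the last processed row ID as a checkpoint, and to switch hosts when a read fails rather than exit.

```python
class ReplicaDown(Exception):
    """Raised by the (hypothetical) fetch function when a host is unavailable."""


def run_with_failover(hosts, fetch_batch, process, resume_after=0, batch_size=2):
    """Process rows in ID order; on a failed read, retry on the next host.

    Returns the last processed row ID, which doubles as a resume checkpoint.
    """
    last_id = resume_after
    current = 0
    consecutive_failures = 0
    while True:
        try:
            rows = fetch_batch(hosts[current], last_id, batch_size)
        except ReplicaDown:
            consecutive_failures += 1
            if consecutive_failures >= len(hosts):
                raise  # every host failed in a row, so give up
            current = (current + 1) % len(hosts)
            continue
        consecutive_failures = 0
        if not rows:
            return last_id  # nothing left after the checkpoint
        for row_id in rows:
            process(row_id)
            last_id = row_id  # advance the checkpoint after each row


# Simulation: seven rows, and "db1" is restarted after serving two batches.
ROW_IDS = list(range(1, 8))
calls = {"db1": 0, "db2": 0}


def fake_fetch(host, after_id, limit):
    calls[host] += 1
    if host == "db1" and calls[host] > 2:
        raise ReplicaDown(host)
    return [i for i in ROW_IDS if i > after_id][:limit]


processed = []
final_checkpoint = run_with_failover(["db1", "db2"], fake_fetch, processed.append)
print(processed, final_checkpoint)  # [1, 2, 3, 4, 5, 6, 7] 7
```

In this simulation the run continues on "db2" from row 5 without starting over at row 1. A real script would also need to persist the checkpoint somewhere durable so that a crashed process could pick it up with `resume_after`.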

Obviously all that would be high on our list of things to improve for next time, but we're instead planning on reworking this whole process so that this user-visible migration period doesn't exist. We fully expect there won't be a "next time" at all.

This is finished on all wikis, a little bit slower than our original schedule but not stretching into the holiday weekend as I'd feared. :)

If there are any sorting issues that still persist, those are unexpected, and users should please open bug reports as they normally would. @Trizek-WMF I'm leaving this open in case there's any post-work you want to do, but it's finished from our perspective -- close it at your convenience. Thanks again for all the help!

Trizek-WMF closed this task as Resolved. (Nov 24 2020, 5:04 PM)

Thank you!

You mentioned that the next migration would use a different system that doesn't create any disturbance. Just in case, here is a summary of what I did:


We re-used the previous explanation message to avoid translator fatigue.

We added a mention of this change to Tech News two weeks ahead, and again for the week when it happened.

Wikis have been MassMessaged just before the operation. A pre-warning was sent to English Wikipedia, because it was the most impacted wiki.

Here is the list of wikis targeted this time, by server clusters, for MassMessage distribution.

When the operation started, I kept an eye on the most important wikis to check for any complaints.


Other than that, I have no particular comments or other feedback on my side, so I'm closing the task.

Always a pleasure to work with you!