
$wgCategoryCollation setting makes categories contain characters from wrong alphabets
Closed, ResolvedPublic

Description

Hello. Category collation is broken on all Wikimedia projects that use the $wgCategoryCollation setting. On these sites, category pages show section-header characters from unrelated alphabets.

Examples:

On fa.wiki:
http://fa.wikipedia.org/wiki/رده:استان‌های_ایران

On ckb.wiki
http://ckb.wikipedia.org/wiki/پۆل:ئەلفوبێی_کوردی

On pt.wiki:
http://pt.wikipedia.org/wiki/Categoria:História


Version: unspecified
Severity: major

Details

Reference
bz55565

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 2:24 AM)
bzimport set Reference to bz55565.
bzimport added a subscriber: Unknown Object (MLST).

17.55 < Reedy> The only thing that have changed are some DB hosts, and a PHP package upgrade
17.55 < Reedy> (according to SAL)
17.55 < apergos> in the last 3 hours?
17.55 < Reedy> He said over 3 hours

Per bug 46036 comment 1, it looks like someone just needs to run some maintenance scripts on the cluster after package upgrades.

(In reply to comment #3)
> Per bug 46036 comment 1, it looks like someone just needs to run some
> maintenance scripts on the cluster after package upgrades.

Specifically, run updateCollation.php --force after a PHP upgrade, unless you make sure PHP is compiled against the same version of the ICU library.

Reedy tried this on plwikivoyage, but it doesn't appear to have helped.

(This is being actively worked on by ops. Turned out to be not that easy, my understanding is that something got messed up during the package upgrading.)

(In reply to comment #5)
> Reedy tried this on plwikivoyage, but it doesn't appear to have helped.

Did he use the --force option?

Also, there is a memcached key that should be deleted, named dbname:first-letters:collation-name or something like that. (Alternatively, bump IcuCollation::FIRST_LETTER_VERSION.)

(We should really be including the ICU version in that cache key.)
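
To illustrate the cache key suggestion, here is a minimal Python sketch. It is not MediaWiki's actual code: the function name, the key layout beyond the `dbname:first-letters:collation-name` shape quoted above, and the example values (`fawiki`, `uca-fa`, the ICU version strings) are all assumptions made for illustration. The idea is only that a key containing the ICU version stops matching stale entries as soon as the library changes.

```python
def first_letters_cache_key(dbname: str, collation: str,
                            icu_version: str, data_version: int = 1) -> str:
    """Build a first-letters cache key that depends on the ICU version.

    Hypothetical helper: the key layout is an assumption, not the
    format MediaWiki really uses.
    """
    parts = [dbname, "first-letters", collation, icu_version, str(data_version)]
    return ":".join(parts)


# Illustrative values only: two different ICU builds yield different keys,
# so data cached under one build is never read back under the other.
old_key = first_letters_cache_key("fawiki", "uca-fa", "4.2.1")
new_key = first_letters_cache_key("fawiki", "uca-fa", "4.8.1")
print(old_key)
print(new_key)
```

With a key like this, neither a manual memcached purge nor a version-constant bump would be needed after a library change, at the cost of leaving the old entries to expire on their own.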

Pasting this here from http://paste.debian.net/plain/55306 because that may time out after 24 hours.

This is from /var/log/apt/history.log on mw1020:

Start-Date: 2013-10-10 12:18:23
Commandline: /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install php5-mysql
Install: libicu42:amd64 (4.2.1-3ubuntu0.10.04.1, automatic)

Upgrade: libapache2-mod-php5:amd64 (5.3.10-1ubuntu3.6+wmf1, 5.3.10-1ubuntu3.8+wmf1), php5-curl:amd64 (5.3.10-1ubuntu3.6+wmf1, 5.3.10-1ubuntu3.8+wmf1), php5-xmlrpc:amd64 (5.3.10-1ubuntu3.6+wmf1, 5.3.10-1ubuntu3.8+wmf1), php5-intl:amd64 (5.3.10-1ubuntu3.6+wmf1, 5.3.10-1ubuntu3.8+wmf1), php5-mysql:amd64 (5.3.10-1ubuntu3.6+wmf1, 5.3.10-1ubuntu3.8+wmf1), php5-cli:amd64 (5.3.10-1ubuntu3.6+wmf1, 5.3.10-1ubuntu3.8+wmf1), php5-common:amd64 (5.3.10-1ubuntu3.6+wmf1, 5.3.10-1ubuntu3.8+wmf1)
End-Date: 2013-10-10 12:18:35

This was caused by a badly built version of PHP being rolled out across the fleet. It was built in an unclean environment that had the wrong version of libicu installed. We now have 5.3.10-1ubuntu3.8+wmf2, which should fix this.
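
For anyone who wants to check other hosts for the same symptom, here is a rough Python sketch that scans an apt history entry for ICU packages pulled in by an install or upgrade. The helper names are made up for this example, and the parsing assumes lines shaped like the `Install:` and `Upgrade:` lines in the log pasted above; it is not a complete parser for apt's log format.

```python
import re
from typing import Dict

# Matches "name:arch (details)" items as they appear in the pasted log.
PACKAGE_RE = re.compile(r"([^\s,()]+)\s+\(([^)]*)\)")


def parse_apt_packages(line: str) -> Dict[str, str]:
    """Map package name to its parenthesised details for one log line."""
    _, _, body = line.partition(":")
    return {name: details for name, details in PACKAGE_RE.findall(body)}


def icu_packages(entry: str) -> Dict[str, str]:
    """Return libicu* packages touched by the Install/Upgrade lines of an entry."""
    found = {}
    for line in entry.splitlines():
        if line.startswith(("Install:", "Upgrade:")):
            for name, details in parse_apt_packages(line).items():
                if name.startswith("libicu"):
                    found[name] = details
    return found


entry = """Start-Date: 2013-10-10 12:18:23
Install: libicu42:amd64 (4.2.1-3ubuntu0.10.04.1, automatic)
Upgrade: php5-intl:amd64 (5.3.10-1ubuntu3.6+wmf1, 5.3.10-1ubuntu3.8+wmf1)
End-Date: 2013-10-10 12:18:35"""
print(icu_packages(entry))
```

An ICU package appearing as an automatic install alongside a php5-intl upgrade is the pattern seen in the mw1020 log.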

nzmoihue wrote:

Still not perfect. On https://fa.wikipedia.org/wiki/%D8%B1%D8%AF%D9%87:%D8%B5%D9%88%D8%B1%D8%AA_%D9%81%D9%84%DA%A9%DB%8C_%D8%A8%D8%B1%D9%87 it seems that every digit at the start of a page title is being converted to Arabic digits. We shouldn't see '1' '2' '3' (Arabic digits); we should see '۱' '۲' '۳' (Persian digits) instead. This is reproducible on all categories, and also on ckbwiki (https://ckb.wikipedia.org/w/index.php?title=%D9%BE%DB%86%D9%84:%DA%95%DB%86%DA%98%DB%95%DA%A9%D8%A7%D9%86%DB%8C_%D8%B3%D8%A7%DA%B5&action=edit&redlink=1), on pages that use Arabic-Indic digits.
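
Since the digit sets are easy to confuse by eye, here is a small Python sketch that tells them apart by code point. The function names are invented for this example and it says nothing about how MediaWiki or ICU choose header characters; it only relies on the Unicode ranges for ASCII digits (U+0030 to U+0039), Arabic-Indic digits (U+0660 to U+0669), and Extended Arabic-Indic digits (U+06F0 to U+06F9, the ones used for Persian).

```python
import unicodedata


def digit_script(ch: str) -> str:
    """Classify a single character by the digit block it belongs to."""
    if len(ch) != 1:
        return "other"
    code = ord(ch)
    if 0x0030 <= code <= 0x0039:
        return "ASCII"
    if 0x0660 <= code <= 0x0669:
        return "Arabic-Indic"
    if 0x06F0 <= code <= 0x06F9:
        return "Extended Arabic-Indic (Persian)"
    return "other"


def leading_digit_script(title: str) -> str:
    """Classify the first character of a page title or category header."""
    return digit_script(title[0]) if title else "other"


# All three characters have the numeric value 1 but live in different blocks.
for ch in ("1", "\u0661", "\u06f1"):
    print(repr(ch), unicodedata.name(ch), digit_script(ch))
```

Running this against the header characters copied from an affected category page would show which digit block is actually being displayed.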

Are you sure it wasn't like this before?

Can you file a separate bug? It's almost certainly not related (unless the package upgrade changed ICU's behavior for this case).

Yes, we are sure. Please reopen this bug.