
Non US-ASCII chars in category names: Any page bound to a category with "ø" in its name is not listed
Open, Needs Triage, Public

Description

According to https://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(technical_restrictions), "ø" is a valid character in page names, and I'm assuming it is also valid in category names, since the documentation doesn't state otherwise. However, this character breaks category names in MediaWiki 1.26.2: any page bound to a category with "ø" in its name is never listed as part of it. Try it for yourself, or see here for a real-life example:

http://va-de-retro.com/vadewiki/Category:Br%C3%B8der

As you can see, the category appears to be empty; however, if you enter "test1" or "test2" in the search box, you will land on two pages that ARE part of the category.

There may be more problematic characters that aren't mentioned anywhere, but I ran into a problem with this one in particular because I'm already using a related category:

http://va-de-retro.com/vadewiki/Category:Juegos_de_Br%C3%B8derbund

which at some point worked as it should, yet right now I can't make new pages appear in it.
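
For reference, the "%C3%B8" in both URLs above is the UTF-8 percent-encoding of "ø" (bytes 0xC3 0xB8). The sketch below is only an illustration in Python, not MediaWiki's own code, and the helper name title_to_url_path is hypothetical. It assumes spaces in a title become underscores, as in the second URL, and shows how the category titles map to the URL paths quoted in this report.

```python
from urllib.parse import quote, unquote


def title_to_url_path(title: str) -> str:
    """Percent-encode a wiki title as UTF-8 (hypothetical helper).

    Spaces are replaced with underscores, and ":" and "/" are left
    unescaped so the "Category:" prefix stays readable.
    """
    return quote(title.replace(" ", "_"), safe=":/")


# "ø" is a single character but two bytes in UTF-8.
print("ø".encode("utf-8"))                            # b'\xc3\xb8'

# Encoding the category title gives the path used in the first URL.
print(title_to_url_path("Category:Brøder"))           # Category:Br%C3%B8der

# Decoding the path from the second URL recovers the original title.
print(unquote("Category:Juegos_de_Br%C3%B8derbund"))  # Category:Juegos_de_Brøderbund
```

This confirms only that the URLs are well-formed encodings of the category names; it says nothing about where the category listing itself goes wrong.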

Event Timeline

Which database and database version and collation is used on that website?

Aklapper renamed this task from Non US-ASCII chars break category names to Non US-ASCII chars in category names: Any page bound to a category with "ø" in its name is not listed. Apr 25 2016, 6:14 AM

This is MariaDB 10.0.20. As for collation, I'm not sure - this is a shared server and MW was installed by someone else long ago. How do I find out?

Thanks.

$wgDBTableOptions = "ENGINE=InnoDB, DEFAULT CHARSET=binary";

is enabled in LocalSettings.php if that helps.

OK, I'm told collation is utf8_general_ci

Database is MySQL 5.1.61. Anything else you need?