UTF-8 characters in linktrail do not work as expected on ISO 8859-1 wikis
Closed, Resolved (Public)

Description

Author: wegge

Description:
In LanguageDa, the linktrail is specified as:

"linktrail" => "/^((?:[a-z]|æ|ø|å)+)(.*)\$/sD",

The file is UTF-8 encoded, which causes problems on the Danish Wikipedia, which
still uses ISO 8859-1. As can be seen at the example URL, each UTF-8 character
is matched as two separate characters.
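
The mismatch can be reproduced at the byte level. The following is a minimal sketch in Python rather than the PHP that MediaWiki uses, assuming the pattern is applied byte-wise to ISO 8859-1 page text; `split_trail` and the sample text are hypothetical and exist only for this illustration.

```python
import re

# Linktrail pattern from LanguageDa; PHP's /s modifier maps to re.DOTALL.
LINKTRAIL = "^((?:[a-z]|æ|ø|å)+)(.*)$"

# Bytes of the pattern as read from the UTF-8 source file (æ is C3 A6).
utf8_pattern = LINKTRAIL.encode("utf-8")
# Bytes of the same pattern in ISO 8859-1 (æ is the single byte E6).
latin1_pattern = LINKTRAIL.encode("latin-1")


def split_trail(pattern, text):
    """Return (trail, rest); trail is empty when the pattern does not match."""
    match = re.match(pattern, text, re.DOTALL)
    if match is None:
        return b"", text
    return match.group(1), match.group(2)


# Hypothetical text following a link on an ISO 8859-1 wiki.
text = "ærne og mere".encode("latin-1")

print(split_trail(utf8_pattern, text))    # UTF-8 pattern vs. Latin-1 text
print(split_trail(latin1_pattern, text))  # downconverted pattern vs. Latin-1 text
# The UTF-8 pattern does accept the two Latin-1 characters "Ã¦" (bytes C3 A6).
print(split_trail(utf8_pattern, "Ã¦".encode("latin-1")))
```

With the UTF-8 pattern the trail comes back empty for the Latin-1 text, while the two-character sequence "Ã¦" is accepted, which matches the behaviour described above.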


Version: 1.4.x
Severity: normal
URL: http://da.wikipedia.org/wiki/Bruger:Wegge/Sandkassen

Details

Reference
bz2239

Event Timeline

bzimport raised the priority of this task to Medium (Nov 21 2014, 8:29 PM).
bzimport set Reference to bz2239.
bzimport added a subscriber: Unknown Object (MLST).

A utf8_decode call was missing in the Latin-1 wrapper class, so the linktrail was not downconverted from UTF-8.
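
The kind of downconversion involved can be sketched as follows. This is an illustration in Python, not the actual patch: the `utf8_decode` function here is a rough stand-in for PHP's function of the same name, and `Latin1Wrapper` and its `get` method are hypothetical names.

```python
def utf8_decode(data):
    """Rough stand-in for PHP's utf8_decode(): UTF-8 bytes to ISO 8859-1.

    Characters with no ISO 8859-1 equivalent become '?'.
    """
    return data.decode("utf-8").encode("latin-1", errors="replace")


class Latin1Wrapper:
    """Hypothetical wrapper that serves UTF-8 language data to a Latin-1 wiki."""

    def __init__(self, utf8_settings):
        self._utf8_settings = utf8_settings

    def get(self, key):
        # Downconvert on the way out, so callers only ever see Latin-1 bytes.
        return utf8_decode(self._utf8_settings[key])


# The linktrail as stored in the UTF-8 encoded language file.
settings = {"linktrail": "/^((?:[a-z]|æ|ø|å)+)(.*)$/sD".encode("utf-8")}
wrapper = Latin1Wrapper(settings)

linktrail = wrapper.get("linktrail")
print(linktrail)
```

After the conversion, æ, ø and å are each a single byte in the pattern, so they can match the single-byte characters in ISO 8859-1 page text.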

Fix applied to REL1_4 and put live on Wikimedia servers.

Not needed in HEAD as Latin1 support is being dropped.