
Strange behaviour for a specific unicode character
Closed, Resolved (Public)

Description

Author: dg

Description:
If you go to the Hymn_of_the_Russian_Federation page and look under "Other languages"
-> Russian (it looks like Pycckuu), then hover over the link (or click it, for
that matter), you'll see two "broken" characters for each Russian one,
presumably because the Unicode is not being encoded properly. Now notice that if you
edit the page, go to the [[ru]] link and remove or change the #1056 character
(the first one after the space), the link renders properly, and clicking it goes to
the right place on the Russian wiki (except, of course, that the title is wrong, since
you've changed one character). The character in question is the Russian uppercase R,
which looks like the English P. It is used many times in the text of the article
itself and is rendered properly there.

I am using Mozilla 1.6 on Linux, but I also get the same effect in Safari 1.2.3
on MacOS X.


Version: unspecified
Severity: normal
OS: Linux
URL: http://en.wikipedia.org/w/wiki.phtml?title=Hymn_of_the_Russian_Federation&oldid=5300036

Details

Reference
bz168

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 6:46 PM
bzimport set Reference to bz168.
bzimport added a subscriber: Unknown Object (MLST).

I don't see anything out of the ordinary in Safari 1.2.3. Could you provide some screenshots of the bug in action, and reference the exact revisions of
the page that do and don't work?

timwi wrote:

I can confirm that I see this bug exactly as described. Something weird is
happening there.

The original inter-wiki link was:
[[ru:Гимн России]]

It produced this link:
http://ru.wikipedia.org/wiki/%D0%93%D0%B8%D0%BC%D0%BD_%D0_%D0%BE%D1%81%D1%81%D0%B8%D0%B8

Notice that, in the middle, there is "_%D0_". This should instead be "_%D0%A0",
because the Cyrillic capital letter Er is %D0%A0 in UTF-8.

This means the bug is caused by the "%A0", which is a nonbreaking space in
Latin-1 (but not in UTF-8), being replaced by a simple space (and hence, an
underscore). Browsers (in my case, Firefox) then no longer recognise the link
as being in UTF-8, interpret it as Latin-1, and so it comes out jumbled.
Similarly, when you actually follow the link, the wiki software will notice that
the link is not proper UTF-8, pretend it was Latin-1, convert it to UTF-8, and
forward the user to a page that obviously doesn't exist.
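
A minimal Python sketch of the mechanism described above, for anyone who wants to reproduce the two URLs. This is not the wiki's actual code: the byte-level replacement of 0xA0 is an assumption about how the NBSP was being handled, chosen because it produces exactly the broken link seen here.

```python
from urllib.parse import quote

title = "Гимн России"

# Correct link: spaces become underscores, then the UTF-8 bytes are percent-encoded.
correct = quote(title.replace(" ", "_"), safe="_")

# Assumed bug: every raw 0xA0 byte is treated as a Latin-1 non-breaking space
# and replaced with "_", even when it is the second byte of a UTF-8 sequence.
raw = title.replace(" ", "_").encode("utf-8")
mangled = raw.replace(b"\xa0", b"_")
broken = quote(mangled, safe="_")

print(correct)
print(broken)

# The mangled bytes are no longer valid UTF-8 ...
try:
    mangled.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# ... but any byte string decodes as Latin-1, which gives the jumbled text.
jumbled = mangled.decode("latin-1")
print(valid_utf8, repr(jumbled))
```

The first printed URL path matches the expected "_%D0%A0" form and the second matches the "_%D0_" form quoted above.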

timwi wrote:

Brion, you might find my above comment interesting, so I am adding you to the CC
list. Please let me know if you do not want me to do this.

timwi wrote:

Replacing the inter-wiki link with
[[ru:%D0%93%D0%B8%D0%BC%D0%BD_%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D0%B8]]
didn't help; the same bug occurs.

jeluf wrote:

There was a recent patch to convert NBSPs to _. Some vandal created accounts
with nicks ending in trailing NBSPs, and articles with NBSPs in their names, which
were hard to block/delete. That patch has probably broken these links.
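
To illustrate the difference, here is a hedged Python sketch of NBSP normalisation done per character versus per byte. Both function names are made up for this example and neither is the patch mentioned above; the point is only that a character-level replacement of U+00A0 cannot touch the 0xA0 byte inside a Cyrillic letter, while a byte-level one can.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE; encoded as C2 A0 in UTF-8


def normalize_title(title: str) -> str:
    """Hypothetical character-level cleanup: NBSP and space both become '_'."""
    return title.replace(NBSP, " ").strip().replace(" ", "_")


def normalize_bytes_naively(raw: bytes) -> bytes:
    """Hypothetical byte-level cleanup: also hits 0xA0 inside multi-byte sequences."""
    return raw.replace(b"\xa0", b"_").replace(b" ", b"_")


# A nick with a trailing NBSP collapses to the plain name.
print(normalize_title("Vandal" + NBSP))

# The Cyrillic title survives the character-level version intact ...
safe = normalize_title("Гимн России")
print(safe)

# ... but the byte-level version damages the letter whose UTF-8 form is D0 A0.
damaged = normalize_bytes_naively("Гимн России".encode("utf-8"))
print(damaged == safe.encode("utf-8"))
```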

Tim's hacked it to skip the non-breaking space check for interwiki links, so it may be working now; please check. I'm not sure this is fully in place, as
it's changed only in REL1_3, so I'm not marking it as FIXED yet.

Timwi, don't bother CC'ing me, as I get and read *all* bugmail via wikibugs-l. :)

Has this been fixed in CVS or is it still there?

wmahan_04 wrote:

(In reply to comment #7)
> Has this been fixed in CVS or is it still there?

It appears to be fixed. The example given in comment #2 gives the correct
output, as I understand it; see http://test.wikipedia.org/wiki/Bug168