Broken Unicode links to Wikipedia at Commons
Closed, ResolvedPublic

Assigned To
None
Priority
High
Author
bzimport
Blocks
T5969: Unicode (UTF-8, utf8) compatibility (tracking)
Commits
Unknown Object (Commit)
Subscribers
wikibugs-l
Projects
Reference
bz563
Description

Author: ausir

Description:
There's a problem at Wikimedia Commons with linking to Wikipedia articles with
Unicode characters. For example, a link to [[w:pl:Stanis%C5%82aw Lem]] links to
[[w:pl:Stanis]] instead. External links like
[http://pl.wikipedia.org/wiki/Stanis%C5%82aw_Lem Stanis%C5%82aw Lem] can of course be
used, but it'd be better if they were no-arrow interwiki links.
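
A minimal Python sketch of the symptom (not MediaWiki code): it decodes the percent-encoded title from the example and then, as an assumption about the failure mode, cuts the title at the first character that does not fit in Latin-1. The function name `cut_at_first_non_latin1` is hypothetical.

```python
from urllib.parse import unquote


def cut_at_first_non_latin1(title: str) -> str:
    """Keep characters up to the first one outside Latin-1 (code point > 0xFF).

    Assumption for illustration only: this mimics a title check that stops
    at the first character it considers invalid.
    """
    kept = []
    for ch in title:
        if ord(ch) > 0xFF:
            break
        kept.append(ch)
    return "".join(kept)


encoded = "Stanis%C5%82aw Lem"
decoded = unquote(encoded)  # unquote decodes the bytes as UTF-8 by default
truncated = cut_at_first_non_latin1(decoded)

print(decoded)    # Stanisław Lem
print(truncated)  # Stanis
```

The truncated result matches the [[w:pl:Stanis]] target reported above.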


Version: unspecified
Severity: normal
URL: http://commons.wikimedia.org/wiki/Commons:Language_policy

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz563.
bzimport created this task. Via Legacy, Sep 23 2004, 1:05 AM
brion added a comment. Via Conduit, Oct 5 2004, 1:03 AM
*** Bug 645 has been marked as a duplicate of this bug. ***
brion added a comment. Via Conduit, Oct 5 2004, 1:04 AM

From duplicate bug 645:

Dear friends,

at http://meta.wikimedia.org/wiki/User:Gangleri/remarks the
first remark, named "link to [[w:ro:Discuţie Utilizator:Gangleri]] ~
[[w:ro:User_talk:Gangleri]]", illustrates that some special
characters such as "ţ" cause problems in InterWiki links.

If this issue is already known please let me know where to read
about such "exceptions".

It might be somehow related to
http://bugzilla.wikipedia.org/show_bug.cgi?id=579 too because it
shows the same effect.

Best regards Reinhardt

brion added a comment. Via Conduit, Oct 5 2004, 1:06 AM

Looks like a problem with the way the redirect is handled through en.wikipedia.org.

bzimport added a comment. Via Conduit, Oct 10 2004, 7:32 PM

jimes wrote:

http://en.wikipedia.org/wiki/October_2004 has a link "[[Juraj Beneš]]"
which displays correctly on that page as "Juraj Beneš" but links incorrectly to
"Juraj Beneš" ("Juraj_Bene%C5%A1").
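
The target quoted above, "Juraj_Bene%C5%A1", is the UTF-8 percent-encoding of the title. A minimal Python sketch (not taken from MediaWiki) reproduces that encoding and, as an assumption about why the link breaks, shows how the same bytes read when interpreted as Latin-1:

```python
from urllib.parse import quote

title = "Juraj_Beneš"

# Percent-encode the title using its UTF-8 bytes.
utf8_encoded = quote(title, encoding="utf-8")
print(utf8_encoded)  # Juraj_Bene%C5%A1

# Assumption for illustration: a Latin-1 wiki reads the UTF-8 bytes
# one byte per character, so the two bytes of "š" become two characters.
misread = title.encode("utf-8").decode("latin-1")
print(misread)  # Juraj_BeneÅ¡
```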

bzimport added a comment. Via Conduit, Nov 9 2004, 9:11 PM

gangleri wrote:

I made a central place, [[Wikipedia:Invalid article names]], in
[[:Category:Wikipedia maintenance]] where such broken links can be listed.
This is an equivalent to [[Wikipedia:Duplicate articles]]. I know of some
others. I suppose they were created with previous versions of the MediaWiki
software.
Regards Reinhardt

bzimport added a comment. Via Conduit, Nov 9 2004, 9:53 PM

gangleri wrote:

Notes: some weeks ago I made some remarks at
[[meta:User:Gangleri/remarks#Invalid_links_.28lists.29|meta:User:Gangleri/remarks#Invalid links (lists)]].
You may read some other related sections as well.

The problem is both the existence of pages which do not comply with UTF-8 (?)
and the usage of such links in en.wikipedia and directed to en.wikipedia from
other projects.

It should be much easier to verify the namespace than to use a bot to scan for
such links inside the articles, talks, categories ... .
Regards ~~~~ (Reinhardt)

bzimport added a comment. Via Conduit, Nov 10 2004, 2:06 AM

gangleri wrote:

Dear friends,

you will find two sorts of links:
a) [[:ro:Discuţie Utilizator:Gangleri]],
[[:ja:メインページ]],
[[:bg:Потребител:Gangleri|bg]], [[:he:משתמש:Gangleri|he]],
[[:ja:利用者:Gangleri|ja]]

b) [[w:ro:Discuţie Utilizator:Gangleri]],
[[w:ja:メインページ]],
[[w:bg:Потребител:Gangleri|bg]], [[w:he:משתמש:Gangleri|he]],
[[w:ja:利用者:Gangleri|ja]]

Both work at en.wikipedia.org.

ONLY a) works at xx.wikipedia.org, meta.wikimedia.org ...

Regards Reinhardt

bzimport added a comment. Via Conduit, Nov 15 2004, 1:10 AM

gangleri wrote:

Dear friends,

a) Do we need a second test environment?
b) Should InterWiki translation be fixed in 1.4-cvs?

http://test.wikipedia.org/wiki/User:Gangleri/tests#Unicode_ISO_8859-1
shows that we are testing on a Unicode environment.

This means that issues related to InterWiki translations through
en.wikipedia CAN NOT BE TESTED HERE.

My proposal:

Please make a testutf8.wikipedia.org environment and make the translations
from [[test:]] to [[xx:]] as [[:fr:]], [[:pl:]], [[:ro:]], [[:ru:]],
[[:he:]], [[:ja:]], [[:bg:]] ... through testutf8.wikipedia.org because
problems regarding translations to those targets (except [[:fr:]]?) are
known.

There are a lot of bugs which can be included then:

  • bug 563,
  • translations related to the anchor part of a link, where the anchor
    contains special characters such as (, ), ", ', Unicode characters and so on.
  • ???

Regards Reinhardt

bzimport added a comment. Via Conduit, Nov 15 2004, 11:34 AM

gangleri wrote:

I was thinking about this again. It seems that three test environments are
needed in order to fix this / to emulate all combinations of final
translations:

B through which translations are made;
A and C in order to test:

  • A translated through B to C
  • C translated through B to A
  • A translated through B to B
  • C translated through B to C
  • B translated to A
  • B translated to C

If B will be a UTF-8 environment, as it is now in 1.3.7, B should be UTF-8
too.

Regards Reinhardt
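
The test matrix proposed above can be enumerated mechanically. The following Python sketch is only an illustration: the names `HUB`, `ENDPOINTS` and `enumerate_routes` are hypothetical, and the generated list is a superset of the six cases listed in the comment (it also includes A through B to A and C through B to B).

```python
HUB = "B"               # environment through which translations are made
ENDPOINTS = ("A", "C")  # environments used as sources and targets


def enumerate_routes():
    """Return every route: endpoint -> hub -> any environment, plus hub -> endpoint."""
    routes = []
    for source in ENDPOINTS:
        for target in ENDPOINTS + (HUB,):
            routes.append((source, HUB, target))
    for target in ENDPOINTS:
        routes.append((HUB, target))
    return routes


for route in enumerate_routes():
    print(" -> ".join(route))
```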

bzimport added a comment. Via Conduit, Jan 6 2005, 2:00 PM

kjell.andre wrote:

(In reply to comment #3)

> Looks like a problem with the way the redirect is handled through en.wikipedia.org.

When testing a few variants of interwiki links you can see that:

: [[w:pl:Stanis%C5%82aw Lem]] and [[:en:pl:Stanis%C5%82aw Lem]] do not work,
: but [[:pl:Stanis%C5%82aw Lem]] and [[de:pl:Stanis%C5%82aw Lem]] do.

Obviously the first (or only) prefix determines how the name is interpreted. If it is a language prefix for a UTF-8
based wiki, it works, since the name is valid for that wiki. If it is a prefix for a Latin-1 based wiki, the name is
not valid for that wiki and is instead cut off at the invalid character, resulting in a broken link.

To correct this, the normalization of names must be based on the last (or only) language prefix, not the first.

Best regards, Kjell ANDRÉ
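
The difference between first-prefix and last-prefix normalization described in the comment above can be sketched in Python. This is an illustration, not the MediaWiki implementation: the `WIKI_CHARSET` table, the function names, and the "cut at the first unencodable character" rule are all assumptions chosen to match the behaviour reported in this thread.

```python
from typing import List, Tuple

# Hypothetical charset table; the real configuration is not given in the thread.
WIKI_CHARSET = {"w": "latin-1", "en": "latin-1", "pl": "utf-8", "de": "utf-8"}


def split_prefixes(link: str) -> Tuple[List[str], str]:
    """Split 'w:pl:Title' into (['w', 'pl'], 'Title')."""
    parts = link.lstrip(":").split(":")
    prefixes = []
    while len(parts) > 1 and parts[0] in WIKI_CHARSET:
        prefixes.append(parts.pop(0))
    return prefixes, ":".join(parts)


def cut_at_invalid(title: str, charset: str) -> str:
    """Keep characters up to the first one the charset cannot encode."""
    kept = []
    for ch in title:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            break
        kept.append(ch)
    return "".join(kept)


def normalize(link: str, use_last_prefix: bool) -> str:
    """Validate the title against the charset of the first or the last prefix."""
    prefixes, title = split_prefixes(link)
    if not prefixes:
        return title
    prefix = prefixes[-1] if use_last_prefix else prefixes[0]
    return cut_at_invalid(title, WIKI_CHARSET[prefix])


print(normalize("w:pl:Stanisław Lem", use_last_prefix=False))  # Stanis
print(normalize("w:pl:Stanisław Lem", use_last_prefix=True))   # Stanisław Lem
```

With the first prefix deciding, links routed through a Latin-1 prefix are truncated; with the last prefix deciding, the title is checked against the charset of the wiki that finally receives it.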

brion added a comment. Via Conduit, Feb 27 2005, 9:59 AM

Now that bug 65 is fixed this was pretty easy to add on top. Checked in and put live.

epriestley added a commit: Unknown Object (Commit). Via Daemons, Mar 4 2015, 8:20 AM
