
Some "invalid" internal links (in UTF-8 type Wikipedias) look different from other "invalid" internal links
Closed, Resolved, Public

Description

Author: gangleri

Description:
If you compare this page ([[en:User:Gangleri/tests/Unicode_ISO_8859-1/links]])
with [[meta:User:Gangleri/tests/Unicode_ISO_8859-1]]
( http://meta.wikimedia.org/wiki/User:Gangleri/tests/Unicode_ISO_8859-1 )
you will see different behaviour.

Regards Reinhardt


Version: 1.3.x
Severity: minor
OS: Windows XP
Platform: PC
URL: http://en.wikipedia.org/wiki/User:Gangleri/tests/Unicode_ISO_8859-1/links

Details

Reference
bz909

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 7:02 PM
bzimport set Reference to bz909.
bzimport added a subscriber: Unknown Object (MLST).

It's expected that they would be different, since they are different
character sets with different requirements.

If you think something is not correct, please describe the exact
symptoms, and the *minimal, isolated* test cases for showing them.
Linking to a couple pages with dozens of links and just saying "they're
different" is not helpful.

gangleri wrote:

At http://en.wikipedia.org/wiki/User:Gangleri/tests/Unicode_ISO_8859-1/links additional
tests are available:

Summary: Only some characters prevent links from being generated. Such characters
are: [[Hebrew]] "נ" (נ), [[Runes|rune]] "ᚠ" (ᚠ) and probably
some more, depending on other character sets.

On that page, these two links (each containing a single Unicode character) behave differently
from the other links with Unicode characters:
*[[en:User:Gangleri/tests/Unicode_ISO_8859-1/נ]] with the "נ" character
*[[en:User:Gangleri/tests/Unicode_ISO_8859-1/Runes/ᚠ]] with the "ᚠ" character

Regards Reinhardt

They're invalid because they contain the byte %A0, which is not allowed in titles on a latin-1 wiki.

That the character references are converted as UTF-8 bytes in the first place is due to bug 65. Marking as duplicate.
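
The byte-level explanation above can be checked outside the wiki. The following is only a rough Python sketch, not MediaWiki's actual title validation: the helper names utf8_bytes_hex and contains_a0 are made up for illustration. It shows that both reported characters, "נ" (U+05E0) and "ᚠ" (U+16A0), end in the byte A0 when encoded as UTF-8, while a character such as "é" (U+00E9) does not.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of MediaWiki. They just inspect the UTF-8 encoding of a string.

def utf8_bytes_hex(text):
    """Return the UTF-8 bytes of text as upper-case hex strings."""
    return [f"{b:02X}" for b in text.encode("utf-8")]

def contains_a0(text):
    """True if the UTF-8 encoding of text contains the byte 0xA0."""
    return 0xA0 in text.encode("utf-8")

samples = {
    "Hebrew nun (U+05E0)": "\u05e0",
    "Rune fehu (U+16A0)": "\u16a0",
    "e with acute (U+00E9)": "\u00e9",
}

for label, ch in samples.items():
    print(label, utf8_bytes_hex(ch), contains_a0(ch))

# Read as ISO 8859-1, the two bytes of U+05E0 become a multiplication
# sign followed by a no-break space (U+00A0).
as_latin1 = "\u05e0".encode("utf-8").decode("latin-1")
print([f"U+{ord(c):04X}" for c in as_latin1])
```

On a latin-1 wiki those bytes are interpreted one at a time, so the trailing A0 is seen as a separate character (the no-break space in ISO 8859-1) rather than as part of a multi-byte sequence.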

*** This bug has been marked as a duplicate of 65 ***