
Non-ASCII characters should be unescaped in fullurl, as in other URLs
Closed, Resolved, Public

Description

Normally, UTF-8 in URLs is displayed as readable characters with printable=yes, e.g.,
http://radioscanningtw.jidanni.org/index.php?title=%E5%8F%B0%E6%8E%83:General_disclaimer&printable=yes

But not when {{fullurl}} is involved, e.g.,
http://radioscanningtw.jidanni.org/index.php?title=%E5%8F%B0%E5%8D%97%E7%B8%A3%E6%B6%88%E9%98%B2%E5%B1%80&printable=yes

There, the non-ASCII characters are printed as % escapes instead of UTF-8.

(I use {{fullurl}} to discourage editing categories, as I
discussed elsewhere.)


Version: 1.10.x
Severity: minor

Details

Reference
bz8876

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 21 2014, 9:33 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz8876.
bzimport added a subscriber: Unknown Object (MLST).

Please describe "with printable=yes" and "when {{fullurl}} is involved".

ayg wrote:

The issue appears to be that a link like

http://www.foo.example/免

has "免" printed in the source href attribute as a Chinese character, but a link like

{{fullurl:免}}

has 免 mangled to the escaped form, "%E5%85%8D". (The printable display aspect is just a symptom.)
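The escaped form is just the UTF-8 bytes of the character, each written as %XX. A minimal Python illustration using the standard library's urllib.parse (not MediaWiki code, only a way to check the mapping):

```python
from urllib.parse import quote, unquote

title = "免"  # U+514D

# quote() percent-encodes the UTF-8 bytes of the string
escaped = quote(title)
print(escaped)                                  # %E5%85%8D

# unquote() decodes the %XX bytes back as UTF-8
print(unquote(escaped))                         # 免

# The raw UTF-8 bytes behind the escapes
print(title.encode("utf-8").hex(" ").upper())   # E5 85 8D
```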
I've confirmed this is true in trunk.

robchur wrote:

Er, there's a reason we escape these things.

The reason we *don't* do it in the printable form of pages is that it's
usually safe enough for the user to type the proper character as a URL directly.
It's also a damn sight prettier.

In all likelihood, the reason it doesn't happen with {{fullurl}} et al. is
because those operations are run before whatever code it is that un-escapes
certain URL components.
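The un-escaping step described above can be sketched as follows. This is a hypothetical Python helper (unescape_for_display is an invented name), not the actual MediaWiki code, which this thread does not show. It decodes runs of %XX escapes that form valid UTF-8 and shows only the non-ASCII characters as themselves, roughly in the spirit of the URI-to-IRI conversion in RFC 3987. ASCII escapes such as %26 stay escaped, because decoding them could change what the URL means.

```python
import re

# One or more consecutive %XX escapes
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def unescape_for_display(url: str) -> str:
    """Show percent-escaped non-ASCII UTF-8 as readable characters (hypothetical helper)."""

    def replace(match):
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            return run  # not valid UTF-8: leave the whole run as it was
        # Non-ASCII characters are shown as themselves; ASCII ones are
        # re-escaped (normalised to upper-case hex) so the URL keeps its meaning.
        return "".join(
            ch if ord(ch) >= 0x80 else "%{:02X}".format(ord(ch)) for ch in text
        )

    return _ESCAPE_RUN.sub(replace, url)


print(unescape_for_display(
    "http://radioscanningtw.jidanni.org/index.php"
    "?title=%E5%8F%B0%E5%8D%97%E7%B8%A3%E6%B6%88%E9%98%B2%E5%B1%80&printable=yes"
))
# http://radioscanningtw.jidanni.org/index.php?title=台南縣消防局&printable=yes
```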

This is compatible URL/URI encoding of a UTF-8 IRI.

Some day when everyone's using fully IRI-compatible browsers,
we may make all URLs display in pretty UTF-8 (but keep in mind
that can make many URLs impossible to type).
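The "compatible URL/URI encoding of a UTF-8 IRI" mentioned above amounts to replacing each non-ASCII character with the percent-escaped bytes of its UTF-8 encoding while leaving everything else alone. A minimal Python sketch under that assumption (iri_to_uri is a hypothetical name; it ignores finer points of RFC 3987 such as normalisation):

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character as its UTF-8 bytes (hypothetical helper).

    ASCII characters, including reserved ones like '/', '?', '&' and any
    existing %XX escapes, pass through unchanged, so the URL keeps its
    structure and can be typed on a plain ASCII keyboard.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://www.foo.example/免"))  # http://www.foo.example/%E5%85%8D
print(iri_to_uri("Füße"))                       # F%C3%BC%C3%9Fe
```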

z9z8z-wps wrote:

I understand this is just about the readability of the generated output, so typing URLs isn't at all what this is about.

What is sometimes hard to communicate to people whose languages use only 7-bit ASCII characters is that people whose languages
use an extended set of characters are perfectly capable of entering 免, Wikipédia, or Füße, and that 免, Wikipédia, or Füße is far more
readable than %E5%85%8D, Wikip%C3%A9dia, or F%C3%BC%C3%9Fe. Since these sinister characters work fine in normal links, there is no
technical reason why fullurl would need to return them escaped. And for browser usage, IRI or not, there is fullurle with properly
escaped characters, if I remember the docs correctly.

I just tested this, and it seems to me that this issue has been FIXED.