urlencode on variables get double-encoded
Open, Public

Description

Author: sergey.chernyshev

Description:
I use {{urlencode:}} to encode the {{PAGENAME}} value, and it looks like the value gets double-encoded.

I created a test page for it on Wikipedia and it has the same issue:
http://en.wikipedia.org/wiki/User:Sergey_Chernyshev/Variable_Urlencode_%27_bug


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/wiki/User:Sergey_Chernyshev/Variable_Urlencode_%27_bug

bzimport added a project: MediaWiki-Parser. Via Conduit, Nov 21 2014, 10:03 PM
bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz13288.
bzimport created this task. Via Legacy, Mar 7 2008, 10:56 PM
bzimport added a comment. Via Conduit, Mar 16 2008, 4:31 PM

nicdumz wrote:

Variables are escaped through wfEscapeWikiText, so ' is converted to &#39;.
Then the &, #, and ; of "&#39;" are escaped by urlencode().

While wfEscapeWikiText sucks, a simple fix for now would be to html_entity_decode the text before any {{urlencode:}} processing. HTML entities in URLs are invalid anyway (&quot; is a bad title, and &#nn; is interpreted by browsers as & ).

Index: CoreParserFunctions.php
--- CoreParserFunctions.php	(revision 32034)
+++ CoreParserFunctions.php	(working copy)
@@ -82,7 +82,7 @@
 	}
 
 	static function urlencode( $parser, $s = '' ) {
-		return urlencode( $s );
+		return urlencode( html_entity_decode($s, ENT_QUOTES) );
 	}
 
 	static function lcfirst( $parser, $s = '' ) {
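
The double encoding described in this comment can be reproduced outside MediaWiki. What follows is a minimal Python sketch, not MediaWiki code. `escape_wikitext` is a hypothetical stand-in that models only the apostrophe handling attributed to wfEscapeWikiText above, and Python's `urllib.parse.quote_plus` and `html.unescape` are assumed to be close enough to PHP's urlencode() and html_entity_decode() for this case.

```python
import html
from urllib.parse import quote_plus


def escape_wikitext(text: str) -> str:
    """Hypothetical stand-in for wfEscapeWikiText: models only the apostrophe."""
    return text.replace("'", "&#39;")


def urlencode_current(s: str) -> str:
    """Behaviour reported in this bug: encode the already-escaped text."""
    return quote_plus(s)


def urlencode_patched(s: str) -> str:
    """Behaviour proposed by the patch: decode entities first, then encode."""
    return quote_plus(html.unescape(s))


page_name = escape_wikitext("Sant'Antonio")  # what {{PAGENAME}} would hand over
print(urlencode_current(page_name))  # Sant%26%2339%3BAntonio
print(urlencode_patched(page_name))  # Sant%27Antonio
```

Note that decoding every entity before encoding also changes the result for input where the author typed an entity on purpose, which is the kind of behaviour change a fix would have to weigh.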
IAlex added a comment. Via Conduit, Feb 13 2010, 3:31 PM
*** Bug 22508 has been marked as a duplicate of this bug. ***
Nikerabbit added a comment. Via Conduit, Feb 13 2010, 4:00 PM

Isn't {{PAGENAMEE}} just for this purpose?

Platonides added a comment. Via Conduit, Feb 13 2010, 4:03 PM

Yes, {{PAGENAMEE}} is a valid workaround. But it should work, nonetheless.

Nikerabbit added a comment. Via Conduit, Feb 13 2010, 4:09 PM

I see no way how it could possibly work without breaking BC.

bzimport added a comment. Via Conduit, Feb 13 2010, 4:28 PM

conrad.irwin wrote:

{{PAGENAMEE:{{PAGENAME:&}}}} -> %26 (RIGHT)
{{URLENCODE:{{PAGENAME:&}}}} -> %26amp%3B (WRONG)
{{PAGENAMEE:&amp;}} -> %26 (WRONG - ?)
{{URLENCODE:&amp;}} -> %26amp%3B (RIGHT)

I put the ? there because [[&amp;]] creates a link to [[&]] (perhaps also wrong) and http://en.wikipedia.org/wiki/%26amp; is a server error.

I think the solution would be to have {{PAGENAME}} et al. return a "text-needs-escape" object of some kind. Parser functions could then set a flag to request unescaped input, and the parser would escape the text only when escaping is needed.

The Django template engine deals with this issue very nicely, maybe we can copy some of their ideas.
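
The "text-needs-escape" idea can be sketched roughly as follows. This is an illustration only, written in Python rather than in the parser's PHP: `NeedsEscape`, `render`, `pagename`, and `urlencode` are hypothetical names, and nothing here reflects an existing MediaWiki API. Django itself works the other way round (it marks strings that are already safe and escapes everything else at output time); the sketch follows the wording of the proposal instead.

```python
import html
from urllib.parse import quote_plus


class NeedsEscape(str):
    """Hypothetical marker type: raw text that must be escaped before output."""


def render(value: str) -> str:
    """Final output step: escape marked values, pass other text through."""
    if isinstance(value, NeedsEscape):
        return html.escape(value, quote=True)
    return value


def pagename(title: str) -> NeedsEscape:
    """Stand-in for {{PAGENAME}}: returns raw text tagged as needing escape."""
    return NeedsEscape(title)


def urlencode(value: str) -> str:
    """Stand-in for {{urlencode:}}: operates on the raw text it receives."""
    return quote_plus(value)


name = pagename("Sant'Antonio & Co")
print(render(name))     # Sant&#x27;Antonio &amp; Co
print(urlencode(name))  # Sant%27Antonio+%26+Co
```

Because escaping is deferred to `render`, a function such as `urlencode` never sees entities, so nothing gets encoded twice.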

Nikerabbit added a comment. Via Conduit, Feb 13 2010, 4:35 PM

(In reply to comment #6)

> {{PAGENAMEE:&amp;}} -> %26 (WRONG - ?)
>
> I put the ? there because [[&amp;]] creates a link to [[&]] (perhaps also
> wrong) and http://en.wikipedia.org/wiki/%26amp; is a server error.

& is disabled on wmf due to broken clients. Also, entities in titles are normalised away unless I am mistaken.

I don't know enough about the parser to say if that is possible.

bzimport added a comment. Via Conduit, Jun 20 2011, 4:41 PM

Amalthea.wikimedia wrote:

The core of the issue seems to be that {{PAGENAME}} and others internally escape some characters to entities, which breaks other magic words and parser functions that use the result directly.

{{#ifeq:{{PAGENAME:File:Aci Sant'Antonio.svg}}|Aci Sant'Antonio.svg|y|n}}
→ "n"

{{#ifeq:{{PAGENAME:File:Aci Sant'Antonio.svg}}|Aci Sant&#39;Antonio.svg|y|n}}
→ "y"

{{FILEPATH:Aci_Sant'Antonio.svg}}
→ "http://upload.wikimedia.org/wikipedia/commons/0/00/Aci_Sant%27Antonio.svg"

{{FILEPATH:Aci Sant&#39;Antonio.svg}}
→ ""

{{str left|{{PAGENAME:File:Aci Sant'Antonio.svg}}|12}}
→ "Aci Sant&#39"
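
The comparison and truncation results above follow from plain string handling once the apostrophe is stored as an entity. A small Python sketch, assuming (as the str left output suggests) that {{PAGENAME}} hands over the apostrophe as &#39;; the variable names are illustrative only.

```python
escaped = "Aci Sant&#39;Antonio.svg"  # assumed internal value from {{PAGENAME}}
literal = "Aci Sant'Antonio.svg"     # what a template author types

print(escaped == literal)  # False, matching the "n" result of #ifeq
print(escaped[:12])        # Aci Sant&#39
print(literal[:12])        # Aci Sant'Ant
```

The entity takes five characters where the author expects one, so any function that counts or compares characters sees a different string.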

bzimport added a comment. Via Conduit, Jun 20 2011, 4:41 PM

Amalthea.wikimedia wrote:

More or less duplicated by bug 16474 and bug 14779, as far as I can tell.
