Parsoid: Percent-encode % in URLs
Closed, Resolved, Public

Description

On this page:
http://parsoid.wmflabs.org/ko/%ED%95%9C%EC%96%91%EB%8C%80%ED%95%99%EA%B5%90_%EC%B4%9D%ED%95%99%EC%83%9D%ED%9A%8C

You have a link with this href attribute:
href="./한양대학교_총학생회#소리없는_99%의_명예혁명"

As you can see, this is not URL-encoded. The '%' sign is a reserved character and *must* be encoded, IMO.


Version: unspecified
Severity: normal

Details

Reference
bz53146

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 2:05 AM)
bzimport set Reference to bz53146.
Kelson created this task. (Aug 21 2013, 8:56 AM)

http://tools.ietf.org/html/rfc3986#section-2.4 agrees with you. I believe we currently only percent-encode % to %25 when followed by hex chars.
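
To illustrate why a bare '%' matters, here is a minimal sketch. It is not Parsoid's actual code: `canDecode` and `escapePercent` are hypothetical helpers, and the fragment is copied from the href in the report. A '%' that is not followed by two hex digits is not a valid escape, so a standard decoder rejects it, while encoding it as %25 round-trips cleanly.

```javascript
// Fragment taken from the href quoted in the report.
const rawFragment = '소리없는_99%의_명예혁명';

// Hypothetical helper: report whether a string decodes without error.
// decodeURIComponent throws URIError on a malformed escape sequence.
function canDecode(s) {
  try {
    decodeURIComponent(s);
    return true;
  } catch (e) {
    if (e instanceof URIError) {
      return false;
    }
    throw e;
  }
}

// Hypothetical helper: encode every literal '%' as '%25' and leave all
// other characters, including non-ASCII ones, untouched.
function escapePercent(s) {
  return s.replace(/%/g, '%25');
}

const fixedFragment = escapePercent(rawFragment);

console.log(canDecode(rawFragment));   // false: '%의' is not a valid escape
console.log(canDecode(fixedFragment)); // true
console.log(decodeURIComponent(fixedFragment) === rawFragment); // true
```

Note that this sketch assumes the input is raw text. Applied to a string that is already percent-encoded, it would double-encode existing escapes.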

We fixed a really old, but similar, bug in Kiwix three weeks ago in HK... but whereas C++ doesn't have built-in escape/unescape functions, JavaScript does (encodeURIComponent()/decodeURIComponent())... So I was a little bit surprised to catch one like this!

We only use those selectively, as the JS version also encodes characters that don't need to be encoded when using UTF-8:

encodeURIComponent('ü')
'%C3%BC'
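
One way to encode "selectively" could look like the sketch below. This is an assumption for illustration, not the code from the actual patch: `encodeSelectively` is a hypothetical name, and the set of characters it escapes ('%', space, and a few ASCII delimiters) is chosen only as an example.

```javascript
// Hypothetical helper: percent-encode only '%' and a small set of ASCII
// characters, leaving non-ASCII (UTF-8) characters readable.
// The character set here is an illustrative assumption.
function encodeSelectively(s) {
  return s.replace(/[% "<>`]/g, (ch) => encodeURIComponent(ch));
}

console.log(encodeURIComponent('99% ü'));  // '99%25%20%C3%BC'
console.log(encodeSelectively('99% ü'));   // '99%25%20ü'
console.log(encodeSelectively('ü'));       // 'ü'
```

The output still decodes with decodeURIComponent(), since every escape it emits is a valid one.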

Change 80318 had a related patch set uploaded by GWicke:
Bug 53146: Percent-encode fragment identifiers too

https://gerrit.wikimedia.org/r/80318

Change 80318 merged by jenkins-bot:
Bug 53146: Percent-encode fragment identifiers too

https://gerrit.wikimedia.org/r/80318

The fix is now deployed.