Spaces in section names in internal links are encoded incorrectly by Parsoid
Closed, Duplicate · Public

Description

https://www.mediawiki.org/w/index.php?title=Topic:T6nggde0okxad991&topic_showPostId=t6nsueomoaodcswh#flow-post-t6nsueomoaodcswh

Flow generates

href="//wikimediafoundation.org/wiki/Terms%20of%20Use#7.%20Licensing%20of%20Content"

but it should be

href="//wikimediafoundation.org/wiki/Terms_of_Use#7._Licensing_of_Content"

(the first link does not work as expected).

Event Timeline

He7d3r created this task. Jun 27 2016, 11:20 AM
Restricted Application added subscribers: Zppix, Aklapper. Jun 27 2016, 11:20 AM
Catrope added a subscriber: Catrope.

This is a bug in Parsoid:

$ echo "[[wmf:Terms of Use#7. Licensing of Content]]" | node bin/parse.js --normalize

<p><a href="//wikimediafoundation.org/wiki/Terms of Use#7. Licensing of Content" title="wmf:Terms of Use">wmf:Terms of Use#7. Licensing of Content</a></p>
Catrope renamed this task from Spaces in internal links are encoded incorrectly on Flow pages to Spaces in section names in internal links are encoded incorrectly by Parsoid. Jun 28 2016, 3:10 PM

There are two things going on here. One is that spaces are used instead of underscores in the URL path (/wiki/Terms of Use instead of /wiki/Terms_of_Use); this is incorrect, but MediaWiki will redirect to the correct URL. The other is that spaces are used instead of underscores in the fragment (#7. Licensing of Content instead of #7._Licensing_of_Content), which is incorrect and fails to point to the correct section.
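
To make the two cases concrete, here is a minimal Python sketch of the expected space-to-underscore handling for both the title and the fragment. The helper name normalize_link_target is hypothetical and this is not Parsoid's code; it only illustrates the output this task asks for.

```python
def normalize_link_target(target: str) -> str:
    """Hypothetical helper: replace spaces with underscores in both the
    page title and the section fragment of a wikilink target."""
    # Split once at the first '#'; sep is '' when there is no fragment.
    title, sep, fragment = target.partition("#")
    return title.replace(" ", "_") + sep + fragment.replace(" ", "_")


if __name__ == "__main__":
    print(normalize_link_target("Terms of Use#7. Licensing of Content"))
```

Treating the title and the fragment separately mirrors the distinction above: a bad path is still redirected by MediaWiki, while a bad fragment silently misses the section.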

This also breaks with Unicode characters:

$ echo "[[Foo#עברית]]" | node bin/parse.js --normalize

<p><a href="Foo#עברית" title="Foo">Foo#עברית</a></p>

(should be Foo#.D7.A2.D7.91.D7.A8.D7.99.D7.AA)
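
The expected fragment above looks like the UTF-8 bytes of the section name, percent-encoded, with each "%" replaced by ".". A rough Python sketch of that transformation follows, assuming this reading of the expected output is right. The helper name legacy_anchor is hypothetical, and urllib.parse.quote is only an approximation of whatever MediaWiki does internally (for example, it leaves "~" unescaped).

```python
from urllib.parse import quote


def legacy_anchor(section: str) -> str:
    """Hypothetical helper: approximate the dot-encoded fragment form
    quoted in this task. Not MediaWiki's or Parsoid's actual code."""
    # Spaces become underscores before any escaping.
    underscored = section.replace(" ", "_")
    # Percent-encode the UTF-8 bytes; ':' is kept literal (an assumption).
    percent_encoded = quote(underscored, safe=":")
    # Swap '%' for '.' to get the dot-encoded form.
    return percent_encoded.replace("%", ".")


if __name__ == "__main__":
    print("Foo#" + legacy_anchor("עברית"))
    print(legacy_anchor("7. Licensing of Content"))
```

With this sketch, the Hebrew section name yields the fragment given above, and the ASCII example from the description comes out as 7._Licensing_of_Content.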

ssastry triaged this task as Normal priority. Jul 15 2016, 9:48 PM