Interwiki links whose titles contain question marks are not properly escaped in HTML output.
Open, Normal, Public

Description

$ echo '[[:en:Shall We Dance? (2004 film)|談情共舞]]' | tests/parse.js --wt2wt --prefix zhwiki
[:en:Shall We Dance? (2004 film) 談情共舞]
$ echo '[[:en:Shall We Dance? (2004 film)|談情共舞]]' | tests/parse.js --wt2wt --prefix dewiki
[:en:Shall We Dance? (2004 film) 談情共舞]

but:

$ echo '[[:en:Shall We Dance? (2004 film)|談情共舞]]' | tests/parse.js --wt2wt --prefix enwiki
[[:en:Shall We Dance? (2004 film)|談情共舞]]

This seems to be a bug in the WTS (wikitext serializer)...

$ echo '[[:en:Shall We Dance? (2004 film)|談情共舞]]' | tests/parse.js --wt2html --prefix zhwiki | sed -e 's/mw:ExtLink/mw:WikiLink/g' | tests/parse.js --html2wt
[[:en:Shall We Dance? (2004 film)|談情共舞]]

...but look at the HTML:

$ echo '[[:en:Shall We Dance? (2004 film)|談情共舞]]' | tests/parse.js --wt2html --normalize=parsoid --prefix dewiki
<p><a rel="mw:ExtLink" href="//en.wikipedia.org/wiki/Shall We Dance? (2004 film)" title="en:Shall We Dance? (2004 film)">談情共舞</a></p>

That question mark in the href ought to be URI-encoded (%3F); otherwise everything after it is treated as a query string.
Given properly escaped HTML, the WTS seems to be fine:

$ echo '<p><a rel="mw:ExtLink" href="//en.wikipedia.org/wiki/Shall We Dance%3F (2004 film)" title="en:Shall We Dance? (2004 film)">談情共舞</a></p>' | tests/parse.js --html2wt --prefix dewiki
[[:en:Shall We Dance? (2004 film)|談情共舞]]
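The failure mode can be reproduced with any WHATWG-conformant URL parser. The TypeScript sketch below is not Parsoid's code: `titleToPathSegment` and `interwikiHref` are hypothetical helpers, and the space-to-underscore step assumes MediaWiki's usual URL form for page titles. It parses the unescaped href from the report and an escaped equivalent, and shows that only the escaped one keeps the whole title in the path.

```typescript
// Sketch only: hypothetical helpers, not Parsoid's actual implementation.

// Turn a page title into a single path segment. Spaces become underscores
// (assumed MediaWiki URL convention), then encodeURIComponent escapes
// reserved characters such as "?" (-> %3F).
function titleToPathSegment(title: string): string {
  return encodeURIComponent(title.replace(/ /g, "_"));
}

// Build a protocol-relative interwiki href like the one in the report.
function interwikiHref(host: string, title: string): string {
  return `//${host}/wiki/${titleToPathSegment(title)}`;
}

// Protocol-relative hrefs need a base URL to parse; the scheme comes from it.
const base = "https://de.wikipedia.org/";
const title = "Shall We Dance? (2004 film)";

// Unescaped, as in the HTML above: the "?" starts a query string,
// so the path is truncated to "Shall We Dance".
const broken = new URL(`//en.wikipedia.org/wiki/${title}`, base);
console.log(broken.pathname); // → /wiki/Shall%20We%20Dance
console.log(broken.search);   // → ?%20(2004%20film)

// Escaped: the whole title stays in the path and there is no query string.
const fixed = new URL(interwikiHref("en.wikipedia.org", title), base);
console.log(fixed.pathname);  // → /wiki/Shall_We_Dance%3F_(2004_film)
console.log(fixed.search);    // → (empty string)

// Decoding the path segment recovers the original title.
const decoded = decodeURIComponent(
  fixed.pathname.slice("/wiki/".length),
).replace(/_/g, " ");
console.log(decoded);         // → Shall We Dance? (2004 film)
```

Note that `encodeURIComponent` also escapes `/` and `:`, which MediaWiki normally leaves literal in article URLs, so a real fix would probably use a narrower escaping set. The sketch is only meant to show that `?` must be escaped.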

Event Timeline

cscott created this task. Apr 8 2015, 8:20 PM
cscott updated the task description.
cscott raised the priority of this task to Normal.
cscott added a project: Parsoid.
cscott added a subscriber: cscott.
Restricted Application added a subscriber: Aklapper. Apr 8 2015, 8:20 PM

Change 204864 had a related patch set uploaded (by Cscott):
T95473: interwiki links fail to round-trip

https://gerrit.wikimedia.org/r/204864

cscott renamed this task from "[[:en:Shall We Dance? (2004 film)|談情共舞]] fails to round-trip on zhwiki" to "Interwiki links whose titles contain question marks fail to WTS". Apr 17 2015, 8:40 PM
cscott set Security to None.
cscott renamed this task from "Interwiki links whose titles contain question marks fail to WTS" to "Interwiki links whose titles contain question marks are not properly escaped in HTML output". Apr 17 2015, 8:46 PM
cscott updated the task description.

Change 223384 had a related patch set uploaded (by Arlolra):
WIP: Accept entities in extlink href

https://gerrit.wikimedia.org/r/223384

Change 204864 abandoned by Arlolra:
WIP: T95473: interwiki links containing ? fail to round-trip

Reason:
Tests were stolen in https://gerrit.wikimedia.org/r/#/c/223384/ in patchset 3.

https://gerrit.wikimedia.org/r/204864

Change 223384 abandoned by Arlolra:
WIP: Accept entities in extlink href

Reason:
For now ... until I pick it up again.

https://gerrit.wikimedia.org/r/223384

cscott added a subscriber: ssastry. Jul 1 2016, 2:59 PM

@ssastry -- I wonder if your recent interwiki changes fix this bug?

Change 223384 restored by Arlolra:
WIP: Accept entities in extlink href

https://gerrit.wikimedia.org/r/223384

Restricted Application added a subscriber: Cosine02. Jan 6 2017, 8:46 PM

Change 204864 restored by Arlolra:
WIP: T95473: interwiki links containing ? fail to round-trip

https://gerrit.wikimedia.org/r/204864

See also T182649: Handle interwiki links in media `link=` params, which is mildly related: once galleries support interwiki links, we should double-check that they also properly escape question marks in interwiki links.