
Parsoid: Serializer should wrap invalid link targets in <nowiki>
Closed, Resolved (Public)

Description

Given the following HTML:

<a rel="mw:WikiLink" href="./]] foo [[bar">Manual</a>

<a href="./]] foo [[bar">Manual</a>

I get [1]:

[[./]] foo [[bar|Manual]]

[./%5D%5D%20foo%20%5B%5Bbar Manual]

The output for the external link is acceptable (garbage in, garbage out), because the percent-encoded target does not let arbitrary wikitext pass through.
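
The percent-encoding seen in the external-link output can be reproduced with a short Python sketch. It is only an illustration built on the standard library's urllib.parse.quote, not Parsoid's actual serializer code, and the function name serialize_external_link is hypothetical.

```python
from urllib.parse import quote


def serialize_external_link(href: str, text: str) -> str:
    """Hypothetical helper: emit an external link with a percent-encoded target."""
    # quote() leaves letters, digits, "_.-~" and "/" untouched and escapes the rest,
    # so "]", "[" and spaces can no longer be read as wikitext syntax.
    return f"[{quote(href)} {text}]"


print(serialize_external_link("./]] foo [[bar", "Manual"))
# prints: [./%5D%5D%20foo%20%5B%5Bbar Manual]
```

Because every bracket is escaped, the target cannot close the link early or open a new one.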

However, the output for mw:WikiLink looks like a bug to me. It is impossible to link to an invalid title in wikitext (not even with [[<nowiki>invalid </nowiki>]]), but emitting a non-working link is still better than passing arbitrary wikitext through.

That way it at least roundtrips and doesn't mess up the document.

<a rel="mw:WikiLink" href="./]] foo [[bar">Manual</a>

Should serialize to something like:

[[<nowiki>]] foo [[bar</nowiki>|Manual]]

Which parses to:

<a rel="mw:WikiLink" href="./%5D%5D_foo_%5B%5Bbar" data-parsoid="{&quot;stx&quot;:&quot;piped&quot;,&quot;a&quot;:{&quot;href&quot;:&quot;./%5D%5D_foo_%5B%5Bbar&quot;},&quot;sa&quot;:{&quot;href&quot;:&quot;<nowiki>]] foo [[bar</nowiki>&quot;},&quot;dsr&quot;:[1,41,32,2]}">Manual</a>
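
The proposed serialization can be sketched in Python as follows. This is a minimal illustration, not Parsoid code: the function name and the INVALID_TITLE_CHARS pattern are assumptions, and the pattern is a simplified stand-in for MediaWiki's real title validation.

```python
import re

# Assumption: a simplified set of characters that cannot appear in a page title.
# MediaWiki's real title validation is more involved than this.
INVALID_TITLE_CHARS = re.compile(r"[\[\]{}<>|]")


def serialize_wikilink_nowiki(href: str, text: str) -> str:
    """Hypothetical helper: wrap an invalid link target in <nowiki>."""
    # Strip the "./" prefix used on wiki link hrefs in the HTML.
    target = href[2:] if href.startswith("./") else href
    if INVALID_TITLE_CHARS.search(target):
        # Not handled in this sketch: a target that itself contains "</nowiki>".
        target = f"<nowiki>{target}</nowiki>"
    return f"[[{target}|{text}]]"


print(serialize_wikilink_nowiki("./]] foo [[bar", "Manual"))
# prints: [[<nowiki>]] foo [[bar</nowiki>|Manual]]
```

Valid targets pass through unchanged, so serialize_wikilink_nowiki("./Foo", "Manual") yields [[Foo|Manual]].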

[1] http://parsoid.wmflabs.org/_html/


Version: unspecified
Severity: normal

Details

Reference
bz52360

Event Timeline

bzimport raised the priority of this task to High (Nov 22 2014, 2:09 AM).
bzimport added a project: Parsoid.
bzimport set Reference to bz52360.

Change 135814 had a related patch set uploaded by Arlolra:
Wrap invalid link targets in <nowiki> when serializing

https://gerrit.wikimedia.org/r/135814

Quote from gwicke on the above patch:

"One issue with nowikification is that it breaks the entire link including linked content, with no easy way for a user of VE to fix it. The alternative idea we had was to replace the original link target with one pointing to an error page explaining which chars are valid in titles. The advantage would be that it'd still render as expected, and the user could fix up the target in VE.
The backlinks from the error page would also provide a handy list of pages with broken links."

Change 135814 merged by jenkins-bot:
Replace invalid link targets when serializing

https://gerrit.wikimedia.org/r/135814

We chose to point these links to MediaWiki:Badtitletext.
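
The chosen behavior can be sketched roughly as below. This is a hedged Python illustration, not the code from change 135814: the function name and the simplified invalid-character check are assumptions, and only the MediaWiki:Badtitletext target comes from this task.

```python
import re

# Assumption: a simplified set of characters that cannot appear in a page title.
INVALID_TITLE_CHARS = re.compile(r"[\[\]{}<>|]")

# Replacement target named in this task.
BAD_TITLE_TARGET = "MediaWiki:Badtitletext"


def serialize_wikilink_replacing(href: str, text: str) -> str:
    """Hypothetical helper: point links with invalid targets at an error page."""
    # Strip the "./" prefix used on wiki link hrefs in the HTML.
    target = href[2:] if href.startswith("./") else href
    if INVALID_TITLE_CHARS.search(target):
        # The link text is kept, so the link still renders and can be fixed later.
        target = BAD_TITLE_TARGET
    return f"[[{target}|{text}]]"


print(serialize_wikilink_replacing("./]] foo [[bar", "Manual"))
# prints: [[MediaWiki:Badtitletext|Manual]]
```

Unlike the nowiki approach, the result is still a working link, which matches the reasoning in the quoted review comment: the link renders as expected, and the backlinks of the error page list the pages that need fixing.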