
&nbsp; should terminate a free external link
Open, Normal, Public

Description

The EXT_LINK_URL_CLASS regexp in Parser.php allows the Zs Unicode class to delimit autolinked URLs. This includes all Unicode "separator, space" characters, including the non-breaking space (aka \u00A0).

However, it is very common to represent a non-breaking space as &nbsp; in wikitext, as it is hard to type the Unicode character directly. But &nbsp; doesn't delimit a URL:

http://cscott.net/&nbsp;this is my website

parses as the URL http://cscott.net/&nbsp;this.
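
To illustrate the mismatch, here is a minimal Python sketch (not the PHP from Parser.php). The names `URL_CLASS`, `FREE_LINK` and `free_link` are hypothetical, and the character class is only an assumed approximation of EXT_LINK_URL_CLASS. Python's `re` module has no `\p{Zs}`, so the Zs code points are listed by hand.

```python
import re

# Code points in Unicode category Zs ("separator, space"), listed by hand
# because Python's re module does not support \p{Zs}.
ZS_CHARS = "\u0020\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000"

# Assumed approximation of the URL character class: anything except
# brackets, angle brackets, double quotes, control characters and Zs.
URL_CLASS = r'[^\]\[<>"\x00-\x20\x7F' + ZS_CHARS + r']'
FREE_LINK = re.compile(r'https?://' + URL_CLASS + r'+')


def free_link(text):
    """Return the first autolinked URL found in text, or None."""
    match = FREE_LINK.search(text)
    return match.group(0) if match else None


# A literal U+00A0 is in Zs, so it ends the URL.
print(free_link("http://cscott.net/\u00A0this is my website"))
# The &nbsp; entity is made of ordinary characters, so it is swallowed.
print(free_link("http://cscott.net/&nbsp;this is my website"))
```

With this approximation the first call yields `http://cscott.net/`, while the second yields `http://cscott.net/&nbsp;this`, which is the inconsistency described above.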

Event Timeline

cscott created this task. Dec 18 2014, 10:34 PM
cscott raised the priority of this task from to Needs Triage.
cscott updated the task description.
cscott added a project: MediaWiki-Parser.
cscott changed Security from none to None.
cscott added a subscriber: cscott.

Change 180982 had a related patch set uploaded (by Cscott):
Terminate free external link on &nbsp; (and numeric versions of <>)

https://gerrit.wikimedia.org/r/180982

Patch-For-Review

Aklapper triaged this task as Normal priority. Dec 19 2014, 4:02 PM
cscott added a comment. Jan 6 2015, 6:55 PM

My concern here is mostly relating to VE. I was worried that VE would generate &nbsp; in the wikitext if the user manually inserted a non-breaking space, which would then require addition of <nowiki/> to separate it from the URL. If it generates \u00A0 instead, then it looks better (no <nowiki/>) but someone editing in "source mode" can't tell that it's a non-breaking space at all (not so good).

So I think it's better if VE generates &nbsp; and we make the treatment of &nbsp; and \u00A0 consistent in the core parser (and in parsoid).
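
One way such consistency could be achieved, sketched in Python under stated assumptions: match the free link with a simplified pattern, then cut the URL at the first entity that stands for U+00A0, `<` or `>` and hand the remainder back as trailing text. The names `FREE_LINK`, `TERMINATING_ENTITY` and `split_free_link` are hypothetical, and this is not the code from the Gerrit changes.

```python
import re

# Simplified free-link matcher (an assumption, not the Parser.php regex).
# In Python 3 str patterns, \s also matches U+00A0.
FREE_LINK = re.compile(r'https?://[^\]\[<>"\s]+')

# Entities that decode to "<", ">" or U+00A0, in named, hexadecimal
# (3C, 3E, A0) or decimal (60, 62, 160) form, with optional leading zeros.
TERMINATING_ENTITY = re.compile(
    r'&(?:lt|gt|nbsp|#[xX]0*(?:3[CcEe]|[Aa]0)|#0*(?:60|62|160));'
)


def split_free_link(text):
    """Return (url, trail) for the first free link in text.

    The url stops before any terminating entity; trail is everything
    after the url. Text before the link is not returned.
    """
    match = FREE_LINK.search(text)
    if match is None:
        return None, text
    url = match.group(0)
    trail = text[match.end():]
    entity = TERMINATING_ENTITY.search(url)
    if entity:
        # Move the entity and whatever follows it out of the URL.
        trail = url[entity.start():] + trail
        url = url[:entity.start()]
    return url, trail


print(split_free_link("http://cscott.net/&nbsp;this is my website"))
print(split_free_link("http://cscott.net/?a=1&amp;b=2"))
```

In this sketch the first call returns `('http://cscott.net/', '&nbsp;this is my website')`, so `&nbsp;` behaves like \u00A0, while `&amp;` in the second call stays part of the URL.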

Change 240568 had a related patch set uploaded (by Cscott):
Terminate autolinks on &nbsp; and numeric entity encodings of <>

https://gerrit.wikimedia.org/r/240568

Change 180982 merged by jenkins-bot:
Terminate free external link on &nbsp; (and numeric versions of <>)

https://gerrit.wikimedia.org/r/180982

Change 240568 merged by jenkins-bot:
Terminate autolinks on &nbsp; and numeric entity encodings of <>

https://gerrit.wikimedia.org/r/240568