
Parsoid should be able to understand HTML entities in links
Closed, Resolved | Public | 1 Estimated Story Points

Description

Wikitext: Breakfast and "[http:// http://www.librarieswithoutborders.org/ Librairies without borders]" presentation
In VE this becomes: Breakfast and "[http:// http://www.librarieswithoutborders.org/ Librairies without borders]" presentation, shown as plain text without any link
But MediaWiki renders it as: Breakfast and "Librairies without borders" presentation, linking to http://+http//www.librarieswithoutborders.org/

Found at: https://www.mediawiki.org/w/index.php?title=Wikimedia_Hackathon_2015/Program&oldid=1636908

Event Timeline

JanZerebecki raised the priority of this task from to Needs Triage.
JanZerebecki updated the task description.
JanZerebecki subscribed.

Change 223384 had a related patch set uploaded (by Arlolra):
T98960: Accept entities in extlink href

https://gerrit.wikimedia.org/r/223384

Arlolra triaged this task as Medium priority. Jul 7 2015, 7:43 PM
Arlolra subscribed.
Jdforrester-WMF renamed this task from "VE can't handle [http:// http://...]" to "Parsoid should be able to understand HTML entities in links". Aug 6 2015, 11:18 PM
Jdforrester-WMF set Security to None.
Jdforrester-WMF edited a custom field.
Jdforrester-WMF moved this task from To Triage to TR1: Releases on the VisualEditor board.

Change 223384 had a related patch set uploaded (by Arlolra):
WIP: Accept entities in extlink href

https://gerrit.wikimedia.org/r/223384

This one is weird. That's a totally bogus link, right? It doesn't actually work?

We do a sanity-checking pass after template expansion on the link contents to try to ensure the result actually parses as a URL. I'm guessing embedding a space fails that check -- as well it should.

I'm calling this a bug in the PHP parser for allowing such a thing in the first place.
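
To make the check described above concrete, here is a rough Python sketch. It is not Parsoid's actual code: `looks_like_url` is a hypothetical helper, and `&#32;` is only an assumed example of an entity that decodes to a space (the entity in the original wikitext is not shown in this task).

```python
import html
from urllib.parse import urlsplit


def looks_like_url(target: str) -> bool:
    """Hypothetical sanity check: no whitespace, http(s) scheme, non-empty host."""
    if any(ch.isspace() for ch in target):
        return False
    parts = urlsplit(target)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Assumed input: an extlink target containing a numeric entity for a space.
raw = "http://&#32;http://www.librarieswithoutborders.org/"
decoded = html.unescape(raw)  # the entity is replaced by a literal space

print(looks_like_url(decoded))
print(looks_like_url("http://www.librarieswithoutborders.org/"))
```

Under these assumptions the decoded target contains a space and is rejected, while the plain URL passes.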

Change 223384 abandoned by Arlolra:
WIP: Accept entities in extlink href

Reason:
For now ... until I pick it up again.

https://gerrit.wikimedia.org/r/223384

> This one is weird. That's a totally bogus link, right? It doesn't actually work?
>
> We do a sanity-checking pass after template expansion on the link contents to try to ensure the result actually parses as a URL. I'm guessing embedding a space fails that check -- as well it should.
>
> I'm calling this a bug in the PHP parser for allowing such a thing in the first place.

@cscott, I don't know about the original test case, but if you look at the duplicate I just created and then merged in, the link I'm using is a valid one that works properly in the read view for its intended function (linking to a prepopulated Phabricator form). Of course, it works just as well if you use percent-encoding in place of character entities, which Parsoid does fine with, but my point is that it can happen with valid links :)
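
A small Python sketch of the equivalence mentioned here, using a made-up URL (the real Phabricator form link is not reproduced in this task, so `example.org` and the `title` parameter are assumptions): decoding the character entities and decoding the percent-escapes yield the same target.

```python
import html
from urllib.parse import unquote

# Hypothetical link targets: the same URL written two ways.
entity_form = "https://example.org/new?title=&#91;Bug&#93;"   # character entities
percent_form = "https://example.org/new?title=%5BBug%5D"      # percent-encoding

from_entities = html.unescape(entity_form)   # decode HTML character references
from_percent = unquote(percent_form)         # decode percent-escapes

print(from_entities)
print(from_entities == from_percent)
```

Both forms decode to `https://example.org/new?title=[Bug]`, which is why a link written with entities can be perfectly valid even though only the percent-encoded form was being accepted.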

Another example from https://www.mediawiki.org/w/index.php?title=Parsoid/DumpGrepper&oldid=1054779

[https://bugzilla.wikimedia.org/sho w_bug.cgi?id=43652 Bug]

I should pick this up again ...

Change 223384 restored by Arlolra:
WIP: Accept entities in extlink href

https://gerrit.wikimedia.org/r/223384

See this diff, in which an unrelated edit also changed the formatting of wikilinks (as reported on mediawiki.org). Is it related? The change is undesirable, especially as it makes the wikitext a bit more difficult to read.

Change 223384 had a related patch set uploaded (by Arlolra):
T98960: Accept entities in extlink href

https://gerrit.wikimedia.org/r/223384

Change 223384 merged by jenkins-bot:
T98960: Accept entities in extlink href and url links

https://gerrit.wikimedia.org/r/223384

Jdforrester-WMF changed the point value for this task from 0 to 1. Feb 2 2017, 7:03 PM