Page MenuHomePhabricator

Parsoid does not escape language prefixes in internal link hrefs, causing html2html round-trip failures
Open, LowPublic0 Estimated Story Points

Description

The link editor allows the user to type: fr:Frank Marino.
It will convert this into [[fr:Frank Marino]], which is a interlanguage link and not a link that will end up in the content (though for VE, it seems it does, which would be another issue). Although valid, it is unlikely to be desirable. The editing tool should thus escape this input at the very least to [[:fr:Frank Marino]] or warn or something.

Event Timeline

TheDJ created this task.Apr 12 2015, 11:11 AM
TheDJ raised the priority of this task from to Needs Triage.
TheDJ updated the task description. (Show Details)
TheDJ added a subscriber: TheDJ.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 12 2015, 11:11 AM
Jdforrester-WMF triaged this task as Low priority.Apr 29 2015, 6:18 PM
Jdforrester-WMF added a project: Parsoid.
Jdforrester-WMF renamed this task from Link editor should guard against using interlanguage links to Link editor should guard against using interlanguage (rather than interwiki) links.May 6 2015, 10:08 PM
Jdforrester-WMF added a project: Design.
Jdforrester-WMF edited a custom field.

This is a Parsoid bug:

$ echo '<a rel="mw:WikiLink" href="./de:Foo">de:Foo</a>' | node tests/parse.js --html2wt
[[de:Foo]]
$ echo '[[de:Foo]]' | node tests/parse.js
<link rel="mw:PageProp/Language" href="//de.wikipedia.org/wiki/Foo" data-parsoid='{"stx":"simple","a":{"href":"//de.wikipedia.org/wiki/Foo"},"sa":{"href":"de:Foo"},"dsr":[0,10,null,null]}'/>
Catrope renamed this task from Link editor should guard against using interlanguage (rather than interwiki) links to Parsoid does not escape language prefixes in internal link hrefs, causing html2html round-trip failures.Jul 24 2015, 1:41 AM
Gnom1 awarded a token.Nov 2 2015, 11:26 AM

I've tried to read some code to see if there's a simple fix for this. It seems that Parsoid attempts to handle this in at least two places in LinkHandler.js:

getLinkRoundTripData()
			// Ensure that we generate an interwiki link and not a language link!
			if (rtData.isInterwikiLang && !(/^:/.test(target.value))) {
				target.value = ':' + target.value;
			}
serializeAsWikiLink()
			if (linkData.isInterwikiLang && !/^[:]/.test(linkTarget) &&
				linkData.type !== 'mw:PageProp/Language') {
				// ensure interwiki links can't be confused with
				// interlanguage links.
				linkTarget = ':' + linkTarget;
			}

Clearly, neither of these does the job. But I'm not familiar with the code and can't easily see why these checks are not reached (?) and I don't want to spend too much time diving into this right now.

I actually kept digging a bit more. I blamed these lines to find where they came from.

The second has been there since forever (rGPARb8eb09f6adce: Handle local interwiki links.).

The first one is more interesting, added in 2017 (rGPAR95155bab9ea9: T136653: Handle interwiki shortcuts). That commit looks like it was supposed to resolve this exact bug, and also comes with unit tests that prove it should work. So why doesn't it?

The supported HTML syntax, copied from the unit test, is like this: <a rel='mw:WikiLink' href='./fr:Foo'>Foo</a>. VisualEditor generates slightly different HTML: <a href="./Fr%3AFoo" rel="mw:WikiLink">Foo</a>. The href attribute is percent-encoded and capitalized. I tested on https://en.wikipedia.org/api/rest_v1/#/Transforms/post_transform_html_to_wikitext and it turns out that it's the percent-encoding causing it to not work!

@ssastry @cscott Should Parsoid also support percent-encoded hrefs here?

Elitre removed a subscriber: Elitre.Aug 29 2019, 9:08 AM