Parsoid doesn't support {{!}} in links
Open, LowPublic

Description

[https://example.com/a{{!}}b 4]

parses to

<a rel="mw:ExtLink" class="external text" href="https://example.com/ab" id="mwBQ">4</a>

The pipe has been lost. This is likely because the PEG tokenizer doesn't support {{!}} in link context.

Event Timeline

Restricted Application added a project: Growth-Team. Jul 16 2018, 3:49 PM
Restricted Application added subscribers: revi, Aklapper.
Trizek-WMF renamed this task from {{!}} is not working on Structures discussions to {{!}} is not working on Structured discussions pages.Jul 16 2018, 3:49 PM

The rendered link looks different

https://ko.wikipedia.org/w/index.php?title=%EC%82%AC%EC%9A%A9%EC%9E%90%ED%86%A0%EB%A1%A0:Trizek_(WMF)&oldid=21789735

<a class="external text" href="https://stats.wikimedia.org/v2/#/en.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total">Wikistats2 maps</a>

and on Structured discussion page:

<a rel="mw:ExtLink" class="external" href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normalmap2-Year~2016060100~2018071100~total" data-parsoid="{&quot;targetOff&quot;:1289,&quot;contentOffsets&quot;:[1289,1306],&quot;a&quot;:{&quot;href&quot;:&quot;https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normalmap2-Year~2016060100~2018071100~total&quot;},&quot;sa&quot;:{&quot;href&quot;:&quot;https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal{{!}}map{{!}}2-Year~2016060100~2018071100{{!}}~total&quot;},&quot;dsr&quot;:[1149,1307,140,1]}"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Wikistats2 cards</font></font></a>

@Etonkovidova Is that not a parsing issue?

@Trizek-WMF @Etonkovidova I'm not sure this is specific to Flow. It looks like this is because Parsoid (via Visual Editor) is removing {{!}} and replacing it with an empty string. Flow is passing the wikitext (without modification) to Parsoid and getting back a response where {{!}} has been replaced with nothing. You can verify it's a VE issue by creating a sandbox page, starting in Source view, pasting in your wikitext, then switching back to visual editing – you'll see the link has {{!}} converted to null.

I'm not super familiar with Parsoid but reading through some of the test cases, it does not seem that {{!}} is meant to equal | when going from wikitext to HTML. @Trizek-WMF from the link you posted it looks like it's meant to be used in the context of a template argument or table cell.

{{!}} is also used as a way to add links that would otherwise break because of the vertical bar.

It is not even possible to create a link using vertical bars on VE. :/

  1. edit a page using VE
  2. take a link that has |, like https://stats.wikimedia.org/v2/#/fr.wikipedia.org/contributing/new-registered-users/normal|bar|2-Year~2016060100~2018071700|~total
  3. add a sentence to the page
  4. select that sentence and add the link
    1. result: the link is displayed, clickable and functional
  5. review your changes
    1. result: the link is displayed, clickable and functional
  6. publish your changes
    1. result: the link is not included
Trizek-WMF renamed this task from {{!}} is not working on Structured discussions pages to {{!}} is not working on Parsoid.Jul 17 2018, 10:55 AM

Using @Trizek-WMF's steps above, the link looks fine if you're looking at the visual diff, but if you look at the wikitext diff, then it's not there.

ssastry triaged this task as High priority.Jul 17 2018, 2:39 PM

@Trizek-WMF How badly blocked are you on this bug .. i.e. is this something you want fixed this week? Can it wait till next week?

@ssastry I'm looking into it today but if you want to take it on, please let me know.

Are you looking at fixing this in Parsoid? Or, do you mean investigating this in Flow?

Feel free to investigate this, and if you need something from the Parsoid end, let us know.

@Trizek-WMF Try to escape the | character using its percent-encoding %7C instead, like this:

[https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total cartes Wikistats2]

This seems to work around the rendering bug in Parsoid where pipes disappear from the URL. (Creating a link with this URL in VE is also broken, but if the link already exists in wikitext, it renders fine.)
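The workaround above can be sketched as a small helper that rewrites a URL before it goes into external-link wikitext. This is only an illustration: encodePipes and makeExtLink are hypothetical names, not functions from Parsoid, VisualEditor or MediaWiki.

```javascript
// Hypothetical helpers for illustration only; they are not part of any
// MediaWiki, Parsoid or VisualEditor API.

// Replace both the {{!}} magic word and literal pipes with %7C.
function encodePipes(url) {
	return url.replace(/\{\{!\}\}|\|/g, '%7C');
}

// Build external-link wikitext of the form [url label].
function makeExtLink(url, label) {
	return '[' + encodePipes(url) + ' ' + label + ']';
}

console.log(makeExtLink('https://example.com/a{{!}}b', '4'));
```

Encoding the pipe before the wikitext is saved sidesteps the question of how {{!}} is tokenized, at the cost of a less readable URL in the source.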

I think the problem is not with {{!}} specifically, but with any way to put the pipe character | into a URL that maps to an interwiki. Parsoid tries to convert it into an internal link like this (the below is actually also valid, it seems to work for me):

[[stats:v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total|cartes Wikistats2]]

…and when the | is not escaped as %7C, like here, the syntax doesn't work and apparently Parsoid handles that very poorly.

@Trizek-WMF How badly blocked are you on this bug .. i.e. is this something you want fixed this week? Can it wait till next week?

I had to deliver the newsletter on time, so I've done it. I didn't have @matmarex's alternative for the vertical bar, so the link in the issues delivered on Structured Discussions is broken.

Can it wait? That's a good question. Anyone may want to add a link using vertical bars in any SD discussion.

I'm seeing Parsoid handle the | character without issue; it appears to be a problem with VE inserting the content. As I step through ve.dm.SourceSurfaceFragment.prototype.insertContent, lines contains [https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal|map|2-Year~2016060100~2018071100|~total test].

Some test cases, tested with https://en.wikipedia.org/api/rest_v1/#/:

wikitext→HTML transform (the original bug, causes broken links in Flow discussions)

Input:

[https://example.com/ 1]
[https://example.com/a|b 2]
[https://example.com/a%7Cb 3]
[https://example.com/a{{!}}b 4]
[https://example.com/#a|b 5]
[https://example.com/#a%7Cb 6]
[https://example.com/#a{{!}}b 7]
[https://stats.wikimedia.org/v2 8]
[https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal|map|2-Year~2016060100~2018071100|~total 9]
[https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total 10]
[https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal{{!}}map{{!}}2-Year~2016060100~2018071100{{!}}~total 11]

Output:

<p id="mwAQ"><a rel="mw:ExtLink" class="external text" href="https://example.com/" id="mwAg">1</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/a%7Cb" id="mwAw">2</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/a%7Cb" id="mwBA">3</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/ab" id="mwBQ">4</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/#a%7Cb" id="mwBg">5</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/#a%7Cb" id="mwBw">6</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/#ab" id="mwCA">7</a>
<a rel="mw:ExtLink" class="external text" href="https://stats.wikimedia.org/v2" id="mwCQ">8</a>
<a rel="mw:ExtLink" class="external text" href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total" id="mwCg">9</a>
<a rel="mw:ExtLink" class="external text" href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total" id="mwCw">10</a>
<a rel="mw:ExtLink" class="external text" href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normalmap2-Year~2016060100~2018071100~total" id="mwDA">11</a></p>

The pipe character disappears in links 4, 7 and 11 (only those using {{!}}). So I guess the workaround is to not use it in external links, since it's not actually necessary.
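A rough way to spot this in output is to compare the source href with the rendered href and count how many pipes went missing. The sketch below assumes the two strings have already been extracted (for example from the "sa" and "a" entries that Parsoid records in data-parsoid); countPipes and droppedPipes are hypothetical names.

```javascript
// Hypothetical check for illustration; not part of Parsoid.

// Count pipes in an href, treating {{!}}, a literal | and %7C as the
// same character.
function countPipes(href) {
	const matches = href.match(/\{\{!\}\}|\||%7C/gi);
	return matches ? matches.length : 0;
}

// Number of pipes present in the source href but absent from the
// rendered href. Zero means nothing was lost.
function droppedPipes(sourceHref, renderedHref) {
	return countPipes(sourceHref) - countPipes(renderedHref);
}

console.log(droppedPipes('https://example.com/a{{!}}b', 'https://example.com/ab'));
console.log(droppedPipes('https://example.com/a|b', 'https://example.com/a%7Cb'));
```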

HTML→wikitext transform (the followup bug, causes links inserted in VisualEditor to disappear)

Input:

<a href="https://example.com/">1</a>
<a href="https://example.com/a|b">2</a>
<a href="https://example.com/a%7Cb">3</a>
<a href="https://example.com/#a|b">5</a>
<a href="https://example.com/#a%7Cb">6</a>
<a href="https://stats.wikimedia.org/v2">8</a>
<a href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal|map|2-Year~2016060100~2018071100|~total">9</a>
<a href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total">10</a>

Output:

[https://example.com/ 1]
[https://example.com/a|b 2]
[https://example.com/a%7Cb 3]
[https://example.com/#a|b 5]
[https://example.com/#a%7Cb 6]
[[stats:v2|8]]
9
10

Only links 9 and 10 are broken (the links disappear). Judging by what happens to the other links, I think the problem here only happens because Parsoid tries to convert them to internal interwiki links, and can't do that because of the pipe characters.

So on my development Parsoid (some configuration info is not the same as production), I cannot reproduce this bug:

/usr/local/bin/node /Users/sbailey/parsing/parsoid/bin/parse.js --wt2html --inputfile /Users/sbailey/parsing/wt
<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head prefix="mwr: http://en.wikipedia.org/wiki/Special:Redirect/"><meta charset="utf-8"/><meta property="mw:pageNamespace" content="0"/><meta property="isMainPage" content="true"/><meta property="mw:html:version" content="1.7.0"/><link rel="dc:isVersionOf" href="en.wikipedia.org/wiki/Main%20Page"/><title></title><base href="en.wikipedia.org/wiki/"/><link rel="stylesheet" href="en.wikipedia.org/w/load.php?modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Csite.styles%7Cext.cite.style%7Cext.cite.styles%7Cmediawiki.page.gallery.styles&amp;only=styles&amp;skin=vector"/><!--[if lt IE 9]><script src="en.wikipedia.org/w/load.php?modules=html5shiv&amp;only=scripts&amp;skin=vector&amp;sync=1"></script><script>html5.addElements('figure-inline');</script><![endif]--><meta http-equiv="content-language" content="en"/><meta http-equiv="vary" content="Accept"/></head><body data-parsoid='{"dsr":[0,1200,0,0]}' lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr"><p data-parsoid='{"dsr":[0,655,0,0]}'><a rel="mw:ExtLink" class="external text" href="https://example.com/" data-parsoid='{"targetOff":22,"contentOffsets":[22,23],"dsr":[0,24,22,1]}'>1</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/a%7Cb" data-parsoid='{"targetOff":50,"contentOffsets":[50,51],"a":{"href":"https://example.com/a%7Cb"},"sa":{"href":"https://example.com/a|b"},"dsr":[25,52,25,1]}'>2</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/a%7Cb" data-parsoid='{"targetOff":80,"contentOffsets":[80,81],"dsr":[53,82,27,1]}'>3</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/ab" data-parsoid='{"targetOff":112,"contentOffsets":[112,113],"a":{"href":"https://example.com/ab"},"sa":{"href":"https://example.com/a{{!}}b"},"dsr":[83,114,29,1]}'>4</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/#a%7Cb" data-parsoid='{"targetOff":141,"contentOffsets":[141,142],"a":{"href":"https://example.com/#a%7Cb"},"sa":{"href":"https://example.com/#a|b"},"dsr":[115,143,26,1]}'>5</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/#a%7Cb" data-parsoid='{"targetOff":172,"contentOffsets":[172,173],"dsr":[144,174,28,1]}'>6</a>
<a rel="mw:ExtLink" class="external text" href="https://example.com/#ab" data-parsoid='{"targetOff":205,"contentOffsets":[205,206],"a":{"href":"https://example.com/#ab"},"sa":{"href":"https://example.com/#a{{!}}b"},"dsr":[175,207,30,1]}'>7</a>
<a rel="mw:ExtLink" class="external text" href="https://stats.wikimedia.org/v2" data-parsoid='{"targetOff":240,"contentOffsets":[240,241],"dsr":[208,242,32,1]}'>8</a>
<a rel="mw:ExtLink" class="external text" href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total" data-parsoid='{"targetOff":371,"contentOffsets":[371,372],"a":{"href":"https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total"},"sa":{"href":"https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal|map|2-Year~2016060100~2018071100|~total"},"dsr":[243,373,128,1]}'>9</a>
<a rel="mw:ExtLink" class="external text" href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total" data-parsoid='{"targetOff":508,"contentOffsets":[508,510],"dsr":[374,511,134,1]}'>10</a>
<a rel="mw:ExtLink" class="external text" href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normalmap2-Year~2016060100~2018071100~total" data-parsoid='{"targetOff":652,"contentOffsets":[652,654],"a":{"href":"https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normalmap2-Year~2016060100~2018071100~total"},"sa":{"href":"https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal{{!}}map{{!}}2-Year~2016060100~2018071100{{!}}~total"},"dsr":[512,655,140,1]}'>11</a></p>

<p data-parsoid='{"dsr":[657,1199,0,0]}'>&lt;a href="https://example.com/">1&lt;/a>
&lt;a href="https://example.com/a|b">2&lt;/a>
&lt;a href="https://example.com/a%7Cb">3&lt;/a>
&lt;a href="https://example.com/#a|b">5&lt;/a>
&lt;a href="https://example.com/#a%7Cb">6&lt;/a>
&lt;a href="https://stats.wikimedia.org/v2">8&lt;/a>
&lt;a href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal|map|2-Year~2016060100~2018071100|~total">9&lt;/a>
&lt;a href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total">10&lt;/a></p>
</body></html>
Process finished with exit code 0

kostajh added a comment.EditedJul 17 2018, 7:04 PM

For the follow up bug that @matmarex noted, I'm seeing:

vagrant@vagrant:/vagrant$ node srv/parsoid/bin/parse.js --html2wt --inputfile failing.html
[error/html2wt/link][enwiki/Main Page] Bad title text <a href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal|map|2-Year~2016060100~2018071100|~total">9</a>
Stack:
  /vagrant/srv/parsoid/lib/html2wt/LinkHandler.js:594:13
  next (native)
  tryCatchNext (/vagrant/srv/parsoid/node_modules/prfun/lib/index.js:783:28)
  continuer (/vagrant/srv/parsoid/node_modules/prfun/lib/index.js:800:24)
  callback (/vagrant/srv/parsoid/node_modules/prfun/lib/index.js:812:43)
  /vagrant/srv/parsoid/node_modules/prfun/lib/index.js:814:9
  tryCatch2 (/vagrant/srv/parsoid/node_modules/babybird/lib/promise.js:48:12)
  PrFunPromise.Promise (/vagrant/srv/parsoid/node_modules/babybird/lib/promise.js:458:15)
  new PrFunPromise (/vagrant/srv/parsoid/node_modules/prfun/lib/index.js:100:21)
  /vagrant/srv/parsoid/node_modules/prfun/lib/index.js:797:21
  WikitextSerializer.<anonymous> (/vagrant/srv/parsoid/lib/html2wt/LinkHandler.js:716:18)
  next (native)
9

in srv/parsoid/lib/html2wt/LinkHandler.js

var escapeLinkTarget = function(linkTarget, state) {
	// Entity-escape the content.
	linkTarget = Util.escapeEntities(linkTarget);
	return {
		linkTarget: linkTarget,
		// Is this an invalid link?
		invalidLink: !state.env.isValidLinkTarget(linkTarget) || /\|/.test(linkTarget),
	};
};

We're specifying that the link with a | character is invalid.
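The pipe test from that condition can be tried in isolation. The sketch below copies only the regular expression; hasRawPipe is a hypothetical name, and the state.env.isValidLinkTarget half of the real condition is deliberately left out, so this does not reproduce Parsoid's full decision.

```javascript
// Stand-in for the /\|/ test in escapeLinkTarget. The other half of the
// real condition (state.env.isValidLinkTarget) is not modelled here.
function hasRawPipe(linkTarget) {
	return /\|/.test(linkTarget);
}

const base = 'v2/#/fr.wikipedia.org/reading/page-views-by-country/';

// A literal pipe trips the test.
console.log(hasRawPipe(base + 'normal|map|2-Year~2016060100~2018071100|~total'));

// The percent-encoded form does not trip this particular test.
console.log(hasRawPipe(base + 'normal%7Cmap%7C2-Year~2016060100~2018071100%7C~total'));
```

Since the percent-encoded target passes this regular expression, the disappearance of link 10 in the earlier test case presumably comes from somewhere else (or from the target being decoded first); this sketch does not show which.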

Here's the relevant commit.

kostajh removed kostajh as the assignee of this task.Jul 17 2018, 7:30 PM

Unassigned myself in case someone else wants to look at a fix for this.

kostajh moved this task from Current Sprint to External on the Growth-Team board.Jul 18 2018, 4:46 PM
kostajh edited projects, added Growth-Team; removed Growth-Team (Current Sprint).

So on my development Parsoid (some configuration info is not the same as production), I cannot reproduce this bug:

<a rel="mw:ExtLink" class="external text" href="https://example.com/ab" data-parsoid='{"targetOff":112,"contentOffsets":[112,113],"a":{"href":"https://example.com/ab"},"sa":{"href":"https://example.com/a{{!}}b"},"dsr":[83,114,29,1]}'>4</a>

The href is missing the | char .. it should have been https://example.com/a|b .. Same with 7 and 11 below.

<a rel="mw:ExtLink" class="external text" href="https://example.com/#ab" data-parsoid='{"targetOff":205,"contentOffsets":[205,206],"a":{"href":"https://example.com/#ab"},"sa":{"href":"https://example.com/#a{{!}}b"},"dsr":[175,207,30,1]}'>7</a>
<a rel="mw:ExtLink" class="external text" href="https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normalmap2-Year~2016060100~2018071100~total" data-parsoid='{"targetOff":652,"contentOffsets":[652,654],"a":{"href":"https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normalmap2-Year~2016060100~2018071100~total"},"sa":{"href":"https://stats.wikimedia.org/v2/#/fr.wikipedia.org/reading/page-views-by-country/normal{{!}}map{{!}}2-Year~2016060100~2018071100{{!}}~total"},"dsr":[512,655,140,1]}'>11</a></p>

Anyway, I think our PEG tokenizer doesn't recognize the {{!}} magic word in links. But, as @matmarex notes, there is no reason to use this in links. So, fixing this is a bit lower priority.

I am going to file a separate bug for the html -> wt part to not mix things up with the wt -> html parse bug which is unrelated to this.

ssastry renamed this task from {{!}} is not working on Parsoid to Parsoid doesn't support {{!}} in links.Jul 18 2018, 4:54 PM
ssastry lowered the priority of this task from High to Medium.
ssastry lowered the priority of this task from Medium to Low.Jul 18 2018, 5:02 PM
ssastry updated the task description.
LGoto moved this task from Needs Triage to Backlog on the Parsoid board.Feb 15 2020, 12:06 AM
Restricted Application added a subscriber: Liuxinyu970226. Feb 15 2020, 12:06 AM
LGoto moved this task from Backlog to Needs Investigation on the Parsoid board.May 28 2020, 6:20 PM
LGoto moved this task from Needs Investigation to Known Differences on the Parsoid board.