VisualEditor/Parsoid unable to parse template containing external link with double brackets
Closed, Resolved · Public · 1 Story Point

Description

See https://wikitech.wikimedia.org/w/index.php?title=Tool:XTools&oldid=1764197

In this revision, the {{Tool}} template at the top of the page contains the link:

[[https://phabricator.wikimedia.org/source/tool-xtools-rebirth/ |rXTR]]

It should be:

[https://phabricator.wikimedia.org/source/tool-xtools-rebirth/ rXTR]

This error apparently confuses VE: when you attempt to edit the page, you can see that it can't parse the template. After saving, VE adds <nowiki> tags, which break the template altogether: https://wikitech.wikimedia.org/w/index.php?title=Tool:XTools&diff=1764186&oldid=1764133

For example, try editing this version, from before the invalid link was added, and everything is fine.
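As a hedged illustration of the fix described above, here is a minimal JavaScript sketch that rewrites the malformed [[URL |label]] form into the intended [URL label] form. The function name fixDoubleBracketedUrls is hypothetical (not part of MediaWiki, Parsoid, or VE), and the regex is an assumption that only handles http(s) URLs and labels without a "]".

```javascript
// Hypothetical helper (not part of MediaWiki, Parsoid, or VisualEditor):
// rewrite "[[http://url |label]]" into the intended "[http://url label]".
// Assumptions: only http(s) URLs, and the label contains no "]".
// Unpiped forms like "[[http://example.com]]" are intentionally not matched.
const DOUBLE_BRACKETED_URL =
  /\[\[(https?:\/\/[^\s|\]]+)\s*\|\s*([^\]]*)\]\]/g;

function fixDoubleBracketedUrls(wikitext) {
  return wikitext.replace(DOUBLE_BRACKETED_URL, (_match, url, label) => {
    const text = label.trim();
    // An empty label yields a bare bracketed (numbered) external link.
    return text ? `[${url} ${text}]` : `[${url}]`;
  });
}

// Example: the link from the {{Tool}} template in the revision above.
console.log(
  fixDoubleBracketedUrls(
    "[[https://phabricator.wikimedia.org/source/tool-xtools-rebirth/ |rXTR]]"
  )
); // prints [https://phabricator.wikimedia.org/source/tool-xtools-rebirth/ rXTR]
```

Links without a pipe, such as [[http://example.com]], are deliberately left alone, since rewriting them to [http://example.com] would change how they render.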

Restricted Application added a subscriber: Aklapper. · Jul 11 2017, 4:33 PM
Deskana added a subscriber: Deskana.

Primarily a Parsoid problem.

Looking at this briefly, the negative lookahead

! url

at https://github.com/wikimedia/parsoid/blob/master/lib/wt2html/pegTokenizer.pegjs#L1262 is probably wrong. Encountering a URL is not a valid reason to push preproc broken, since the preprocessor doesn't know about URLs.

The fix looks similar to https://github.com/wikimedia/parsoid/commit/0890c9ba61f0e108231bd6d54e759c7f3d9cf303; however, everything is complicated by the need to reparse for extlinks.

A hacky workaround that tries to pop the preproc stop when an extlink is followed by a "]" at
https://github.com/wikimedia/parsoid/blob/master/lib/wt2html/pegTokenizer.pegjs#L1258
is going to get messy, because pipes are suppressed differently in different contexts (here, an extlink in wikilink context).

Some simple examples:

{{1x|[[http://hi.com |ho]]}}

vs

{{1x|[http://hi.com |ho]}}
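To make the pipe-context point concrete, here is a simplified JavaScript model of how a preprocessor that only recognizes [[ and {{ splits the inside of a transclusion on top-level pipes. This is an assumption for illustration, not MediaWiki's actual Preprocessor code, and splitTemplateParts is a hypothetical name. It shows why the two examples above differ: in the first, the pipe sits inside [[ ... ]] and stays part of the argument; in the second, a single [ protects nothing, so the pipe starts a new argument.

```javascript
// Simplified model, NOT MediaWiki's real preprocessor: split the inside of a
// {{...}} transclusion on pipes that are not nested in [[...]] or {{...}}.
// Like the real preprocessor, it only knows "[[" and "{{" as openers; a single
// "[" is ordinary text. Unclosed openers are NOT handled the way the real
// preprocessor handles them; this model simply never closes them.
const CLOSER_FOR = new Map([
  ["[[", "]]"],
  ["{{", "}}"],
]);

function splitTemplateParts(inner) {
  const parts = [];
  const stack = []; // expected closers for the currently open [[ / {{
  let current = "";
  for (let i = 0; i < inner.length; i++) {
    const two = inner.slice(i, i + 2);
    if (CLOSER_FOR.has(two)) {
      stack.push(CLOSER_FOR.get(two));
      current += two;
      i++; // consumed two characters
    } else if (stack.length > 0 && two === stack[stack.length - 1]) {
      stack.pop();
      current += two;
      i++; // consumed two characters
    } else if (inner[i] === "|" && stack.length === 0) {
      parts.push(current); // top-level pipe: start a new part
      current = "";
    } else {
      current += inner[i];
    }
  }
  parts.push(current);
  return parts;
}

console.log(splitTemplateParts("1x|[[http://hi.com |ho]]"));
console.log(splitTemplateParts("1x|[http://hi.com |ho]"));
```

Under this model, the first call returns ["1x", "[[http://hi.com |ho]]"] (one argument), while the second returns ["1x", "[http://hi.com ", "ho]"] (the pipe splits the would-be extlink into two arguments).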

@cscott Wanna sanity check this analysis?

Arlolra claimed this task. · Jul 12 2017, 11:21 PM
Arlolra triaged this task as Normal priority.

Change 365064 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] T170289: Accept url in wikilink target position while tokenizing

https://gerrit.wikimedia.org/r/365064

ssastry moved this task from Backlog to In Progress on the Parsoid board. · Jul 19 2017, 9:43 PM
cscott added a comment. · Aug 2 2017, 7:33 PM

@cscott Wanna sanity check this analysis?

In the patch, @Arlolra says:

The argument I'm making is that the preprocessor would see these as wikilinks, so it's not correct to bail on encountering a URL. What's more, a pipe needs to be treated as being in wikilink context (see the test), which complicates trying to work around this in the tokenizer.

I think this is fundamentally correct.
The preprocessor (https://github.com/wikimedia/mediawiki/blob/master/includes/parser/Preprocessor.php#L46) only recognizes [[, not [, and as far as I can tell there's nothing in https://github.com/wikimedia/mediawiki/blob/master/includes/parser/Preprocessor_DOM.php that treats [[ ... ] specially. From the preprocessor's perspective, this is a broken wikilink, so it emits a literal [[ (but see T172306 for a wrinkle).
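A hedged sketch of the distinction between [[ ... ] and [[ ... ]] from the preprocessor's point of view: if only [[ and ]] count as brackets (single [ and ] are plain text), a [[ is "closed" when a matching ]] follows and "broken" otherwise. classifyDoubleBracket is a hypothetical name; this ignores {{ }} nesting and the T172306 wrinkle.

```javascript
// Hypothetical classifier (not MediaWiki or Parsoid code). Only "[[" and "]]"
// count as brackets; single "[" / "]" are plain text, mirroring the
// preprocessor's view. Nested [[ ]] pairs are tracked with a depth counter.
function classifyDoubleBracket(text) {
  if (!text.startsWith("[[")) {
    throw new Error("expected text starting with [[");
  }
  let depth = 0;
  for (let i = 0; i + 1 < text.length; i++) {
    const two = text.slice(i, i + 2);
    if (two === "[[") {
      depth++;
      i++; // skip the second "["
    } else if (two === "]]") {
      depth--;
      i++; // skip the second "]"
      if (depth === 0) return "closed"; // the opening [[ found its ]]
    }
  }
  return "broken"; // no matching ]] before the end of the input
}

console.log(classifyDoubleBracket("[[http://hi.com |ho]]")); // closed
console.log(classifyDoubleBracket("[[http://hi.com |ho]")); // broken
```

Under this model, the link from the task description, [[https://... |rXTR]], is a closed double bracket (a wikilink, as far as the preprocessor is concerned), which is why bailing out on the URL is the wrong move.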

I think the approach that handles this via the

broken_wikilink
  = &"[[" &{ return stops.push('preproc', 'broken'); }
    a:("[" (extlink / "[")) { return a; }

clause is the right one: the broken token should definitely remain on the preprocessor stack. On the other hand, [[http://example.com]] is *not* a broken link; we'd need to parse it as a wikilink and convert it to an extlink with surrounding brackets at a later stage.
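A minimal sketch of that later-stage conversion, assuming the wikilink's raw source text is available. The function name reinterpretUrlWikilink, the token shape { type, href, content }, and the short protocol list are all hypothetical; they are not Parsoid's actual token classes or MediaWiki's URL-protocol configuration. One bracket is peeled off each side, and the remaining [URL text] is re-read as an external link whose href runs up to the first whitespace.

```javascript
// Hypothetical token shapes for illustration; these are NOT Parsoid's real
// token classes. Assumes the wikilink's raw source text is available.
const URL_START = /^(?:https?|ftp):\/\//i; // assumption: a small subset of protocols

function reinterpretUrlWikilink(src) {
  // Only handle sources of the form "[[...]]" whose target starts with a URL.
  if (!src.startsWith("[[") || !src.endsWith("]]")) return null;
  const inner = src.slice(2, -2);
  if (!URL_START.test(inner)) return null;
  // Drop one bracket on each side and re-read "[...]" as an external link:
  // the href runs up to the first whitespace, the rest is the link text.
  const m = /^(\S+)(?:\s+([\s\S]*))?$/.exec(inner);
  return [
    "[", // literal opening bracket
    { type: "extlink", href: m[1], content: m[2] ?? "" },
    "]", // literal closing bracket
  ];
}

console.log(reinterpretUrlWikilink("[[http://hi.com |ho]]"));
console.log(reinterpretUrlWikilink("[[Foo|bar]]")); // null: an ordinary wikilink
```

In this model, [[http://hi.com |ho]] becomes a literal "[", an extlink to http://hi.com whose text is "|ho", and a literal "]"; ordinary wikilinks are left untouched (null).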

Restricted Application added a subscriber: Danmichaelo. · Aug 2 2017, 7:33 PM

Change 365064 merged by jenkins-bot:
[mediawiki/services/parsoid@master] T170289: Accept url in wikilink target position while tokenizing

https://gerrit.wikimedia.org/r/365064

Arlolra closed this task as Resolved. · Aug 2 2017, 8:12 PM
Restricted Application added a project: User-Ryasmeen. · Aug 2 2017, 8:12 PM
Jdforrester-WMF set the point value for this task to 1. · Aug 4 2017, 6:26 PM