
Broken wikilinks can be parsed as wikilinks after preprocessing
Open, Normal, Public

Description

Wikitext such as [[Foo{{echo]]}} is treated as a broken link by the
preprocessor, but then ends up being parsed as a valid wikilink later.
This exposes the intermediate representation of the preprocessor
and should be avoided.

Similarly, given [[Foo|{{echo|Bar]]x}}y]]z:

  1. Both PHP and Parsoid ignore the ]] inside the echo in the "preprocessor" stage. The {{echo extends until the x}}, and the outer [[Foo extends until the y]].
  2. (a) But then the PHP preprocessor emits [[Foo|Bar]]xy]]z as an intermediate result (after template expansion), and link processing happens on this intermediate result, which moves the wikilink boundary leftward to [[Foo|Bar]].
  3. (b) Parsoid works in a single step, so it keeps the wikilink as extending to the y]].
  4. (a) Then PHP does linktrail processing which slurps up the trailing xy inside the link.
  5. (b) Parsoid will do linktrail processing to slurp up the trailing z inside the link.
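The two paths above can be imitated with a small toy model. This is only a sketch: it is not the PHP parser or Parsoid, the function names are made up for illustration, the link regex handles only a single piped link, and the linktrail is assumed to be lowercase ASCII letters.

```python
import re

# Toy piped-link pattern: [[target|label]] followed by a linktrail.
# Assumption: the linktrail is just a run of lowercase ASCII letters.
LINK = re.compile(r"\[\[([^\[\]|]+)\|(.*?)\]\]([a-z]*)", re.S)


def expand_echo(wikitext):
    """Toy template expansion: {{echo|X}} becomes X (no nesting support)."""
    return re.sub(r"\{\{echo\|(.*?)\}\}", r"\1", wikitext, flags=re.S)


def php_style(wikitext):
    """Two steps: expand templates first, then look for links in the result."""
    expanded = expand_echo(wikitext)
    m = LINK.search(expanded)
    target, label, trail = m.groups()
    return target, label + trail, expanded[m.end():]


def parsoid_style(wikitext):
    """Single view: template spans are opaque while link boundaries are found."""
    templates = []

    def stash(m):
        # Replace each template with a NUL-delimited placeholder.
        templates.append(m.group(0))
        return f"\x00{len(templates) - 1}\x00"

    masked = re.sub(r"\{\{.*?\}\}", stash, wikitext, flags=re.S)
    m = LINK.search(masked)
    target, label, trail = m.groups()

    def restore(text):
        # Put the expanded template content back in place of placeholders.
        return re.sub(
            r"\x00(\d+)\x00",
            lambda k: expand_echo(templates[int(k.group(1))]),
            text,
        )

    return target, restore(label) + trail, restore(masked[m.end():])


src = "[[Foo|{{echo|Bar]]x}}y]]z"
print(php_style(src))      # (target, link text, leftover text)
print(parsoid_style(src))
```

In this model the two-step path ends the link at the ]] that came out of the template and leaves a stray ]]z behind, while the single-view path keeps the whole label and only the z is pulled in as linktrail.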

Parsoid's result is the "correct" behavior. Parsoid's basic worldview is that the ]] inside the template shouldn't be allowed to leak out to affect the surrounding wikilink. PHP may match Parsoid (in the future) if you use {{#balance}} (T114445). But we could also fix the preprocessor so that the "broken" ]] is escaped during preprocessing, so that it can't leak out and close a wikilink in later processing.
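One possible shape for such escaping is sketched below. It is an assumption-laden illustration, not a description of the patch attached to this task: the function name is hypothetical, "broken" is approximated as "a ]] with no matching [[ inside the same expanded text", and the &#93; entity is just one conceivable escape.

```python
def escape_unbalanced_closers(expanded):
    """Escape any ]] in expanded template output that has no matching [[.

    Hypothetical helper: balanced links such as [[Category:foo]] pass
    through unchanged; a stray ]] is replaced by character entities so
    that later link processing cannot treat it as a link terminator.
    """
    out = []
    depth = 0  # number of currently open [[ within this expansion
    i = 0
    while i < len(expanded):
        pair = expanded[i:i + 2]
        if pair == "[[":
            depth += 1
            out.append(pair)
            i += 2
        elif pair == "]]":
            if depth > 0:
                depth -= 1
                out.append(pair)
            else:
                out.append("&#93;&#93;")  # assumed escape form
            i += 2
        else:
            out.append(expanded[i])
            i += 1
    return "".join(out)


print(escape_unbalanced_closers("Bar]]x"))
print(escape_unbalanced_closers("[[Category:foo]]"))
```

With this kind of escaping the intermediate result for the example above would no longer contain a bare ]] after Bar, so the later link pass could not move the wikilink boundary leftward.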

Event Timeline

cscott created this task. Aug 2 2017, 5:53 PM
Restricted Application added a subscriber: Aklapper. Aug 2 2017, 5:53 PM

Change 352179 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: Protect broken wikilinks from being parsed as wikilinks later.

https://gerrit.wikimedia.org/r/352179

cscott added a comment. Aug 2 2017, 6:03 PM

This issue also shows up in some interactions between language converter, templates, and wikilinks (T54661: Preprocessor/Parser irregularities with -{...}- variant constructs).

For example, the following appeared in pre-edit wikitext on zhwiki:

[[:ja:踊る大捜査線 THE MOVIE|{{lang|ja|踊る大捜査線 THE MOVIE]]}})

Note that the ]] and }} were deliberately misnested. That's because the {{lang}} template by default emitted a [[Category:...]] tag at the end. So this became [[:ja:踊る大捜査線 THE MOVIE|..stuff..[[Category:foo]] after template insertion. The misnesting snuck an unescaped ]] into ..stuff.. to close the wikilink before the [[Category]] wikilink started.
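A toy expansion makes the effect of the misnesting visible. Everything here is an assumption for illustration: expand_lang is a made-up stand-in for {{lang}} that emits only its text argument followed by [[Category:foo]] (the real template's output is not reproduced), and Title is a placeholder page name.

```python
import re


def expand_lang(wikitext):
    """Toy {{lang|code|text}}: emit the text, then a category link.

    Hypothetical stand-in; the real template emits more than this.
    """
    return re.sub(
        r"\{\{lang\|[^|{}]*\|(.*?)\}\}",
        lambda m: m.group(1) + "[[Category:foo]]",
        wikitext,
        flags=re.S,
    )


nested = "[[:ja:Title|{{lang|ja|Title}}]]"      # ]] after }}
misnested = "[[:ja:Title|{{lang|ja|Title]]}}"   # ]] before }}

print(expand_lang(nested))
print(expand_lang(misnested))
```

In the properly nested form the category link lands inside the label of the outer link; in the misnested form the ]] carried out of the template closes the outer link first, so the category link follows it as a separate link.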

The correct solution would have been to use the nocat option to the {{lang}} template, or else to put the entire wikilink inside the {{lang}} template. The latter wasn't done because historically we couldn't embed | safely inside language converter markup. That should be fixed now (T146305, T146304), so we should stop this deliberate misnesting.

Arlolra triaged this task as Normal priority. Aug 2 2017, 8:21 PM
ssastry moved this task from Backlog to Links on the Parsoid board. Sep 18 2017, 5:11 PM