
Broken wikilinks can be parsed as wikilinks after preprocessing
Open, MediumPublic

Description

Wikitext such as [[Foo{{echo]]}} gets parsed as a broken link by the
preprocessor, but then ends up being parsed as a valid wikilink later.
This exposes the intermediate representation of the preprocessor
and should be avoided.

Similarly, given [[Foo|{{echo|Bar]]x}}y]]z:

  1. Both PHP and Parsoid ignore the ]] inside the echo in the "preprocessor" stage. The {{echo extends until the x}}, and the outer [[Foo extends until the y]].
  2a. But then the PHP preprocessor emits [[Foo|Bar]]xy]]z as an intermediate result (after template expansion), and link processing happens on this intermediate result, which moves the wikilink boundary leftward to [[Foo|Bar]].
  2b. Parsoid works in a single step, so it keeps the wikilink extending to the y]].
  3a. Then PHP does linktrail processing, which slurps the trailing xy into the link.
  3b. Parsoid does linktrail processing, which slurps the trailing z into the link.
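The steps above can be sketched with a toy Python model of the two pipelines. Everything here is an illustrative assumption, not the real PHP parser or Parsoid code: only {{echo|ARG}} (expanding to ARG) is understood, templates do not nest, and the linktrail is taken to be a run of lowercase ASCII letters, as on English wikis. The names expand_echo, php_style_link and parsoid_style_link are hypothetical.

```python
import re

# Toy model only: understands [[...]], {{echo|ARG}} (expanding to ARG)
# and an assumed English-style linktrail of lowercase ASCII letters.
LINKTRAIL = re.compile(r"[a-z]*")
ECHO = "{{echo|"


def expand_echo(text: str) -> str:
    """Expand every {{echo|ARG}} to ARG. A ]] inside ARG does not end the
    template; only the next }} does (nested templates are not supported)."""
    out, i = [], 0
    while i < len(text):
        if text.startswith(ECHO, i):
            end = text.index("}}", i)
            out.append(text[i + len(ECHO):end])
            i = end + 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def php_style_link(text: str) -> tuple[str, str]:
    """Two stages: expand templates first, then take the first [[ ... ]]
    in the *expanded* text, followed by its linktrail."""
    expanded = expand_echo(text)
    start = expanded.index("[[")
    end = expanded.index("]]", start) + 2
    trail = LINKTRAIL.match(expanded, end).group()
    return expanded[start:end], trail


def parsoid_style_link(text: str) -> tuple[str, str]:
    """Single stage: find the ]] that closes the [[ while skipping over
    {{...}} regions, so a ]] inside a template cannot close the link."""
    start = text.index("[[")
    depth = 0
    i = start + 2
    while i < len(text):
        if text.startswith("{{", i):
            depth += 1
            i += 2
        elif text.startswith("}}", i) and depth:
            depth -= 1
            i += 2
        elif text.startswith("]]", i) and depth == 0:
            end = i + 2
            break
        else:
            i += 1
    else:
        raise ValueError("unclosed wikilink")
    trail = LINKTRAIL.match(text, end).group()
    return text[start:end], trail


src = "[[Foo|{{echo|Bar]]x}}y]]z"
print(php_style_link(src))      # ('[[Foo|Bar]]', 'xy')
print(parsoid_style_link(src))  # ('[[Foo|{{echo|Bar]]x}}y]]', 'z')
```

The only difference between the two functions is *when* the closing ]] is searched for: after expansion (where the ]] from inside the template is indistinguishable from any other) or during the same pass that still knows where the template boundaries are.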

This is "correct" behavior. Parsoid's basic worldview is that the ]] inside the template shouldn't be allowed to leak out to affect the surrounding wikilink. PHP may match Parsoid (in the future) if you use {{#balance}} (T114445). But we could also fix the preprocessor so that the "broken" ]] is escaped during preprocessing so that it can't leak out and close a wikilink in later processing.
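One way the preprocessor-side fix could look is sketched below. This is a hypothetical illustration, not the code in the Gerrit changes listed in the timeline: it assumes that a ]] inside a template argument which does not close a [[ opened in that same argument is emitted as the character references &#93;&#93;, and that later link processing treats those references as plain text (they still render as ]]). Balanced links such as {{echo|[[A]]}} are left alone. The function names and the [a-z]* linktrail are assumptions.

```python
import re

# Hypothetical sketch of "escape the broken ]] during preprocessing".
ECHO = "{{echo|"
LINKTRAIL = re.compile(r"[a-z]*")  # assumed English-style linktrail


def escape_unmatched_close(arg: str) -> str:
    """Escape each ]] that does not close a [[ opened inside arg itself."""
    out, i, depth = [], 0, 0
    while i < len(arg):
        if arg.startswith("[[", i):
            depth += 1
            out.append("[[")
            i += 2
        elif arg.startswith("]]", i):
            if depth:
                depth -= 1
                out.append("]]")
            else:
                # Stray ]]: emit character references so it cannot
                # close a wikilink in the later link-processing pass.
                out.append("&#93;&#93;")
            i += 2
        else:
            out.append(arg[i])
            i += 1
    return "".join(out)


def expand_echo_protected(text: str) -> str:
    """Expand {{echo|ARG}} to ARG, protecting stray ]] in ARG."""
    out, i = [], 0
    while i < len(text):
        if text.startswith(ECHO, i):
            end = text.index("}}", i)
            out.append(escape_unmatched_close(text[i + len(ECHO):end]))
            i = end + 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def first_link(expanded: str) -> tuple[str, str]:
    """Later link pass: first [[ ... ]] in the expanded text plus linktrail."""
    start = expanded.index("[[")
    end = expanded.index("]]", start) + 2
    return expanded[start:end], LINKTRAIL.match(expanded, end).group()


expanded = expand_echo_protected("[[Foo|{{echo|Bar]]x}}y]]z")
print(expanded)              # [[Foo|Bar&#93;&#93;xy]]z
print(first_link(expanded))  # ('[[Foo|Bar&#93;&#93;xy]]', 'z')
print(expand_echo_protected("{{echo|[[A]]}}"))  # [[A]]
```

With the stray ]] neutralized, the two-stage pipeline finds the same link boundary (ending at y]]) and the same linktrail (z) as the single-pass model, which is the behavior the description argues for.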

Event Timeline

Change 352179 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: Protect broken wikilinks from being parsed as wikilinks later.

https://gerrit.wikimedia.org/r/352179

This issue also shows up in some interactions between language converter, templates, and wikilinks (T54661: Preprocessor/Parser irregularities with -{...}- variant constructs.).

For example, the pre-edit wikitext here on zhwiki contained the following:

[[:ja:踊る大捜査線 THE MOVIE|{{lang|ja|踊る大捜査線 THE MOVIE]]}})

Note that the ]] and }} were deliberately misnested. That's because the {{lang}} template by default emitted a [[Category:...]] tag at the end. So this became [[:ja:踊る大捜査線 THE MOVIE|..stuff..[[Category:foo]] after template insertion. The misnesting snuck an unescaped ]] into ..stuff.. to close the wikilink before the [[Category]] wikilink started.
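The effect of the misnesting can be checked with a small string model. This is a hypothetical stand-in, not zhwiki's real {{lang}}: it assumes {{lang|ja|TEXT}} expands to TEXT followed by [[Category:foo]], ignores the nocat option, and omits whatever other markup the real template produces. It only shows where the category link lands relative to the first ]] after expansion.

```python
# Hypothetical stand-in for zhwiki's {{lang|ja|TEXT}}: emits TEXT followed
# by a category link (nocat and the real template's extra markup omitted).
LANG = "{{lang|ja|"


def expand_lang(text: str) -> str:
    """Expand the single {{lang|ja|...}} call; the argument runs to the
    next }}, so a ]] inside it is carried into the output unchanged."""
    start = text.index(LANG)
    end = text.index("}}", start)
    arg = text[start + len(LANG):end]
    return text[:start] + arg + "[[Category:foo]]" + text[end + 2:]


def category_inside_link(expanded: str) -> bool:
    """True if another [[ opens before the first ]] closes the first link."""
    start = expanded.index("[[")
    first_close = expanded.index("]]", start)
    return expanded.find("[[", start + 2, first_close) != -1


title = "踊る大捜査線 THE MOVIE"
nested = "[[:ja:" + title + "|{{lang|ja|" + title + "}}]]"
misnested = "[[:ja:" + title + "|{{lang|ja|" + title + "]]}}"

print(expand_lang(misnested))
# [[:ja:踊る大捜査線 THE MOVIE|踊る大捜査線 THE MOVIE]][[Category:foo]]
print(category_inside_link(expand_lang(nested)))     # True
print(category_inside_link(expand_lang(misnested)))  # False
```

The misnested form only "works" because the ]] travels through template expansion unescaped, which is exactly the leak that the fix proposed in the description would close.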

The correct solution would have been to use the nocat option of the {{lang}} template, or else to put the entire wikilink inside the {{lang}} template. The latter wasn't done because historically we couldn't embed | safely inside language converter markup. That should be fixed now (T146305, T146304), so we should stop this deliberate misnesting.

Arlolra triaged this task as Medium priority. Aug 2 2017, 8:21 PM

Change 396049 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP/Concept: Handle [ as well as [[ in the preprocessor

https://gerrit.wikimedia.org/r/396049

Removing task assignee due to inactivity, as this open task has been assigned to the same person for more than two years (see the emails sent to the task assignee on Oct 27 and Nov 23). Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome.
(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator.)