Page MenuHomePhabricator

Incorrect parser output if -{{ appears in wikitext
Closed, ResolvedPublic

Description

I have this simple wikitext that used to work:
{{int:foo-{{{1}}}}}

In 1.29 this no longer works, and always produces:
Special:ExpandTemplates output:
{{int:foo-{[[:Template:1]]
Special:ExpandTemplates XML output:
<root>{{int:foo-{<template><title>1</title></template></root>
Special:ExpandTemplates HTML output:
<p>{{int:foo-{<a href="/w/i.php?title=Template:1&amp;action=edit&amp;redlink=1" class="new" title="Template:1 (page does not exist)">Template:1</a></p>

If I place that in a template and provide a value, result is the same except for XML output:
<root><template><title>Langname</title><part><name index="1"/><value>fin</value></part></template></root>

In 1.27alpha [1] I still get expected output:
<root><template><title>int:foo-<tplarg><title>1</title></tplarg></title></template></root>
&lt;foo-{{{1}}}&gt;

[1] https://sanat.csc.fi/wiki/Toiminnot:Mallineiden_laajennus

Event Timeline

The -{ part happens to look like language converter syntax. Folks have been working recently on getting it to play better with the normal syntax, perhaps something went wrong.

Perhaps obvious if you look at the examples, but the last } mysteriously disappears. It seems I can add more } at the end and they disappear as well.

As a workaround, you could probably create a page "Template:-" on your wiki, put "-" inside of it, and then write {{int:foo{{-}}{{{1}}}}} in the code to "escape" the language converter syntax.

Nikerabbit added a subscriber: greg.

It's not just {{int:}}, I had a other occurrence where there was a - followed by template, with totally broken display.

I also realized that it is probably a recent change that has not yet been deployed to Wikimedia sites. Increasing priority accordingly to avoid that from happening.

git bisect says it is 28774022769d2273be16c6c6e1cca710a1fd97ef

Nikerabbit renamed this task from Incorrect output when template parameter is embedded inside {{int:}} to Incorrect parser output if -{{ appears in wikitext.Dec 27 2016, 5:19 PM
greg raised the priority of this task from High to Unbreak Now!.

Deployment blocker -> UBN!

@cscott: This appears to be caused by a patch from you. Please take a look ASAP.

Can we simply revert ^ to get the train back on track? (cc: @cscott)

Hm, the -{ is language converter syntax, and so at first glance this is "expected" behavior, since language converter is turned on on all our wikis. There was a workaround above which protected the - character so it wouldn't be parsed as language converter markup -- why is this not acceptable?

(Honest question, trying to figure out how widespread this "pseudo-language-converter" markup actually is on our wikis.)

There is a chain of language converter-related patches on top of 28774022769d2273be16c6c6e1cca710a1fd97ef, but it may be possible to revert this patch in isolation; possibly some minor reconciliation in parsertests would be needed.

It looks like the preprocessor in general is not too smart about handling mismatching nesting constructs, and there are no explicit tests of mismatches in the parserTests associated with 28774022769d2273be16c6c6e1cca710a1fd97ef. That might be hard to remedy without incurring a lot of potential backtracking in the preprocessor, however.

Other workarounds might be to use percent-escape or ampersand-escape to protect the dash to break up the language converter markup.

You could also make the preprocessor language specific so that the -{ constructs are only protected if the current language object has a converter instance which isn't FakeConverter, which is the condition certain other parts of the wikitext parser use. Since the template in question is supposed to be language-agnostic, though, that would just mean that it would still break, just on more obscure languages. That's probably not an improvement.

@cscott: Searching enwiki shows 2,769 matches for "-{", none of which seem to be related to language converter. Is that relevant?

@cscott: Sorry to nag, but, this is blocking the train the week before devsummit/allhands and after 2 weeks of deploy freeze, the longer we wait the worse it'll be, generally, for us to roll forward. Can we please take the path of least resistance here and JFDI/JFFI? Thanks.

{{int:foo%2D{{{1}}}}} seems to be an appropriate workaround?

<!--{{foo shows up in @kaldari's search but wouldn't trigger any bug AFAIK. There are 16,569 of those (which is greater than 2,769, so one of these searches is bogus).

@greg I'll prepare a revert so we have options. Of course to a parser guy this looks more like "a long-standing wikitext bug that has finally been fixed" than "a long-standing wikitext feature which we've broken".

@cscott: Your number is probably right. I was only searching article space. Is this only an issue for wikis listed in $languagesWithVariants?

Change 330315 had a related patch set uploaded (by C. Scott Ananian):
Revert "Protect language converter markup in the preprocessor."

https://gerrit.wikimedia.org/r/330315

@greg The above patch disables this behavior with a two-line change.

We can then retry this after dev summit, probably with more warning in Tech News and the CHANGELOG, and perhaps with the FakeConverter guard discussed above (and in T153341).

@kaldari This affects -{ parsing in all wikis with language converter turned on -- which is all WMF wikis. The actual -{ markup on a page is parsed or not based on the actual page language in the main parser -- this is the FakeConverter guard which was missing from this issue in the preprocessor. So even on enwiki, we can parse -{...}- markup if the current page language is a $languagesWithVariants.

The issue is a bit subtle, since the {{int:...}} parser function which Nikerabbit is using is intended to access the user interface language, which can differ from the page language. So it's not at all clear to me that the FakeConverter guard is actually correct to use in this instance. More discussion of {{int}} which is relevant to this question is in T114640. The current behavior certainly seems preferable: the preprocessor will always protect -{ markup iff language converter is enabled (again, which is the case for all WMF wikis), and leave it to the main parser to determine the various page language/user interface language/target language/title language/etc in effect when actually doing replacement of -{...}- markup.

@greg The above patch disables this behavior with a two-line change.

We can then retry this after dev summit, probably with more warning in Tech News and the CHANGELOG, and perhaps with the FakeConverter guard discussed above (and in T153341).

Let's do that. Let's get that change merged and then we can continue the train. cc @thcipriani

Change 330323 had a related patch set uploaded (by Thcipriani):
Revert "Protect language converter markup in the preprocessor."

https://gerrit.wikimedia.org/r/330323

Change 330323 merged by jenkins-bot:
Revert "Protect language converter markup in the preprocessor."

https://gerrit.wikimedia.org/r/330323

I think we could improve this in the grammar, for example, by giving precedence to opening a template argument when -{{{ is seen. "-{{" could be resolved in the same way. Needs more testing, anyway, so I will merge the revert in master.

Note that the rule for e.g. {{{{{ is to give precedence to the rightmost opening construct (as I wrote in https://www.mediawiki.org/wiki/Preprocessor_ABNF), so -{{ would resolve to - {{ if the same principle were followed here.

Change 330315 merged by jenkins-bot:
Revert "Protect language converter markup in the preprocessor."

https://gerrit.wikimedia.org/r/330315

{{foo}, [[foo], etc. display as-is. I would expect -{{foo}} to display as -{{foo}}, not as -{{foo (where did the closing braces go!). Can't you require at least }- in the end? While this too can cause false positives, those would be fewer. You are even calling it as the -{...}- construct, so I don't quite understand why -{ alone would be allowed to do anything.

I would not call this expected – most editors are not aware of the language converter syntax.

ssastry lowered the priority of this task from Unbreak Now! to High.Jan 4 2017, 6:28 PM

@tstarling The precedence rule seems like a better/more orthogonal way to resolve this issue than trying to bring page language into the preprocessor.

@Nikerabbit the disappearing close braces are indeed mysterious. From Tim's ABNF link, it looks like we might be missing a "broken language converter" rule. I'll certainly add test cases for this case before this patch gets re-enabled.

Change 333997 had a related patch set uploaded (by C. Scott Ananian):
Protect language converter markup in the preprocessor (take 2).

https://gerrit.wikimedia.org/r/333997

Updated patch fixes the missing braces and other issues. As it turns out, I should have been paying more attention to Tim's ABNF! Using the "rightmost-opening" rule for sequences like -{{ leads to the behavior @Nikerabbit expects.

ssastry subscribed.

Since the patch that caused this has been reverted, this is no longer an issue. The newer version of this patch https://gerrit.wikimedia.org/r/#/c/333997 adds test cases to prevent regression of behaviour reported in this ticket.

Change 333997 merged by jenkins-bot:
[mediawiki/core@master] Protect language converter markup in the preprocessor (take 2).

https://gerrit.wikimedia.org/r/333997