Page MenuHomePhabricator

The newline added to a template, magic word, variable, or parser function that returns line-start wikicode formatting (*#:; {|) causes unexpected parsing
Open, LowPublic

Description

Author: wiki.warx

Description:
assume there is tamplate named color with content #002255 (normal color definition) if its transcluded this way:

<span style="color:{{color}}"></span>

It works perfectly - it gives

<span style="color: #002255;">test</span>

But if you use it in a table (or anywhere else not inside tag attribute) it crashes:

{| style="color:{{color}};"
|-
| test
|-
|}

gives:

<table>
<ol><li>002255;"
</li></ol>

<tr>
<td> test
</td></tr>
</table>

same:

<p>test {{color}} test</p>

gives:

<p>test 

<ol><li>002255 test</p>
</li></ol>

This has even broken tag nesting!!!


Version: unspecified
Severity: major
URL: http://test.wikipedia.org/wiki/Newline_through_parser_functions
See Also: T25674

Details

Reference
bz12974

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ssastry added a comment.EditedJul 9 2015, 7:37 PM

I think the problem here is that we need different (contextual) behavior for different templates.

For templates that are expected to be used in attribute context (as in the {{color}} example), or in templates that are expected to be used in phrasing content (http://www.w3.org/TR/2011/WD-html5-20110525/content-models.html#phrasing-content-0 ), the add-a-newline behavior breaks expectations.

For everything else, newline addition seems the right behavior.

I think this goes back to various discussions that we've had in the Parsoid context -- about how to enforce content model requirements on template output. This has several benefits beyond the resolution of the conflicting requirements in this task (better visual editablity, improved parsing performance, ability to edit popular templates without causing load spikes on the parse cluster, and improved ability to reason about templates and their usage -- as demonstrated in this bug report).

One of the ideas that came up in our conversations (that @tstarling had, I think) was to rely on some form of magic words that provide content model hints (potentially enforceable rules in the parser) at the start of template wikitext. This idea came up in the context of ideas / solutions to enforce DOM-scoping on template output (https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes#DOM-scoping_of_template_output, https://www.mediawiki.org/wiki/Parsoid/domparse ) without having to introduce new markup in wikitext.

So, I think, if we pursued this idea further of introducing content-model (and other contextual-use) hints to the parser in templates, we might be able to have our cake and eat it too.

I do want to add the caveat here that I haven't fully thought through all the implications, but I am putting this out as an early idea for consideration.

cscott added a comment.Jul 9 2015, 7:42 PM

It may be that the inAttribute serializer flag added by https://gerrit.wikimedia.org/r/219869 might help in some situations. For example, @Ciencia_Al_Poder's example with the inadvertent definition list might be suppressed if we are inAttribute. Not a 100% solution, but it might help.

Ireas removed a subscriber: Ireas.Aug 20 2015, 9:18 PM

At this point, this is a "feature", not a "bug". Changing it would break many many many pages on wikipedia.

Changing it would break many many many pages on wikipedia.

{{citation needed}}

Last time anyone was actually going to do anything here, it stalled out on someone figuring out how many templates actually depend on this behavior. Did you actually do that to determine "many many many", or are you just guessing?

If you look at the code history, "fixes" for this were attempted several times, and each time had to be reverted. No one has attempted to clean up WP since, so it's reasonable to assume that the number of uses has only increased.

Further, labelling this as a "bug" is somewhat problematic, since it was added deliberately on January 3, 2008 by @tstarling to fix T2529: Templates inside of tables appear incorrectly (with the comment, "Bug 529: if the template begins with a table or block-level element, it should be treated as beginning a new line. This behaviour is somewhat controversial.") If it were reverted, "templates inside tables" would be incorrect again. And even at the time (2008) this was a fix for a *regression* -- so this has been the behavior of templates for even longer than that.

This has been a part of wikitext syntax for over 7 years now, and there are a number of parser tests explicitly verifying this behavior. It is not a *bug*. It allows you to use wikitext inside a template which requires a "start of line" context: specifically, * # : ; {|.

On the other hand, it's reasonable to wonder if we could do things more gracefully. Alternative template mechanisms (for instance, T114445: [RFC] Balanced templates) may choose to opt-out of this behavior, if there are good alternatives.

Another "solution" would be to start a large-scale wikilint project aimed at cleaning up cases where this feature is used, so it could then be turned off. The difficulty with that approach (as my 2015 wikimania talk pointed out) is that it's not clear what the "proper" replacement should be in all cases.

Further, labelling this as a "bug" is somewhat problematic, since it was added deliberately on January 3, 2008 by @tstarling to fix T2529: Templates inside of tables appear incorrectly (with the comment, "Bug 529: if the template begins with a table or block-level element, it should be treated as beginning a new line. This behaviour is somewhat controversial.")

Tim added that comment in http://mediawiki.org/wiki/Special:Code/MediaWiki/27667 (the original parser rewrite), the commit you linked just moves it. But the code was there earlier, added in 2004 by Wil Mahan (http://mediawiki.org/wiki/Special:Code/MediaWiki/5479).

I don't think that this stops being a bug just because it's eleven-years-old code, though. If you look at T2529, all it asked for was more robust syntax for nested tables, and instead we got this :(

If you look at the code history, "fixes" for this were attempted several times, and each time had to be reverted.

I see only one attempted fix. And as is already noted in T14974#173936, that was reverted because trying to deploy 1.17 was causing major issues and extraneous changes would just confuse the situation, not because of any known problem with the attempted fix itself.

Further, labelling this as a "bug" is somewhat problematic, since it was added deliberately on January 3, 2008 by @tstarling to fix T2529: Templates inside of tables appear incorrectly (with the comment, "Bug 529: if the template begins with a table or block-level element, it should be treated as beginning a new line. This behaviour is somewhat controversial.")

Incorrect. It was added in r5479. The revision you link is fixing a regression that was probably introduced in r12925, where the if block that r29205 is moving the code in question out of was added.

This has been a part of wikitext syntax for over 7 years now, and there are a number of parser tests explicitly verifying this behavior. It is not a *bug*.

As I've said elsewhere in a different context to the same bogus argument, just because someone can write a test proving that a behavior exists doesn't make that behavior not a bug.

It allows you to use wikitext inside a template which requires a "start of line" context: specifically, * # : ; {|.

Or you could just put the template that needs a "start of line" context at the start of a line.

I'm not sure what we're arguing about here. You're right, this behavior seems to have been added on 25 September 2004 in r5479. So it's over a decade old now.

I'm all for starting a wikiproject to identify and replace templates which depend on this behavior. I'm just saying that you can't "fix this bug" without fixing the content first.

It allows you to use wikitext inside a template which requires a "start of line" context: specifically, * # : ; {|.

Or you could just put the template that needs a "start of line" context at the start of a line.

That would probably require also a <nowiki /> before the newline and the content of the parameter. Surprisingly, this <nowiki/> is often needed when using row syntax inside {{#if: }} constructs (with {{!}}), which makes the current situation very inconsistent :S

As discussed in T14974#1442616 I think we can mitigate the problem being discussed here by relying on the balanced-templates RFC. So, if a template is meant to be used in block context, forcing a newline break (current behavior) seems right. But, if a template is marked to be used in inline context, not forcing a newline break seems the right thing to do. I think this will provide the most benefit as requested in this bug report with the least breakage. This could reduce the impact of the change, but yes, we would still need to verify how page rendering changes before turning it on.

It allows you to use wikitext inside a template which requires a "start of line" context: specifically, * # : ; {|.

Or you could just put the template that needs a "start of line" context at the start of a line.

That would probably require also a <nowiki /> before the newline and the content of the parameter. Surprisingly, this <nowiki/> is often needed when using row syntax inside {{#if: }} constructs (with {{!}}), which makes the current situation very inconsistent :S

The fact that {{#if: }} consumes whitespace is a design problem of {{#if: }}. It doesn't make sense to burden the entire template substitution mechanism with unspecified whitespace insertions just to work around that limitation.

I'm just saying that you can't "fix this bug" without fixing the content first.

Very true. And to fix the content, we need to have some way of tracking which pages are affected. This is where this bug has been stalled, as far as I'm aware.

Meno25 removed a subscriber: Meno25.Feb 22 2016, 6:19 PM

As the latest person to get bitten by this bug, can I press for some progress. If the category suggested by Tim in six years ago, and Mr Stradivarius one year ago can be implemented, then we can at least get some idea of the magnitude of the misfeature use. Conversely if they can't let us find another way forward.

I don't think it is unreasonable to expect a mature piece of software to "just work", and while I have sympathy for grand plans involving document object models, if we can't display something beginning with *, :, ; # we need to resolve that issue sooner rather than later. All the time we are wasting volunteers efforts, and discouraging them.

Lost a day for this bug, please fix it (and thanks to Umherirrender for the suggestion to use the html entity).

I also just lost a day to this bug. It's too bad the fix was reverted in 2011. At this point there are too many different workarounds to be able to track or catalog them. I have to disagree with cscott's assertion that this isn't a bug, however. Although it fixes some cases, it breaks many other cases, and causes very unexpected behavior, like T164121. I have no idea what a realistic solution is, but I worry that we're still going to be dealing with this bug ten years from now unless we come up with some creative ideas.

I'm kind of of the opinion that, in the long term, the lesser evil of breaking current workarounds is acceptable for getting this bug fixed - after all, had the fix not been reverted in 2011, this bug would by now be a distant dream nightmare.

For future forward progress, independent parsing of templates (which is effectively what the current parsing behaviour is) is a feature we want to preserve, but given this bug report, it makes sense to introduce a solution for templates to specify contexts they are meant to be used in.

So, I propose to revive the idea in T14974#1800006 i.e. if a template is meant to be used in a non-start-of-line context, then, the template could add a magic word indicating that. In that case, when transcluded, the newline will not be added. This lets us fix this by letting templates specify intended behavior which preserves all the benefits of independent template parsing without breaking templates and content that rely on this behavior. Instead of waiting for the full balanced templates proposal to take shape, we can start with this limited version of the idea that only applies to this scenario. I suspect this would be a relatively easy fix to make.

So, a magic word like __INLINE_TPL__ or something more appropriate would be added to the template source code. Alternatively, this could be made part of a TemplateData annotation. But, both have the same intended effect of identifying the use-site context for these annotated templates.

We'll investigate this idea, but if any of you see any gotchas, please chime in.

At this time, compared to a couple years back, we have a few different tools in our toolbox to test and verify the proposed changes:
(a) Visual Diff Testing to let us compare rendering between two versions of code.
(b) ParserMigration extension that can be leveraged to let editors preview output of a page between two versions of code (production vs. proposed change). Right now, this is being used to compare Tidy vs. a Tidy-replacement. It is unclear whether we can leverage it for more than one thing at the same time, but mentioning it here.
(c) Linter extension that can be leveraged to identify use cases and wikitext patterns that need to be migrated over. We are still experimenting and gaining experience with this, but so far, it seems like it is viable.

So, TL:DR; we can probably test the above idea I proposed up there via visual diff testing, roll it out as a proposed change, and if at all necessary, leverage the linter extension to identify any templates / wikitext patterns that need to be migrated over.

@ssastry: This seems like a great idea (although I'm also worried if there are any gotchas). This let's the editors solve the problem (for the most part), which is probably the only way any solution is going to successfully address the long-tail of use cases and implementations.

I'm not a huge fan of adding a new magic word (__INLINE_TPL__) to fix this special case -- if you know enough to know to use the magic word, then you probably already know of the other possible ways to fix the problem (<nowiki/>, [[Template:colon]], etc). Adding "one more workaround" isn't actually very helpful. It's just one more thing we'll have to support forever and/or migrate out of all our content eventually. And it's not going to solve the "mysterious behavior confuses editors" issue, because __INLINE_TPL__ will be just as mysterious as the thing it is "fixing".

I'm a bigger fan of {{#balance}} because it's an opt-in to what we want the future semantics of wikitext to be, rather than an ad hoc workaround to this specific issue. Further, most of the code necessary to make that work is already in our code base. The hard part is testing and deployment... which is always going to be the hard part of any solution.

kaldari added a comment.EditedApr 30 2017, 9:34 PM

@ssastry: It looks like the __INLINE_TPL__ fix wouldn't actually fix all the use cases. See T164121, for example. In this case, the template isn't an inline template, but the extra linebreak caused by transcluding a table within a subtemplate breaks the formatting. I'm not 100% sure if that's the same bug as this one though. Can you confirm?

Nnvu removed a subscriber: Nnvu.Apr 30 2017, 11:26 PM

Alternatively, this could be made part of a TemplateData annotation.

We'd need to move TemplateData into core or add a low-level hook to support it injecting that flag.

@ssastry: It looks like the __INLINE_TPL__ fix wouldn't actually fix all the use cases. See T164121, for example. In this case, the template isn't an inline template, but the extra linebreak caused by transcluding a table within a subtemplate breaks the formatting. I'm not 100% sure if that's the same bug as this one though. Can you confirm?

It looks like T164121 is basically a duplicate of T18700, which it seems was never reopened when rSVN86072 was reverted by rSVN96887. In short, {{italictitle}} produces no output, but the newline after remains. It's effectively the same as if you left an empty line at the top of the wikitext. Then T18700 results in the nested transclusion injecting another newline, resulting in the blank line you see.

A correct fix for T18700 might be to add a "lineStart" flag to Preprocessor::preprocessToObj() so it can set the "start of a line" flag correctly for a transclusion at the start of $text instead of setting it unconditionally (which was probably the cause of T32384). Then pass $piece['lineStart'] from Parser->braceSubstitution() to Parser->getTemplateDom() to Parser->preprocessToDom() to Parser->preprocessToObj().

I'm not a huge fan of adding a new magic word (__INLINE_TPL__) to fix this special case -- if you know enough to know to use the magic word, then you probably already know of the other possible ways to fix the problem (<nowiki/>, [[Template:colon]], etc). Adding "one more workaround" isn't actually very helpful. It's just one more thing we'll have to support forever and/or migrate out of all our content eventually. And it's not going to solve the "mysterious behavior confuses editors" issue, because __INLINE_TPL__ will be just as mysterious as the thing it is "fixing".
I'm a bigger fan of {{#balance}} because it's an opt-in to what we want the future semantics of wikitext to be, rather than an ad hoc workaround to this specific issue. Further, most of the code necessary to make that work is already in our code base. The hard part is testing and deployment... which is always going to be the hard part of any solution.

@cscott, I did mean the usage of a {{#balance:inline}} or some marker. In my mind, I end up going back to magic words because that is what we started off with during our offsite discussion before we settled on the parser function as a potentially better annotation. Anyway, that syntactic detail apart, my broader observation was: rather than wait for a full implementation of that RFC with details, we can provide an implementation for one of its properties (non-start-of-line parse behavior) right now. So, to clarify, I am suggesting implementation of partial semantics which is part of the full semantics for later on.

But, your broader observation holds:

if you know enough to know to use the magic word, then you probably already know of the other possible ways to fix the problem (<nowiki/>, [[Template:colon]], etc).

My only comment there is that it provides a uniform solution for all those workarounds while nudging templates towards balanced output semantics. But yes, I don't have a better response to that than that.

@ssastry: It looks like the __INLINE_TPL__ fix wouldn't actually fix all the use cases. See T164121, for example. In this case, the template isn't an inline template, but the extra linebreak caused by transcluding a table within a subtemplate breaks the formatting. I'm not 100% sure if that's the same bug as this one though. Can you confirm?

@kaldari, I don't know right now based on some quick checks, it does look like an instance of this bug. FWIW, Parsoid doesn't seem to have that behavior, and I don't quite know why just yet. Longer term, balanced template semantics will prevent some of these weird edge cases that arise out of string-concatenation semantics of templating. But, yes, neither of those answers helps you right now. I suppose that bug can be solved independently by suppressing empty lines at the start/end of an article which of course smacks of a one-off hack. It makes sense to explore this separately on that bug.

Whatever solution you try, make sure that parser functions like {{#if:}} can avoid injecting the newline when necessary.

cscott added a comment.EditedMay 2 2017, 3:49 PM

Proposed quick fix: add a parser warning and potentially as special category when a template starts with * # : ; {|. This would point to a description of the issue, and some workarounds, and at least prevent folks from "wasting a day" chasing a mysterious behavior.

As @ssastry notes, I think this is actually "desired behavior" -- templates shouldn't depend so sensitively on their context, and probably "parse as if the template is in start-of-line context" is a reasonable default in most cases. If you don't want that behavior, you (or VE!) can add <nowiki/> to kill the start-of-line context.

This leaves some corner cases, like the theoretical port template which expands :80 and is used like [http://example.com{{port}}] Adding <nowiki/> there might not actually have the desired behavior. But [EDIT: as a stopgap] this can be rewritten as {{url-with-port|http://example.com}} which expands to [{{{1}}}:80]. In addition to no longer putting the colon in start-of-line context, this also now generates a complete DOM tree, which allows for more efficient updates after edit, makes editing in VE possible, etc.

It may be that the template context may also eventually include an "attribute string" or "list of attributes" mode (like Spacebars uses), which would allow safe transclusion inside attributes (while still preventing expansions like :80" class="foo. So eventually we may be able to mark a template with something like {{#balance:attribute}} (T114445) to opt-in to this behavior as another workaround.

But step 1 is (IMO) to add an explicit warning and/or category when the template is using start-of-line context to ensure that editors aren't surprised and baffled, linking to good documentation of the issue and workarounds since it's "not a bug". Then we'll add explicit opt-outs later in the context of a more general patch that lets templates specify their intended use context (either {{#balance}} or one of the other similar proposals the parsing team has batted around).

Comments/objections/clarifications/etc?

cscott added a comment.May 2 2017, 3:56 PM

Clarification from IRC:
(11:52:23 AM) Subbu: cscott, actually, I disagree (which is waht I was saying before I got cut out) that templates cannot generate strings like a port number or a CSS value ... and that they ought to be rewritting to always be a DOM.
(11:52:46 AM) Subbu: I think templats returning string, k-v pairs, or k/v values, or a DOM forest are all acceptable.
(11:52:51 AM) cscott-free: i think i put that in there?
(11:53:35 AM) Subbu: maybe i overinterpreted your {{url-with-port|http://example.com}} ex then?
(11:53:40 AM) cscott-free: the point is that eventually the template will specify (or infer) its desired use context, either inline DOM, block DOM, table context, or "attribute" (string/kv-pair/etc)
(11:53:55 AM) cscott-free: with "some way" of explicitly opting in to a specific context if inference doesnt' magically do it
(11:54:04 AM) Subbu: ok, agreed on that.
(11:54:28 AM) cscott-free: *in the meantime* you can rewrite as {{url-with-port...}} to get around the current behavior, which is "start of line always"
(11:54:55 AM) cscott-free: which isn't actually well-formed semantically, because "start of line" doesn't correspond to any of the inline/block DOM, key-value pair, etc contexts
(11:55:33 AM) cscott-free: but {{url-with-port...}}, although perhaps not ideal, is forward-compatible in the sense that if you use that workaround it will eventually map to one of the contexts we definitely will support
(11:55:39 AM) Subbu: right.
(11:55:39 AM) cscott-free: and in the meantime it will make VE happy
(11:55:46 AM) Subbu: ok, i overinterpreted your remark there.
(11:56:31 AM) Subbu: "in the meantime" would be a good qualification to add / edit in that comment. :)

As @ssastry notes, I think this is actually "desired behavior" -- templates shouldn't depend so sensitively on their context, and probably "parse as if the template is in start-of-line context" is a reasonable default in most cases. If you don't want that behavior, you (or VE!) can add <nowiki/> to kill the start-of-line context.

IMHO As wikicode depend sensitively on context, IMHO string generated from templates should comport exactly as the same string inserted manually.

Od1n added a subscriber: Od1n.EditedDec 26 2017, 3:19 AM

Yes, please do something about this issue, even (for the time being) just adding a magic word to configure behaviour of the template, would be helpful.

And another problematic case:
[[{{parent template}}]]  ( {{parent template}} giving ":Category:Foobar" )

This won't work either:
[[{{parent template}}]]  ( {{parent template}} giving "<nowiki />:Category:Foobar" )
Aklapper lowered the priority of this task from High to Low.Dec 26 2017, 10:27 AM

Correcting priority as this has been 'high priority' for more than five years which is unrealistic.

Od1n added a comment.Dec 27 2017, 3:26 AM

I would suggest "normal" priority. This issue gives template editors many headaches, and often results in template codes much more complex than necessary. Thus, let's not forget this issue even further than now.

We're going to make this a child task of {{#balance}} for now, because when we add a warning people are going to want to know what to do to suppress the warning -- and the thing we're going to want people to do is opt in to balancing (to get their desired start-of-line context).

We're going to make this a child task of {{#balance}} for now, because when we add a warning people are going to want to know what to do to suppress the warning -- and the thing we're going to want people to do is opt in to balancing (to get their desired start-of-line context).

What is {{#balance}}?

Od1n added a comment.Feb 27 2018, 9:47 PM

If template returns #ccc and is used as style="color:{{template}}", {{#balance:inline}} would break the template, as it wraps the output with <span>...</span>.

If template returns #ccc and is used as style="color:{{template}}", {{#balance:inline}} would break the template, as it wraps the output with <span>...</span>.

Understood. That is just an initial list of types in that proposal. Types could also be string, css, number, etc. for example. The wrapping can then be sensitive to the type and use-context.

...or just "attribute string" (output is html-escaped to be safe to include in attributes, no bare quotes allowed) or "attribute list" (validated to be valid attribute name=value pairs). I'm not convinced we should necessarily be validating number formats, css rules, etc. (Inspired by the set of template types available in spacebars, which impressed me as a good minimal-but-safe-and-structured template engine).

But yeah.

Izno added a subscriber: Izno.Aug 23 2018, 12:25 PM

Why would we keep a template like this when the probably appropriate way to deal with it is TemplateStyles?

Od1n removed a subscriber: Od1n.Aug 23 2018, 10:58 PM
valerio.bozzolan added a comment.EditedAug 25 2018, 10:38 AM

@Izno Correct me if I'm wrong but the TemplateStyle is a recent feature able to handle a CSS set of rules that can be shared between all template calls, but that can't be used to declare on runtime a CSS rule.

Dvorapa added a subscriber: Dvorapa.May 5 2019, 9:57 AM