The newline added to a template, magic word, variable, or parser function that returns line-start wikicode formatting (*#:; {|) causes unexpected parsing
Open, Low, Public, Bug Report

Assigned To
None
Authored By
bzimport
Feb 8 2008, 6:51 PM
Referenced Files
F4657: bug12974.html
Nov 21 2014, 10:03 PM

Description

Author: wiki.warx

Description:
Assume there is a template named color with the content #002255 (a normal color definition). If it's transcluded this way:

<span style="color:{{color}}"></span>

It works perfectly - it gives

<span style="color: #002255;">test</span>

But if you use it in a table (or anywhere else not inside a tag attribute), it breaks:

{| style="color:{{color}};"
|-
| test
|-
|}

gives:

<table>
<ol><li>002255;"
</li></ol>

<tr>
<td> test
</td></tr>
</table>

same:

<p>test {{color}} test</p>

gives:

<p>test 

<ol><li>002255 test</p>
</li></ol>

This has even broken tag nesting!!!
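
The behaviour reported above can be modelled roughly as follows. This is only a sketch in Python, not MediaWiki's parser code: the function name `expand_templates` and the `templates` dictionary are hypothetical, and the rule that the newline is added only when the transclusion is not already at the start of a line is an assumption made for the illustration. The list of markers is taken from the task title.

```python
import re

# Markup that is only meaningful at the start of a line (from the task title).
LINE_START_MARKUP = ("*", "#", ":", ";", "{|")

def expand_templates(wikitext, templates):
    """Replace {{name}} with its expansion, prepending a newline when the
    expansion begins with line-start markup (hypothetical model)."""
    def substitute(match):
        name = match.group(1).strip()
        if name not in templates:
            return match.group(0)  # leave unknown templates untouched
        expansion = templates[name]
        # Assumption: no newline is needed if we are already at a line start.
        at_line_start = match.start() == 0 or wikitext[match.start() - 1] == "\n"
        if not at_line_start and expansion.startswith(LINE_START_MARKUP):
            return "\n" + expansion
        return expansion
    return re.sub(r"\{\{([^{}|]+)\}\}", substitute, wikitext)

templates = {"color": "#002255"}
print(repr(expand_templates('{| style="color:{{color}};"', templates)))
# -> '{| style="color:\n#002255;"'
```

Under this model the `#` ends up at the start of a new line, which is where it would be read as a numbered-list item, matching the `<ol><li>` in the output quoted above.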


Version: unspecified
Severity: major
URL: http://test.wikipedia.org/wiki/Newline_through_parser_functions
See Also: T25674

Details

Reference
bz12974

Related Objects

Event Timeline

I'm not a huge fan of adding a new magic word (__INLINE_TPL__) to fix this special case -- if you know enough to know to use the magic word, then you probably already know of the other possible ways to fix the problem (<nowiki/>, [[Template:colon]], etc). Adding "one more workaround" isn't actually very helpful. It's just one more thing we'll have to support forever and/or migrate out of all our content eventually. And it's not going to solve the "mysterious behavior confuses editors" issue, because __INLINE_TPL__ will be just as mysterious as the thing it is "fixing".

I'm a bigger fan of {{#balance}} because it's an opt-in to what we want the future semantics of wikitext to be, rather than an ad hoc workaround to this specific issue. Further, most of the code necessary to make that work is already in our code base. The hard part is testing and deployment... which is always going to be the hard part of any solution.

@cscott, I did mean the usage of a {{#balance:inline}} or some marker. In my mind, I end up going back to magic words because that is what we started off with during our offsite discussion before we settled on the parser function as a potentially better annotation. Anyway, that syntactic detail apart, my broader observation was: rather than wait for a full implementation of that RFC with details, we can provide an implementation for one of its properties (non-start-of-line parse behavior) right now. So, to clarify, I am suggesting implementation of partial semantics which is part of the full semantics for later on.

But, your broader observation holds:

if you know enough to know to use the magic word, then you probably already know of the other possible ways to fix the problem (<nowiki/>, [[Template:colon]], etc).

My only comment there is that it provides a uniform solution for all those workarounds while nudging templates towards balanced output semantics. But yes, I don't have a better response to that than that.

@ssastry: It looks like the __INLINE_TPL__ fix wouldn't actually fix all the use cases. See T164121, for example. In this case, the template isn't an inline template, but the extra linebreak caused by transcluding a table within a subtemplate breaks the formatting. I'm not 100% sure if that's the same bug as this one though. Can you confirm?

@kaldari, I don't know right now. Based on some quick checks, it does look like an instance of this bug. FWIW, Parsoid doesn't seem to have that behavior, and I don't quite know why just yet. Longer term, balanced template semantics will prevent some of these weird edge cases that arise out of the string-concatenation semantics of templating. But, yes, neither of those answers helps you right now. I suppose that bug can be solved independently by suppressing empty lines at the start/end of an article, which of course smacks of a one-off hack. It makes sense to explore this separately on that bug.

Whatever solution you try, make sure that parser functions like {{#if:}} can avoid injecting the newline when necessary.

Proposed quick fix: add a parser warning, and potentially a special category, when a template starts with * # : ; {|. This would point to a description of the issue and some workarounds, and at least prevent folks from "wasting a day" chasing a mysterious behavior.
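
A minimal sketch of the check this proposal implies, written in Python purely for illustration. The function name, the warning wording and the category name are all hypothetical and do not correspond to existing MediaWiki code or categories.

```python
import re

# Same markers as listed in the proposal: * # : ; and {|
_LINE_START_RE = re.compile(r"^(?:\{\||[*#:;])")

# Hypothetical category name, for illustration only.
TRACKING_CATEGORY = "Templates relying on start-of-line context"

def check_template_output(name, expansion):
    """Return (warning, category) if the expansion starts with line-start
    markup, otherwise None."""
    m = _LINE_START_RE.match(expansion)
    if m is None:
        return None
    warning = (
        f"Template '{name}' starts with {m.group(0)!r}, "
        "which is parsed in start-of-line context."
    )
    return warning, TRACKING_CATEGORY
```

A real implementation would also need to decide whether to run the check when the template is saved or on every expansion, which this sketch leaves open.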

As @ssastry notes, I think this is actually "desired behavior" -- templates shouldn't depend so sensitively on their context, and probably "parse as if the template is in start-of-line context" is a reasonable default in most cases. If you don't want that behavior, you (or VE!) can add <nowiki/> to kill the start-of-line context.

This leaves some corner cases, like the theoretical port template which expands to :80 and is used like [http://example.com{{port}}]. Adding <nowiki/> there might not actually have the desired behavior. But [EDIT: as a stopgap] this can be rewritten as {{url-with-port|http://example.com}} which expands to [{{{1}}}:80]. In addition to no longer putting the colon in start-of-line context, this also now generates a complete DOM tree, which allows for more efficient updates after edit, makes editing in VE possible, etc.

It may be that the template context will also eventually include an "attribute string" or "list of attributes" mode (like Spacebars uses), which would allow safe transclusion inside attributes (while still preventing expansions like :80" class="foo). So eventually we may be able to mark a template with something like {{#balance:attribute}} (T114445) to opt in to this behavior as another workaround.

But step 1 is (IMO) to add an explicit warning and/or category when the template is using start-of-line context to ensure that editors aren't surprised and baffled, linking to good documentation of the issue and workarounds since it's "not a bug". Then we'll add explicit opt-outs later in the context of a more general patch that lets templates specify their intended use context (either {{#balance}} or one of the other similar proposals the parsing team has batted around).

Comments/objections/clarifications/etc?

Clarification from IRC:
(11:52:23 AM) Subbu: cscott, actually, I disagree (which is what I was saying before I got cut out) that templates cannot generate strings like a port number or a CSS value ... and that they ought to be rewritten to always be a DOM.
(11:52:46 AM) Subbu: I think templates returning string, k-v pairs, or k/v values, or a DOM forest are all acceptable.
(11:52:51 AM) cscott-free: i think i put that in there?
(11:53:35 AM) Subbu: maybe i overinterpreted your {{url-with-port|http://example.com}} ex then?
(11:53:40 AM) cscott-free: the point is that eventually the template will specify (or infer) its desired use context, either inline DOM, block DOM, table context, or "attribute" (string/kv-pair/etc)
(11:53:55 AM) cscott-free: with "some way" of explicitly opting in to a specific context if inference doesn't magically do it
(11:54:04 AM) Subbu: ok, agreed on that.
(11:54:28 AM) cscott-free: *in the meantime* you can rewrite as {{url-with-port...}} to get around the current behavior, which is "start of line always"
(11:54:55 AM) cscott-free: which isn't actually well-formed semantically, because "start of line" doesn't correspond to any of the inline/block DOM, key-value pair, etc contexts
(11:55:33 AM) cscott-free: but {{url-with-port...}}, although perhaps not ideal, is forward-compatible in the sense that if you use that workaround it will eventually map to one of the contexts we definitely will support
(11:55:39 AM) Subbu: right.
(11:55:39 AM) cscott-free: and in the meantime it will make VE happy
(11:55:46 AM) Subbu: ok, i overinterpreted your remark there.
(11:56:31 AM) Subbu: "in the meantime" would be a good qualification to add / edit in that comment. :)

As @ssastry notes, I think this is actually "desired behavior" -- templates shouldn't depend so sensitively on their context, and probably "parse as if the template is in start-of-line context" is a reasonable default in most cases. If you don't want that behavior, you (or VE!) can add <nowiki/> to kill the start-of-line context.

IMHO, since wikicode depends sensitively on context, a string generated by a template should behave exactly like the same string inserted manually.

Yes, please do something about this issue, even (for the time being) just adding a magic word to configure behaviour of the template, would be helpful.

And another problematic case:
[[{{parent template}}]]  ( {{parent template}} giving ":Category:Foobar" )

This won't work either:
[[{{parent template}}]]  ( {{parent template}} giving "<nowiki />:Category:Foobar" )
Aklapper lowered the priority of this task from High to Low. Dec 26 2017, 10:27 AM

Correcting priority as this has been 'high priority' for more than five years which is unrealistic.

I would suggest "normal" priority. This issue gives template editors many headaches and often results in template code that is much more complex than necessary. Let's not let this issue be forgotten any more than it already is.

We're going to make this a child task of {{#balance}} for now, because when we add a warning people are going to want to know what to do to suppress the warning -- and the thing we're going to want people to do is opt in to balancing (to get their desired start-of-line context).

We're going to make this a child task of {{#balance}} for now, because when we add a warning people are going to want to know what to do to suppress the warning -- and the thing we're going to want people to do is opt in to balancing (to get their desired start-of-line context).

What is {{#balance}}?

If a template returns #ccc and is used as style="color:{{template}}", {{#balance:inline}} would break the template, as it wraps the output with <span>...</span>.

If a template returns #ccc and is used as style="color:{{template}}", {{#balance:inline}} would break the template, as it wraps the output with <span>...</span>.

Understood. That is just an initial list of types in that proposal. Types could also be string, css, number, etc. for example. The wrapping can then be sensitive to the type and use-context.

...or just "attribute string" (output is html-escaped to be safe to include in attributes, no bare quotes allowed) or "attribute list" (validated to be valid attribute name=value pairs). I'm not convinced we should necessarily be validating number formats, css rules, etc. (Inspired by the set of template types available in spacebars, which impressed me as a good minimal-but-safe-and-structured template engine).

But yeah.

Why would we keep a template like this when the probably appropriate way to deal with it is TemplateStyles?

@Izno Correct me if I'm wrong, but TemplateStyles is a recent feature that handles a set of CSS rules shared between all calls of a template; it can't be used to declare a CSS rule at runtime.

I think this is an issue currently being discussed at c:Template_talk:Information. I recently rewrote c:Template:Information in Lua. The Lua code tried to mimic the behavior of the old template exactly, producing the same HTML for most files. We ran into trouble when one of the template's fields was another template using wiki tables ("{| ... |}"). Those rendered correctly in the old wikicode-based template, but not in the new Lua one. The patch we used was to slap "\n" before and after every field in the template. It makes all infoboxes look strange (extra space around each field of the template), but at least all the transclusions of the template render correctly (AFAIK). Placing "<nowiki/>" in front of each field did not fix the issue. One solution I am exploring is for Lua to test whether the content of each field contains "{|" (and possibly other strings) and only add "\n" when it is detected. Other proposed solutions involve changes to CSS files. Help with potential solutions would be appreciated, as we cannot experiment very much: the template is used on 51M file pages and we are trying to keep edits to a minimum.
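
The conditional wrapping described in this comment could look roughly like the sketch below. It is written in Python only to illustrate the logic; the real module is in Lua, and the names `BLOCK_MARKERS` and `wrap_field` are hypothetical. Only "{|" is named in the comment, so the tuple contains just that marker and other strings would have to be added once they are identified.

```python
# Strings whose presence means the field needs to sit on its own lines.
# Only "{|" comes from the comment above; extend as further cases are found.
BLOCK_MARKERS = ("{|",)

def wrap_field(value):
    """Surround a field with newlines only if it appears to contain a wiki table."""
    if any(marker in value for marker in BLOCK_MARKERS):
        return "\n" + value + "\n"
    return value

print(repr(wrap_field("A plain description")))
# -> 'A plain description'
print(repr(wrap_field("{|\n| cell\n|}")))
# -> '\n{|\n| cell\n|}\n'
```

Fields without a table are returned unchanged, so most infoboxes would keep their original spacing.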

Change 559542 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Test case for implicit newline insertion when expanding templates (T14974)

https://gerrit.wikimedia.org/r/559542

Mailbox not monitored
Account no longer in use

Aklapper changed the subtype of this task from "Task" to "Bug Report". Feb 6 2022, 7:18 PM
Aklapper removed a subscriber: wikibugs-l-list.

There's also the problem where MediaWiki still incorrectly wraps a <bdi></bdi> element (used as a *mixed-content* element whose content may be inline or block) by forcing it into a paragraph (a dummy and undesired <p></p> HTML element, possibly also causing another block element containing it to be terminated too early). This causes problems for content that should be purely inline and even totally invisible in the rendered page.

Some templates use <div style="display:none">...</div> on the assumption that it is always invisible, but then those templates cannot be used inline (e.g. they break paragraphs or list items and the list containing them by inserting a separating paragraph, with additional vertical margins as well). We should avoid "div" elements for that case, but "span" is not as versatile because its content model is not mixed. Now if we want to use "bdi" (the only element allowed in MediaWiki to have mixed content, i.e. capable of containing inline or block elements), MediaWiki assumes that this is an inline element... and wants to make it part of a block, inserting it in a paragraph.

Examples of this are templates inserting "tags" for storing some data intended to be processed by machines for Wikidata. Shouldn't we have a MediaWiki-specific element that is completely invisible (it would generate a <bdi style="display:none" data="..."></bdi> element) and that the MediaWiki parser (or its last "TidyHTML" step) leaves as is (i.e. never embeds inside any other automatically added element, as it is guaranteed to be always invisible)?

Side note: a "bdi" element can occur anywhere even in the middle of a word, it does not have any effect on the directionality of surrounding text, or on line wraps. Even copy-pasting text containing a bdi element without any text content (only attributes) into a plain-text editor will not add anything, it is really invisible.

So that would be a use case for some <data someID="..."> element in MediaWiki, which would pass through the "HTML Tidy" step and be converted into <bdi style="display:none" data-someID="..."></bdi>. The data would go in the data attribute whose name contains the given id, with any backslashes, newlines or double quotes in the data escaped in the attribute value; HTML-escaping with character entities is also possible for HTML transparency. Such embedded data is supposed to be read only by machines, never by humans, so basic escaping will not be a problem. That data element would accept no CSS style or class attributes, since it is purely intended to be invisible; the style="display:none" attribute is added automatically to the generated <bdi> element by the HTML generator in the last phase (though it could generate a class="mw-data" attribute instead).

Such a thing would be useful for various purposes, including "micro-tagging" for semantics, internal tracking metadata, or facilitating the work of wiki editors. The "data" attributes can be used on any valid HTML element; these attributes can have an extension chosen freely (as long as it is a valid HTML identifier) appended after a hyphen, and they are also usable in CSS selectors if needed to perform efficient queries inside the DOM (e.g. for use in JavaScript with jQuery). For embedding really invisible data and generating valid HTML5, only the <bdi> element is valid and suitable for that purpose, and it is the only one accepted and supported (partially) by MediaWiki. But using a MediaWiki-specific <data> element would make things easier to handle in the MediaWiki parser and its HTML generator.

Such a data element could also be used as a debugging tool for templates/modules, to contain some tracing info.

Note: this data element is also not equivalent to an invisible <input type="..." value="..."> element (which is visible to HTML input forms) or to a <meta> element (meant for web page headers, not freely insertable in the page content, but used by the HTML generator). It is a requirement that the chosen HTML element (the one the data element maps to) has a "mixed" content model, for full HTML5 conformance, and "bdi" is the only one well supported in browsers that has all the features needed to be fully invisible to human readers, especially when its inner content is empty (no child elements; only HTML attributes are permitted).

The MediaWiki <data> (or equivalently <#tag:data>) element should probably not accept any common HTML layout attributes like dir, style or class (they would have no effect at all anyway, and the last two could conflict with the attributes generated by the parser when converting the MediaWiki <data> element into an HTML5 <bdi> element). If such attributes are passed, they would become data-dir, data-style, data-class. However, I can see a use case for accepting the lang attribute specifically as a reserved identifier treated differently (and passed "as is"), as a possible way to offer some functionality in jQuery selectors that is not possible with 'data-someID' attributes.

It may eventually also accept an id attribute, only for such selectors, but it could cause conflicts with anchors used in page navigation and would make them partly "visible" to the human user, unless they are converted to another data-id attribute that is also usable in selectors for jQuery via a simple adjustment of the selector syntax.

Besides that, we could use multiple attributes in the same data element for different but simultaneous tagging purposes. E.g. <data lang="en" a="x" b="y"/> would become <bdi class="mw-data" lang="en" data-a="x" data-b="y"></bdi>.

If the MediaWiki data element has content, that content would be remapped into the (unextended) data attribute of the generated HTML element (with proper escaping). E.g. <data lang="en" a="x" b="y">{ "text", 2 }</data> would become <bdi class="mw-data" lang="en" data-a="x" data-b="y" data="{ &quot;text&quot;, 2 }"></bdi>
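
The attribute mapping proposed in the last few paragraphs can be sketched as below. This is a Python illustration of the proposal only; no such <data> element exists in MediaWiki, and the function name `data_to_bdi` is hypothetical. It assumes the class="mw-data" variant, passes lang through unchanged, and prefixes every other attribute with data-.

```python
from html import escape

def data_to_bdi(attrs, content=None):
    """Render the proposed <data> element as an empty <bdi> with data-* attributes."""
    parts = ['class="mw-data"']
    for name, value in attrs.items():
        # Per the proposal, lang is passed through; everything else
        # (including dir, style and class) becomes data-<name>.
        out_name = name if name == "lang" else f"data-{name}"
        parts.append(f'{out_name}="{escape(value, quote=True)}"')
    if content is not None:
        # Element content moves into the plain "data" attribute, HTML-escaped.
        parts.append(f'data="{escape(content, quote=True)}"')
    return "<bdi " + " ".join(parts) + "></bdi>"

print(data_to_bdi({"lang": "en", "a": "x", "b": "y"}, '{ "text", 2 }'))
# -> <bdi class="mw-data" lang="en" data-a="x" data-b="y" data="{ &quot;text&quot;, 2 }"></bdi>
```

The sketch uses HTML entity escaping rather than backslash escaping, which is one of the two options the proposal mentions.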

So in summary, such a "data" element would be a MediaWiki-safe replacement for an empty "bdi" element that only holds invisible data. It will never be visible but will be processable by a machine (or by client-side JavaScript tools that may transform them to make them visible on demand: it will have enough attributes to do all we want, including selectors on all these attributes that can be used in jQuery). MediaWiki will recognize these data elements easily, will consider them neither inline nor block elements, will not embed them in any undesired block elements, and will just HTML-ize them in the final conversion step into empty "bdi" elements with converted data attributes.

@Verdy_p Can you explain in 2 sentences or less how that is relevant to this task?