[RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments)
Open, Low, Public

Description

(This proposal was originally called "hygienic arguments", which was easily confused with a mostly unrelated hygienic templates proposal. The latter are now "balanced" templates, and we're now calling this proposal "heredoc arguments".)

As described in my Wikimania 2015 talk (starting at slide 31), the existing template argument syntax offers a number of traps for the unwary (for example, with the characters =, |, etc.). This makes it difficult to move large blocks of text into templates.

As a result, we often have constructs such as:

{{tablestart|class="shiny"}}
| Hello || wiki = world
{{tableend}}

which pose a number of issues:

  1. There is no mechanism to ensure {{tablestart}} and {{tableend}} are properly matched
  2. Both {{tablestart}} and {{tableend}} emit unbalanced HTML, which complicates work on efficiently updating the parser cache after template changes.
  3. Due to the tag matching issues, this whole block is uneditable by Visual Editor.

If we were to try to write a {{table}} template which accepted the contents as an argument, it would have to look like:

{{table|class="shiny"|
{{!}} Hello {{!}}{{!}} wiki = world
}}

Our arguments needed to be transformed in two different ways to prevent | and = from being mangled when we shoehorned them into a template argument. (You could also use {{=}}... but not {{|}}.)

This would also create dirty diffs inside the argument if all you wanted to do was wrap existing wikitext into a template parameter.

Consider also:

{{sectionstart}}
==heading==
{{sectionend}}

You can't just use <nowiki>=</nowiki> around = characters, if you want that to work.

Heredoc arguments provide a new form of template invocation which avoids these issues.

The above examples would be written as:

{{table|class="shiny"|<<<
| Hello || wiki = world
>>>}}
{{section|<<<
==heading==
>>>}}

Named arguments (like class in this example) can be passed using name=<<<...>>>. The new Template:Table can now emit properly balanced HTML, with both <table> and </table> generated by the same template (instead of by two separate templates). Visual Editor can now edit this block as a single template invocation, invoking itself recursively to edit the template arguments as it does now.

The only special character in the argument when expressed this way is >>>. We'll support nesting <<<....>>>, since that's the common case where >>> would appear, and you could also use <nowiki>>>></nowiki>. However, we'll also provide a "tag" mechanism to ensure that *any* wikitext can be wrapped into a template argument with *zero dirty diffs inside the template argument*:

{{wrapper|arg=123<<<
This text can contain >>> it's fine!
>>>123}}

The tag before the <<< can be any number. (WLOG it could be an alphanumeric tag, but Latin alphabetic characters can play havoc in RTL contexts; if we restrict to numeric tags we are guaranteed that RTL will look good.) It is always possible to choose a number N such that >>>N never appears in the argument (for example, N could have more digits than the argument has characters), which means it is always possible to wrap arbitrary wikitext without making any internal changes to the wikitext.
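The existence argument in the parenthetical can be made concrete. The following is a minimal Python sketch, not part of any proposed implementation; `guaranteed_unused_tag` and `wrap` are hypothetical names. It relies only on the observation that a tag with more digits than the argument has characters cannot occur inside the argument.

```python
def guaranteed_unused_tag(argument: str) -> str:
    """Return a numeric tag that provably cannot occur inside `argument`.

    The tag has len(argument) + 1 digits, so the closing marker
    '>>>' + tag is longer than the argument itself.
    """
    return "1" + "0" * len(argument)


def wrap(argument: str) -> str:
    """Wrap `argument` in a tagged heredoc quote without touching its contents."""
    tag = guaranteed_unused_tag(argument)
    # Sanity check of the claim above: the closing marker is not in the payload.
    assert (">>>" + tag) not in argument
    return f"{tag}<<<{argument}>>>{tag}"
```

A real editor would presumably pick a much shorter tag by searching for the first unused number; this version only demonstrates that a valid choice always exists.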

There are no special characters or odd escape rules in the argument when expressed this way. We don't need special {{!}}, {{=}}, etc escapes; we won't have dirty diffs or require careful search-and-replace when wrapping wikitext.

Another example, from @brion's talk on citations:

{{cite|id="32412"|<<<
First person plural pronouns in Isthmus-Mecayapan Nahuat:

:''nejamēn'' ({{IPA|[nehameːn]}}) "We, but not you" (= me & them)
:''tejamēn'' ({{IPA|[tehameːn]}}) "We along with you" (= me & you & them)
>>>}}

Note that it was easy to wrap the entire text covered by the citation in the {{cite}} template, since I didn't need to worry about the fact that the text included the special character =.

Visual Editor use
When escaping template parameters, Visual Editor would wrap the parameter with <<<...>>> if the input wikitext contained | or = (instead of encoding | as {{!}}, etc). If the parameter also contained >>>, it would generate N<<<...>>>N, picking some N such that >>>N doesn't appear in the input. That would ensure clean diffs when the only change was wrapping existing wikitext into a template.
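The escaping policy in this paragraph could be sketched roughly as follows. This is an illustration in Python, not Visual Editor code; `choose_tag` and `escape_param` are hypothetical names. The extra check for `<<<` in the value is an added precaution that the paragraph does not mention.

```python
def choose_tag(value: str) -> str:
    """Smallest positive N such that neither 'N<<<' nor '>>>N' occurs in value."""
    n = 1
    while f">>>{n}" in value or f"{n}<<<" in value:
        n += 1
    return str(n)


def escape_param(value: str) -> str:
    """Serialize a template parameter value, quoting only when needed."""
    if "|" not in value and "=" not in value:
        return value                      # nothing to protect, emit as-is
    if ">>>" not in value and "<<<" not in value:
        return f"<<<{value}>>>"           # plain heredoc quote is enough
    tag = choose_tag(value)               # payload contains quote markers
    return f"{tag}<<<{value}>>>{tag}"
```

In every branch the value itself is emitted unchanged, which is what keeps the diff clean when existing wikitext is wrapped into a template.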

We might also need to eventually add a flag to the data maintained by Extension:TemplateData to indicate that a given parameter should always be escaped with <<<, if that becomes the editors' preference for certain parameters.

More general use
In the initial implementation, <<< will be recognized as a quote character only for template arguments; that is, only immediately after = or | inside double braces. We could eventually allow <<< as a general mechanism, for example:

* a <<<
multi
line
>>> list item

We'll treat that as a separate task iff it proves interesting.

Strict start-of-line constraints
For ease of parsing (and reading) we can enforce start-of-line context on the result to avoid the T14974/T2529 hacks and make behavior consistent. There might be other restrictions that would prove useful. (More thought welcome here.)

Mailing list discussion: https://lists.wikimedia.org/pipermail/wikitech-l/2015-October/083448.html

Note: task description has been edited per parsing team meeting notes below; the original syntax proposal was {{>Foo}}...{{<Foo}}

// Note: task description further edited to adopt @Alsee's syntax proposal below, with a few tweaks.

Related Objects

Mentioned In
T196440: Provide a clearer way to distinguish between "absent" and empty/blank parameters when handling them in templates and parser functions
T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar
T204370: Behavior switch/magic word uniformity
T204366: Better varargs for templates
T204307: Parser Functions should support named parameters
T204283: Serializing extension tags using TemplateData
T203293: {{Row numbers}} completely fails on the Android app / VisualEditor
T198532: Add parentheses html tag to wikicode
T20231: provide a way to specify what text/statement is supported by a <ref> block.
T185695: Support an #open-tag and #close-tag parser function to allow for generation of "unbalanced" HTML and pseudo tags in templates
T176272: Decide on what to recommend for table style usecase
T30980: parser tags such as <ref>, <poem>, <timeline> etc. cannot be localized
T149667: Amazing Article Annotations
E187: RFC Meeting: triage meeting (2016-05-25, #wikimedia-office)
T114640: make Parser::getTargetLanguage aware of multilingual wikis
T119022: WikiDev 16 working area: Content format
E80: RFC Meeting (Heredoc arguments for templates)
Mentioned Here
T196440: Provide a clearer way to distinguish between "absent" and empty/blank parameters when handling them in templates and parser functions
T172306: Broken wikilinks can be parsed as wikilinks after preprocessing
T203293: {{Row numbers}} completely fails on the Android app / VisualEditor
T204283: Serializing extension tags using TemplateData
T204307: Parser Functions should support named parameters
P3179 2016-05-25 ArchCom-RFC triage meeting (#wikimedia-office)
E187: RFC Meeting: triage meeting (2016-05-25, #wikimedia-office)
E80: RFC Meeting (Heredoc arguments for templates)
T2529: Templates inside of tables appear incorrectly
T14974: The newline added to a template, magic word, variable, or parser function that returns line-start wikicode formatting (*#:; {|) causes unexpected parsing

Event Timeline

Syntax that splits a template across multiple blocks wrapped in curly braces (i.e. {{...}} ... {{..}}; e.g. below) is going to be a huge shock to libraries like mwparserfromhell and especially pywikibot. It would encourage this type of syntax, when we want to see less of it.

{{<<Foo|namedArg=bar|anotherNamedArg=bat}}
This first block goes in argument "1".
{{|}}
Now this goes in argument "2".
{{|}}
Now this goes in argument "3".  It's a little more natural if there are only one or two arguments,
but we could keep going if we wanted.
{{>>Foo}}

Matching {{<<Foo to {{>>Foo}} is not a simple parsing construct. Most parser libraries don't have support for that. If the Foo is optional, there is even more pain in trying to build a language parser that efficiently and correctly collects nested templates.

Instead of adding a lot of complexity with a new type of argument and new syntax for the start and end of the combined templates, we can have long arguments as shown in T114432#1742563 and provide a simple enhancement to block-quote the long argument.

i.e.

{{Table|1=<<<
| Hello || world
<<<}}

and for multiple

{{Table|class="shiny"|<<<
| Hello || world
<<<|<<<
Foo bar
<<<}}

but named arguments should also have this capability

{{archive discussion|reason=<<<
blah with = and | and ~~~~
<<<|body=<<<
... long discussion ...
<<<}}

and nesting should work

{{archive discussion|reason=<<<
blah with = and | and ~~~~ parsed like normal wikitext
<<<|body=<<<
... long discussion ...
including

{{something|with=<<<
another blurb.
<<<}}

<<<}}

As an aside, I think it would be wise for all wikitext language proposals to include a formal grammar and a reference parser that is circulated for comment before they are actually approved for implementation. We need to avoid adding to the problems of the past, and ensure any new syntax has good grammar design.

Qgil removed a subscriber: Qgil.Feb 11 2016, 2:29 PM

@jayvdb Unfortunately, there is no formal grammar for wikitext. However, there are two reference implementations, one in PHP and one in JavaScript. The JavaScript implementation actually provides an interface very similar to mwparserfromhell (see https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi ) and handles wikitext syntax much more accurately than mwparserfromhell does.

I don't think your statement, "Matching {{<<Foo to {{>>Foo}} is not a simple parsing construct." is true. But yes, it would be helpful to get implementations of this construct for PHP, JavaScript, and mwparserfromhell to evaluate. The RfC process is very useful to gain consensus around basic parameters *before* we start writing code -- but sometimes we learn new things when we write the implementation, and that can feed back into the discussion.

RobLa-WMF mentioned this in Unknown Object (Event).May 4 2016, 7:33 PM
RobLa-WMF triaged this task as Low priority.Jun 8 2016, 7:08 PM
RobLa-WMF added a subscriber: RobLa-WMF.

Belated priority update discussed in E187: RFC Meeting: triage meeting (2016-05-25, #wikimedia-office) (see log at P3179)

Alsee added a subscriber: Alsee.EditedMar 13 2017, 10:08 PM

@jayvdb's suggestion looks like the best concept here. I'd suggest a tweak:

{{archive discussion|result=<<<blah blah that can include = and | without trouble>>>|<<<
...long discussion, which can include...
{{something|with=<<<another blurb.>>>}}
>>>}}

I just flipped the closing >>>, and wrote it more like I'd expect to use the archive template.
Optional use of <<< >>> is a simple and nice addition to wikitext, and if the template is set up like this it seems to give you almost everything wanted here. I think you only "lose" one thing, but it's not really losing anything. The plan is to continue to allow normal templates to be used, alongside balanced templates. Balanced templates can be detected opportunistically (or indicated within the template). The only thing I see "missing" here is that we don't confuse new users with complicated new syntax, where they are expected to mysteriously and randomly write one syntax or the other to call a template. The most important criterion is to avoid making wikis complicated and confusing for new users. The only thing people need to learn is that <<< >>> can be used any time, and it avoids problems when | or = happen to show up. That's a simple general concept.

cscott updated the task description. (Show Details)Oct 19 2017, 7:55 PM
cscott updated the task description. (Show Details)Oct 19 2017, 8:22 PM
cscott updated the task description. (Show Details)
cscott updated the task description. (Show Details)Oct 19 2017, 8:25 PM
Restricted Application added a subscriber: jeblad. · View Herald TranscriptOct 19 2017, 8:25 PM
cscott updated the task description. (Show Details)Oct 19 2017, 9:36 PM

Discussed this at Parsing team offsite. We decided to adopt @Alsee's proposal with a few tweaks; I've updated the task summary to reflect this. Further comments/critiques welcome!

Elitre renamed this task from [RFC] Heredoc arguments for templates (aka "hygenic" or "long" arguments) to [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).Oct 20 2017, 10:15 AM
Elitre updated the task description. (Show Details)Oct 20 2017, 10:17 AM

Edge cases that come to mind:

{{foo |<<<span>>>| is "span", you need |<<<<span>>>>| for "<span>". It may prove easy to lose an angle-bracket.}}

{{foo |<<<You know someone's going to do >>> this at some point accidentally. What happens?}}

{{foo | <<<What about spaces before or after quoting-syntax in a positional parameter?>>> }}

{{foo | bar = <<<Or a named parameter?>>> }}

{{foo |
<<<Newlines too are often seen before or after the value in a positional parameter. Someone might try to format it like this.>>>
}}

{{foo | bar =
<<<And the same for a named parameter.>>>
}}

{{foo
|<<<Especially with vertical layout of multiple parameters, even if otherwise done correctly,>>>
|bar=<<<for both positional and named parameters.>>>
|<<<Sometimes a parameter even has multiple newlines after>>>

|<<<and/or comments that are intended for the following parameter>>>

<!-- (like this). -->
|<<<You can get both newlines and spaces if>>>
 |<<< the next parameter is supposed to be indented or>>>
|<<<someone doesn't trim line-ending whitespace.>>>      
|<<<And note there may be a newline at the end too, not just when another parameter follows.>>>
}}

{{foo|<<<
If there's some long-running text, and someone happens to type a "<<<" inside, does that screw up the matching of the closing quote-syntax?
>>>}}

{{foo|<<<
And what happens if someone {{screws|<<<up>>}} a template invocation?
>>>|<<<
And what happens if someone {{screws|<<up>>>}} a template invocation?
>>>}}

{{foo|<<< Don't forget to test with comments <!-- like this: >>> -->! >>>}}

I'm not saying what the correct behavior should be in any of these cases, just pointing them out as cases where behavior might be surprising.

Alsee added a comment.Oct 20 2017, 5:48 PM

I believe the proper behavior (meaning the easiest to understand and most expected behavior) is to just protect the wrapped content then behave as if the <<< and >>> don't exist. This does a good job of covering most of the listed edge cases. For example:

{{foo |<<<You know someone's going to do >>> this at some point accidentally. What happens?}}

would be identical to

{{foo |You know someone's going to do  this at some point accidentally. What happens?}}

Interesting detail: Notice that this example put two spaces between "do" and "this".

P.S. The new task description appears to ask whether multi-line list items are interesting. The answer is yes, mostly on talk pages. My first impression is that it feels odd to use <<< >>> for multi-line list items, but it might be preferable to the awkward <br> method I sometimes use.

The question of what to do about >>> that isn't immediately preceding | or }} is an interesting one. From the discussion we had at the offsite, I believe that although we want to think through how we might eventually support something like @Alsee's suggestion above (<<< ... >>> as a generic quoting construct), we'd prefer to be conservative in the first implementation: the <<< will be treated literally (ie, not as a quote character) unless it's inside a template and immediately follows a | or = with no whitespace *and* the >>> is immediately followed by a | or }}. That will let us get some experience with the new construct with the tightest possible syntax, and then we can later loosen things up to allow whitespace & allow it to be used outside template arguments, if/when we've got a better idea what we want the behavior there to be (whitespace stripping, etc).
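One possible reading of this conservative rule, sketched as a Python regular expression. It is only an illustration: it ignores nesting and brace matching, and the name `strict_heredocs` is hypothetical. Under this reading, a >>> that is not immediately followed by | or }} simply stays part of the payload.

```python
import re

HEREDOC = re.compile(
    r"(?<=[|=])"       # opening quote must directly follow | or = (no whitespace)
    r"(\d*)<<<"        # optional numeric tag, then the opening quote
    r"(.*?)"           # payload, shortest match
    r">>>\1"           # closing quote carrying the same tag
    r"(?=\||\}\})",    # closing quote must be directly followed by | or }}
    re.DOTALL,
)


def strict_heredocs(wikitext: str) -> list:
    """Return the payloads of all quotes recognized under the strict rule."""
    return [m.group(2) for m in HEREDOC.finditer(wikitext)]
```

With this rule, `{{foo| <<<a>>> }}` yields no quoted argument at all, which is exactly the whitespace behavior that would later need loosening.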

{{foo|<<< Don't forget to test with comments <!-- like this: >>> -->! >>>}}

Embedded comments are interesting. I think the most consistent thing to do is to protect them, ie the example above is invalid (argument closes at the first >>>) while
{{echo|bar=<<< no <!-- comment stripping>>>}}
is valid (properly closed). Otherwise we'd have to invent some new special escape syntax in case we really did want to embed the literal characters <!-- in an argument, and the whole point of this syntax is that you should be able to surround literally anything with no additional escaping needed (although sometimes you need to quote using a numeric tag, of course). (When it comes time to implement this, I might regret this statement, since I believe comment stripping in the PHP parser happens quite early.)

{{foo|<<<
If there's some long-running text, and someone happens to type a "<<<" inside, does that screw up the matching of the closing quote-syntax?
>>>}}

Another interesting case. I'd expect that you'd need to use the tagged quote syntax for this to work, ie:

{{foo|1<<<
If there's some long-running text, and someone happens to type a "<<<" inside, does that screw up the matching of the closing quote-syntax?
>>>1}}

Probably this can be handled with priority: a matching tagged close-quote (>>>N where N matches something currently open) should take priority over any other open quotes and implicitly close them all, while a normal close-quote (>>>) would just close the topmost untagged open quote (if there is one). That should allow us to attain our goal of quoting *anything* with no additional escapes needed, although sometimes you have to choose an unused N and use the tagged quote syntax to make that work.
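The priority rule could be prototyped with a small stack-based scanner. The Python below is a sketch under several assumptions that the comment does not settle: a tagged close matches the most recently opened quote with that tag, an untagged >>> only closes a quote when the innermost open quote is itself untagged, and the context rule (quotes only after | or =) is ignored. `outer_payload` is a hypothetical name.

```python
import re

# Either an opening quote with optional tag, or a closing quote with optional tag.
TOKEN = re.compile(r"(\d*)<<<|>>>(\d*)")


def outer_payload(text: str):
    """Return the payload of the first outermost quote in text, or None if unclosed."""
    stack = []   # tags of currently open quotes, outermost first; "" means untagged
    start = 0
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:            # opening quote
            if not stack:
                start = m.end()
            stack.append(m.group(1))
            continue
        if not stack:
            continue                          # stray close outside any quote
        tag = m.group(2)
        if tag and tag in stack:
            # Tagged close: drop the matching quote and everything opened after it.
            idx = len(stack) - 1 - stack[::-1].index(tag)
            del stack[idx:]
        elif stack[-1] == "":
            stack.pop()                       # untagged close of an untagged quote
        else:
            continue                          # literal >>> inside a tagged quote
        if not stack:
            return text[start:m.start()]
    return None
```

With the untagged form, a stray "<<<" in the payload leaves the outer quote unclosed (the visibly broken argument mentioned below), while the tagged form recovers the whole payload.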

This isn't fool-proof for human editors, but the mistake should be very visible (a broken argument). VE or the improved wikitext editor could recognize this case and assign an appropriate numeric tag automatically. (Just keep incrementing N until neither of the strings N<<< or >>>N appear in the text you want to quote.)

And note that in the initial implementation the special <<<...>>> quotes would only work in template arguments, not alone in text, so in order to break quoting this example would have to be:

{{foo|<<<
If there's some long-running text, and someone happens to type a {{echo|<<< inside, does that screw up the matching of the closing quote-syntax?
>>>}}

I'm perfectly comfortable with requiring a numeric tag on the outer {{foo template if you really wanted to pass a template fragment as an argument. Thinking through the fully-general case is worthwhile if we eventually expect <<<...>>> to become a general quoting construct outside template arguments... although I'm not 100% sure we do want that.

and the whole point of this syntax is that you should be able to surround literally anything with no additional escaping needed

Doesn't that statement conflict with allowing expansion of templates inside a quoted-argument?

Independent of the edge-case discussion going on, given that '<<<' and '>>>' are effectively a quoting construct, we could potentially bikeshed on the specific syntactic choice. Other choices that we talked about:

  • %ESC .... ESC% or ESC% ... %ESC where ESC can be any arbitrary string that is chosen at the use-site. Of course, the % could be < or << or <<< or some other substring that is pre-determined as part of the implementation. We arrived at <<< since we figured it lets editors use the default without needing to use custom ESC strings and also makes the syntax less confusing by eliminating arbitrary variations.

Clearly, it has to be something that won't be commonly encountered in wikis and hence won't need to be escaped when we introduce this new syntax. It also needs to be RTL-friendly. The choice of a custom escape string (in the standard heredoc form) would let editors get around some of the pesky edge cases with escaping which Scott also considered above with N<<<.

So, if anyone wants to bikeshed on the syntactic choice, have at it. When it comes time to implement / prototype this, we'll pick the best option.

Tgr added a comment.Oct 27 2017, 3:32 AM

Otherwise we'd have to invent some new special escape syntax in case we really did want to embed the literal characters <!-- in an argument

You need special syntax to use <!-- in wikitext anyway, otherwise you risk major breakage if someone adds --> in some other part of the document later. That special syntax is pretty straightforward: just use &lt;!-- (unless you want to insert an actual wiki comment in multiple parts, which is heavily Don't Do That Then territory).

Alsee added a comment.Nov 1 2017, 11:02 AM

inside a template and immediately follows a | or = with no whitespace *and* the >>> is immediately followed by a | or }}

Even if you want to be conservative in the initial version, template arguments need to accept surrounding whitespace. If you don't, people will be very confused why it's broken. There's a strong expectation that we can use spaces and newlines to format template parameters. A parameter may be surrounded by spaces, and parameters are often split on individual lines.

He7d3r added a subscriber: He7d3r.Dec 29 2017, 11:49 AM
Arlolra claimed this task.Jan 9 2018, 6:44 PM
kchapman added a subscriber: kchapman.

@cscott TechCom would like to schedule an IRC meeting in the coming weeks for this RFC

Change 418198 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/core@master] [WIP] Heredoc

https://gerrit.wikimedia.org/r/418198

and the whole point of this syntax is that you should be able to surround literally anything with no additional escaping needed

Doesn't that statement conflict with allowing expansion of templates inside a quoted-argument?

This is interesting. Presumably @Anomie is concerned with:

{{foo|<<< {{bar}} >>>}}

where {{bar}} could itself expand to >>>. I *think* that the way the Preprocessor works ensures this isn't a problem; that is, we match braces and parse arguments before any of the wikitext expansion takes place, so by the time the >>> shows up it is harmless. But it's certainly worth a test case to ensure that this works correctly.

inside a template and immediately follows a | or = with no whitespace *and* the >>> is immediately followed by a | or }}

Even if you want to be conservative in the initial version, template arguments need to accept surrounding whitespace. If you don't, people will be very confused why it's broken. There's a strong expectation that we can use spaces and newlines to format template parameters. A parameter may be surrounded by spaces, and parameters are often split on individual lines.

This is a legit concern. The way that TemplateData formats options probably requires some whitespace here by default as well. We could probably reasonably restrict the whitespace though, so long as (something like) the inline and block formats of https://github.com/wikimedia/mediawiki-extensions-TemplateData/blob/master/Specification.md#316-format are allowed. We can grep through our wikitext archives to see how often <<< and >>> might appear in "confusing" contexts in existing wikitext.

and the whole point of this syntax is that you should be able to surround literally anything with no additional escaping needed

Doesn't that statement conflict with allowing expansion of templates inside a quoted-argument?

This is interesting. Presumably @Anomie is concerned with:

{{foo|<<< {{bar}} >>>}}

where {{bar}} could itself expand to >>>.

I don't think that's what I was concerned with, although it has been so long that I've forgotten. I may have been questioning whether "literally anything with no escaping" means that foo gets passed literal {{bar}} rather than the transclusion of Template:bar, i.e. whether braces still need manual escaping to achieve the former result.

cscott added a comment.EditedSep 14 2018, 9:41 AM

@Anomie ah, yes. Current template semantics expand the arguments before evaluating the template. I'd been thinking that would be maintained, but for general use it might be worth thinking through alternatives. For example, if you want to use heredoc syntax for parser functions (and I do: T204283, T204307) then you need to pass the parser function the raw unexpanded text. I think I can deal with *not* expanding the heredoc string, in terms of defining the semantics of the syntax, and treating "standard wikitext template expansion" as a bit of a special case (eager expansion of arguments). That is, in theory {{Foo|{{bar}}}} could be sugar for {{#expand|Template:Foo|<<<{{bar}}>>>}}, where the implementation of the #expand parser function does an explicit eager expansion of its arguments before substituting them into Template:Foo.

Current template semantics expand the arguments before evaluating the template.

That's not completely accurate, at least for the PHP parser, and I think the difference may be important.

The actual time when the argument is expanded is when it's accessed in the wikitext (via {{{1}}} or the like). At the PHP level, that's when PPFrame::getArgument() is called (directly or via getArguments()/getNumberedArguments()/getNamedArguments()).

You can see this behavior in wikitext by looking at syntax that has side effects. For example, consider enwiki's Template:Void. An invocation like {{void | {{DEFAULTSORT:Foo}} }} won't set the default sorting because {{{1}}} is never expanded, while if arguments were expanded before evaluating the template then it would do so.

For example, if you want to use heredoc syntax for parser functions (and I do: T204283, T204307) then you need to pass the parser function the raw unexpanded text.

There's nothing really stopping us (in PHP at least) from adding a version of PPFrame::getArgument() that returns the raw PPNode for the parser function's implementation to use as it sees fit, or that expands it using PPFrame::RECOVER_ORIG to (try to) return wikitext without templates and such expanded.

cscott added a comment.EditedSep 14 2018, 4:05 PM

@Anomie is the expansion memoized? Presumably if I include {{{1}}} twice I'm guaranteed the same contents? (I should check this myself but I'm on mobile at the moment.)

It may be that using the <<<...>>> syntax will opt you in to slightly different argument expansion semantics, if we need to do a cleanup here for consistency.

@Anomie is the expansion memoized? Presumably if I include {{{1}}} twice I'm guaranteed the same contents? (I should check this myself but I'm on mobile at the moment.)

It is. Both PPTemplateFrame_DOM's and PPTemplateFrame_Hash's implementations cache the expansion so it only needs to be done once.
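The lazy, cached expansion described in this exchange can be modeled in a few lines. This Python sketch is only an analogy: `LazyFrame`, `get_argument` and `toy_expand` are hypothetical names and do not reproduce the PHP PPFrame API.

```python
class LazyFrame:
    """Toy template frame: arguments are expanded on first access, then cached."""

    def __init__(self, raw_args, expand):
        self._raw = dict(raw_args)    # unexpanded argument text, keyed by name
        self._expand = expand         # callable that performs the expansion
        self._cache = {}              # expanded results, filled on demand

    def get_argument(self, name):
        if name not in self._cache:
            self._cache[name] = self._expand(self._raw[name])
        return self._cache[name]


expanded_log = []                     # records every expansion (the side effect)


def toy_expand(raw):
    expanded_log.append(raw)
    return raw.strip()


# A "void"-style template never reads its argument, so nothing is expanded.
void_frame = LazyFrame({"1": " {{DEFAULTSORT:Foo}} "}, toy_expand)
void_calls = len(expanded_log)

# A template that uses its argument twice triggers exactly one expansion.
echo_frame = LazyFrame({"1": " hello "}, toy_expand)
first = echo_frame.get_argument("1")
second = echo_frame.get_argument("1")
echo_calls = len(expanded_log) - void_calls
```

The side-effect log plays the role of DEFAULTSORT in the Template:Void example: an argument that is never read never produces its side effect.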

While thinking through the details of argument expansion, it's probably important to figure out how to pass heredoc-quoted arguments through to child templates safely. That is, if Template:Foo is:

{{SomeOtherTemplate|{{{1}}}}}

and I invoke it like:

{{Foo|<<<bar=bat>>>}}

I probably want to select two different behaviors: (a) deliberately unquoted, so SomeOtherTemplate is given the named argument bar, and (b) deliberately quoted, so SomeOtherTemplate is given a single unnamed argument with the literal value bar=bat.

I *think* behavior (a) is what I wrote above, and if I wanted behavior (b) I'd write Template:Foo like:

{{SomeOtherTemplate|<<<{{{1}}}>>>}}

...but it's worth thinking this through carefully (and writing test cases). Consider also:

{{SomeOtherTemplate|<<<<nowiki>{{{1}}}</nowiki>>>>}}

While thinking through the details of argument expansion, it's probably important to figure out how to pass heredoc-quoted arguments through to child templates safely. That is, if Template:Foo is:

{{SomeOtherTemplate|{{{1}}}}}

and I invoke it like:

{{Foo|<<<bar=bat>>>}}

I probably want to select two different behaviors: (a) deliberately unquoted, so SomeOtherTemplate is given the named argument bar, and (b) deliberately quoted, so SomeOtherTemplate is given a single unnamed argument with the literal value bar=bat.

You can already run into that situation with something like Template:= and {{Foo|bar{{=}}bat}}, or with {{Foo|{{baz}}}} where Template:Baz contains bar=bat. Behavior is not currently selectable, and I'd recommend against trying to make it so because you'll probably wind up breaking things that rely on the current behavior.

I *think* behavior (a) is what I wrote above,

Before 2008 MediaWiki did your option (a), but in 2008 @tstarling changed it to option (b). See https://meta.wikimedia.org/wiki/Migration_to_the_new_preprocessor for some details.
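The difference between the two options comes down to whether argument text is substituted before or after the child call is split on | and =. A deliberately naive Python sketch follows; `parse_call`, `option_a` and `option_b` are hypothetical names, the input is the text between the outer braces, and nothing here reflects the actual preprocessor code.

```python
def parse_call(inner: str):
    """Split 'Name|arg|key=value' into (name, positional list, named dict)."""
    name, *parts = inner.split("|")
    positional, named = [], {}
    for part in parts:
        key, sep, value = part.partition("=")
        if sep:
            named[key] = value
        else:
            positional.append(part)
    return name, positional, named


def option_a(inner: str, arg1: str):
    """Substitute first, then split: '=' inside the argument becomes syntax."""
    return parse_call(inner.replace("{{{1}}}", arg1))


def option_b(inner: str, arg1: str):
    """Split first, then substitute: the argument stays one literal value."""
    name, positional, named = parse_call(inner)

    def sub(text):
        return text.replace("{{{1}}}", arg1)

    return name, [sub(p) for p in positional], {k: sub(v) for k, v in named.items()}
```

For the body `SomeOtherTemplate|{{{1}}}` and the argument `bar=bat`, `option_a` produces a named argument bar while `option_b` produces a single positional argument with the literal value bar=bat.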

Consider also:

{{SomeOtherTemplate|<<<<nowiki>{{{1}}}</nowiki>>>>}}

That certainly is something else to consider. If template expansion works inside <<< >>>, then we'll likely also need <nowiki> and other parser tags to behave as the tag rather than having them produce literal text. Doing otherwise would be horribly confusing.

My thought is that "interpret argument as literal text" or "interpret argument as wikitext" is a decision to be made by the template author. It's an implicit (or with TemplateData, perhaps explicit) type annotation on the arguments as they are used. The job of the heredoc parser is just to get the raw text through to the template author (or scribunto module author, or parser function, etc) intact; then they get to decide whether to interpret it as wikitext or not.

It's comforting to know that option (b) is the intended behavior. I've seen plenty of image-related templates where {{small}} expands to something like thumb|200px and it is used like [[Foo.jpg|{{small}}]] and the | is very much intended to be interpreted *not* literally (ie, option (a)). I think it is still worth thinking through those types of use cases to ensure that the template author has full control of the interpretation of magic characters like | and = in the contents of the argument. I agree that "fully escaped literal text" is a sound default though!

My thought is that "interpret argument as literal text" or "interpret argument as wikitext" is a decision to be made by the template author.

This could be done by adding some new syntax that works like {{{foo}}} but doesn't expand the wikitext. Personally I'm skeptical that making wikitext even more confusing in that way would be a good idea, versus leaving such cases to be done via Scribunto where code can be better structured and have comments, but either way it doesn't need heredoc syntax to be able to do it.

It's an implicit (or with TemplateData, perhaps explicit) type annotation on the arguments as they are used.

Implicit as it's used, sure. TemplateData, maybe not. How would TemplateData handle annotating parameter 1 of en:Template:Demo, that currently uses <nowiki> and Scribunto to show both unexpanded and expanded versions of the parameter?

The job of the heredoc parser is just to get the raw text through to the template author (or scribunto module author, or parser function, etc) intact; then they get to decide whether to interpret it as wikitext or not.

It could be just that simple. Where it gets complex is if you try to make {{{foo}}} do one thing for {{template|foo={{bar}} <nowiki>{{baz}}</nowiki>}} and another for {{template|foo=<<<{{bar}} <nowiki>{{baz}}</nowiki>>>>}}.

It's comforting to know that option (b) is the intended behavior. I've seen plenty of image-related templates where {{small}} expands to something like thumb|200px and it is used like [[Foo.jpg|{{small}}]] and the | is very much intended to be interpreted *not* literally (ie, option (a)).

I note the image syntax may not be following the same rules. :/

I think it is still worth thinking through those types of use cases to ensure that the template author has full control of the interpretation of magic characters like | and = in the contents of the argument. I agree that "fully escaped literal text" is a sound default though!

Let's not. Making it so that {{foo|bar{{=}}baz}} and various other methods of doing the same sort of thing might be interpreted as parameter 1 being "bar=baz" or parameter bar being "baz" based on how Template:Foo is coded seems like a surefire recipe for confusion.

I think having the heredoc syntax disable the usual interpretation of | and = when parsing the wikitext string is probably as far as it should go. I don't think preventing expansions of wikitext is likely to be generally useful, and we already have <nowiki> for that when it's needed.

There should be no way for the template/parser function/Scribunto module to know whether the wikitext calling it used heredoc syntax or not.[1]

Access to the unexpanded wikitext of an argument, if we want to provide that, should be a separate feature completely unrelated to whether heredoc syntax was used. At the PHP level that would likely be a new method on PPFrame or a parameter to PPFrame::getArgument(); whether and how to expose it in Scribunto or wikitext could be figured out elsewhere.

[1]: Beyond guessing based on the presence of | in the value. Although that could have come from a Scribunto frame:expandTemplate() call.
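To make the "heredoc only changes how | and = are split" position concrete, here is a minimal sketch in Python. It is a toy model, not the core preprocessor or Parsoid: the function names `split_top_level` and `parse_call` are made up for illustration, and nested templates, links and whitespace rules are ignored. The point it tries to show is that the callee receives an ordinary name-to-string map and cannot tell which quoting style the caller used.

```python
from typing import Dict, List, Tuple

OPEN, CLOSE = "<<<", ">>>"  # heredoc delimiters from the proposal


def split_top_level(body: str) -> List[str]:
    """Split on '|' except inside <<<...>>> (toy model, no nesting)."""
    parts: List[str] = []
    current: List[str] = []
    quoted = False
    i = 0
    while i < len(body):
        if not quoted and body.startswith(OPEN, i):
            quoted = True
            current.append(OPEN)
            i += len(OPEN)
        elif quoted and body.startswith(CLOSE, i):
            quoted = False
            current.append(CLOSE)
            i += len(CLOSE)
        elif body[i] == "|" and not quoted:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_call(body: str) -> Tuple[str, Dict[str, str]]:
    """Turn the inside of {{...}} into (template name, argument map)."""
    name, *raw_args = split_top_level(body)
    args: Dict[str, str] = {}
    position = 0
    for raw in raw_args:
        if raw.startswith(OPEN) and raw.endswith(CLOSE):
            # Positional heredoc: '=' inside is not a name separator.
            position += 1
            args[str(position)] = raw[len(OPEN):-len(CLOSE)]
            continue
        key, sep, value = raw.partition("=")
        if sep:
            # Named argument, optionally written as name=<<<...>>>.
            if value.startswith(OPEN) and value.endswith(CLOSE):
                value = value[len(OPEN):-len(CLOSE)]
            args[key.strip()] = value
        else:
            position += 1
            args[str(position)] = raw
    return name, args
```

With the table example from the task description, the heredoc form yields a single positional argument containing the untouched table row, while the same text without <<<...>>> is split into parameters 1, 2, 3 and a stray named parameter "wiki". In both cases the template only ever sees the resulting map.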

@Anomie I generally agree, but:

I note the image syntax may not be following the same rules. :/

My eventual goal is to allow the use of something like {{#media|Foo.jpg|...}} if you need heredoc quoting. Given that, it's worth trying to figure out whether or not it would be possible to write a template like {{small}} using that syntax. Of course I'm persuadable if you think the answer is that you should write it as {{small|Foo.jpg}} and that should expand to {{#media|{{{1}}}|thumb|200px}} avoiding the whole issue of trying to expand an argument to more than one option.

I'll note that other template systems solve this problem with some sort of "varargs" type syntax, where you explicitly say that {{{1}}} is a "list of arguments". Something like {{{...1...}}} (just to make up syntax). Ideally we'd have a proper key-value map to back that with, not just a string...

I think having the heredoc syntax disable the usual interpretation of | and = when parsing the wikitext string is probably as far as it should go. I don't think preventing expansions of wikitext is likely to be generally useful, and we already have <nowiki> for that when it's needed.
There should be no way for the template/parser function/Scribunto module to know whether the wikitext calling it used heredoc syntax or not.[1]

Mostly agree, in that it shouldn't be exposed as part of the macro insertion API or something like that. But on the other hand, we've mooted the idea of using the opt-in syntax to also opt-in to "better" template expansion semantics (where different people have different definitions of "better"). I'm not opposed to that in principle. We should use the opportunity we have to allow people to opt-in.

Access to the unexpanded wikitext of an argument, if we want to provide that, should be a separate feature completely unrelated to whether heredoc syntax was used. At the PHP level that would likely be a new method on PPFrame or a parameter to PPFrame::getArgument(); whether and how to expose it in Scribunto or wikitext could be figured out elsewhere.

See T203293: {{Row numbers}} completely fails on the Android app / VisualEditor. We very much want to avoid hacks like that. If we can do that by switching to lazy evaluation of arguments (see the above point about opt-in changes) then I think we should do so. I really don't want to expose the strip state!

But on the other hand, we've mooted the idea of using the opt-in syntax to also opt-in to "better" template expansion semantics (where different people have different definitions of "better"). I'm not opposed to that in principle. We should use the opportunity we have to allow people to opt-in.

Let's not confuse a new syntax for invoking templates with opting in to some new way of writing templates. Let's not even confuse a new way of passing an argument to a template with opting in to some unrelated feature of template expansion.

See T203293: {{Row numbers}} completely fails on the Android app / VisualEditor. We very much want to avoid hacks like that. If we can do that by switching to lazy evaluation of arguments (see the above point about opt-in changes) then I think we should do so. I really don't want to expose the strip state!

We already have lazy evaluation of arguments, I've said that at least three times now. What we don't have is a way for anything (wikitext, parser functions, or Scribunto) to get the value without evaluating it, and that's what T203293 wants.
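The distinction drawn here (lazy evaluation exists; access to the unevaluated value does not) can be sketched as follows. This is an illustrative Python model only: `LazyArgument`, `expanded()`, `unexpanded()` and `toy_expander` are hypothetical names, not PPFrame methods or anything exposed by Scribunto today.

```python
from typing import Callable, Optional


class LazyArgument:
    """Toy model of a template argument that is expanded on first use."""

    def __init__(self, wikitext: str, expander: Callable[[str], str]) -> None:
        self._wikitext = wikitext
        self._expander = expander
        self._value: Optional[str] = None
        self.expand_count = 0  # how many times the expander actually ran

    def expanded(self) -> str:
        # Lazy: nothing is expanded until a consumer asks for the value.
        if self._value is None:
            self._value = self._expander(self._wikitext)
            self.expand_count += 1
        return self._value

    def unexpanded(self) -> str:
        # Hypothetical accessor: hands back the source text and never
        # triggers expansion. This is the piece that is missing.
        return self._wikitext


def toy_expander(text: str) -> str:
    """Stand-in for template expansion; only knows one template."""
    return text.replace("{{bar}}", "BAR")
```

In this model, a consumer that only calls `unexpanded()` leaves `expand_count` at 0, and repeated calls to `expanded()` run the expander once. Whether such an accessor should exist, and in what form, is a separate question from heredoc syntax.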

Ok, we're (almost) totally agreed then. :)

(I think our only difference is that I would *like* to leverage heredoc syntax to opt you in to balanced expansion of the template at the use-site -- but that's not going to happen for path-dependence reasons, so that makes us totally agreed I think.)

cscott added a comment. Edited Nov 2 2018, 8:31 PM

Copied from a discussion at https://gerrit.wikimedia.org/r/#/c/mediawiki/services/parsoid/+/467531/6/lib/wt2html/tt/LinkHandler.js@1014 wrt how [[File:Foo.jpg|{{sometemplate}}]] gets parsed:

In PHP-land, the preprocessor does the [[...]] matching and the |-splitting before any other tokenization gets done... but then we do template expansion and the parser *re-does* the [[...]] matching and |-splitting afterward, allowing a second chance for templates to generate brackets (always bad) and vertical bars (a feature, I guess). (See T172306: Broken wikilinks can be parsed as wikilinks after preprocessing)
Parsoid is sort of doing the same: iff templates are involved, we expand them and then do a hacky reparse to give a second chance for the template to generate vertical bars.
Both of these have weird corner cases, because we're treading the fine line between literal text and wikitext tokens.
If you use the heredoc syntax for files:

{{#file:Foo.jpg|alt=<<<http://test.com|123>>>}}

You at least have an unambiguous way to state that you don't want vertical bars treated as tokens. (Of course <nowiki> would work as well.) But that doesn't generalize if the parameter is coming from a template or a template argument; what you want is something like:

{{#file:Foo.jpg|alt=<<<{{{1}}}>>>}}

but that pushes the responsibility onto the parser function itself to do brace expansion on its argument...or not. (Compare to {{#file:Foo.jpg|alt=<nowiki>{{{1}}}</nowiki>}} which of course just gives you literal text.)
And then if you *did* want to return multiple named arguments, you'd want something like splat syntax:

{{#file:Foo.jpg|{{{...1...}}}}}

and then the template should return a proper key-value map (hopefully as a structured object, but at least as an unambiguous string, say JSON).
All of which is a really long way to say that it sucks that both PHP and Parsoid have some really unintuitive behaviors here caused by trying to reparse the string output of templates, and they both break in unexpected ways when you push on them too hard.
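A toy model of the "second chance" reparse described above, as a sketch only: the `TEMPLATES` table and both function names are invented for illustration, and real link and media parsing is far more involved than a string split.

```python
from typing import List

# Hypothetical template table: {{small}} expands to two media options.
TEMPLATES = {"small": "thumb|200px"}


def expand_templates(text: str) -> str:
    """Stand-in for template expansion using the table above."""
    for name, output in TEMPLATES.items():
        text = text.replace("{{" + name + "}}", output)
    return text


def split_then_expand(body: str) -> List[str]:
    """Split on '|' once, then expand each piece; template pipes stay literal."""
    return [expand_templates(piece) for piece in body.split("|")]


def split_expand_resplit(body: str) -> List[str]:
    """Split, expand, then split the rejoined string a second time."""
    pieces = split_then_expand(body)
    return "|".join(pieces).split("|")
```

For the body `Foo.jpg|{{small}}`, the single-pass version yields two options (`Foo.jpg` and the literal text `thumb|200px`), while the reparse version yields three (`Foo.jpg`, `thumb`, `200px`). The second result is what templates like {{small}} rely on, and it is also where the corner cases come from, since the output string is being reinterpreted as syntax.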

Heredoc arguments give you a way to pass "exactly this text" to a template/parser function, but the job isn't quite done. We might need something like the "quoted heredoc" as well which allows you to expand {{{1}}} or {{random-template}} inside it and be sure that pipe characters in the result will be escaped and won't be interpreted as argument separators.
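One possible reading of the "quoted heredoc" idea, sketched in Python under loud assumptions: `substitute` and `quoted` are hypothetical helpers, and escaping the pipe as the character reference `&#124;` is just one way to keep it from acting as a separator. The sketch expands first and escapes afterwards, so pipes in the result survive a later split.

```python
from typing import Dict


def substitute(text: str, args: Dict[str, str]) -> str:
    """Replace {{{name}}} placeholders with argument values (toy model)."""
    for name, value in args.items():
        text = text.replace("{{{" + name + "}}}", value)
    return text


def quoted(text: str, args: Dict[str, str]) -> str:
    """Expand, then escape '|' so it cannot be read as an argument separator."""
    return substitute(text, args).replace("|", "&#124;")
```

Using the alt-text example above with parameter 1 set to `http://test.com|123`, building `"Foo.jpg|alt=" + quoted("{{{1}}}", args)` and splitting on `|` gives two pieces, whereas the unquoted substitution gives three and truncates the alt text at the pipe.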

Elitre removed a subscriber: Elitre. Nov 8 2018, 4:56 PM

See some further discussion of argument lists and splat syntax in https://phabricator.wikimedia.org/T196440#5341715

Restricted Application added a subscriber: Liuxinyu970226. Jul 17 2019, 5:04 PM

@cscott this RFC seems close to an agreement. Implementation is probably blocked on php-parsoid, but the new syntax and semantics could already be approved. Do you want to move this forward? TechCom could put it on last call, or schedule an IRC meeting, if one is still needed.

ssastry added a comment. Edited Mon, Jul 22, 4:57 PM

@cscott this RFC seems close to an agreement. Implementation is probably blocked on php-parsoid, but the new syntax and semantics could already be approved.

Unless we want to do this in both Parsoid/PHP and the existing core parser, this is probably blocked on Parsoid/PHP becoming the default. But maybe most of the work is in the preprocessor and some in Parsoid's PEG tokenizer, in which case it is still doable without waiting on that integration.

@Arlolra has some WIP patches as well so maybe he has some insight into this question.

@Arlolra has some WIP patches as well so maybe he has some insight into this question.

The question is whether it makes sense to move forward with approving the syntax and semantics, regardless of the state of implementation. And, yes, I'm in favour of that.

Thinking only of the short term for the moment: any consensus we could reach before Wikimania could be communicated in the various "what's next for wikitext" talks we're on tap to give, and if we don't quite have consensus yet, we could use Wikimania to help build it.

My last hangup on this proposal was my unease about *always* protecting | and =, since there are templates which want to splice argument lists. But the conversation around T196440 helped convince me that arg list processing is orthogonal (and likely has to do with the "result type" of the template), so I'm pretty confident that's not a blocker for heredoc arguments any longer.

I agree with @Arlolra that we can finalize the proposal syntax and semantics, even if we haven't figured out where implementation belongs on the Parsing-Team roadmap yet.

daniel moved this task from Under discussion to Inbox on the TechCom-RFC board. Edited Mon, Jul 22, 6:58 PM

Dropping this into the rfc inbox for techcom review. Do I read this correctly that everyone here thinks this can go on last call?

daniel moved this task from Inbox to Last Call on the TechCom-RFC board. Fri, Jul 26, 11:08 AM

Per the TechCom meeting on July 24, this RFC goes on Last Call for being approved. If no objections remain unaddressed by August 7, the RFC will be approved as proposed and amended.

If you care about this RFC, please comment - in support, or raising concerns. The Last Call period is not just for raising objections, but also for confirming consensus.

daniel edited projects, added TechCom-RFC (TechCom-Approved); removed TechCom-RFC. Edited Tue, Aug 13, 9:21 AM

This RFC has been approved as proposed per the TechCom meeting on 2019-08-07.

It is noted that implementation will likely have to wait for Parsoid-PHP to land.

Dalba added a subscriber: Dalba.