
[RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments)
Open, Low, Public

Description

(This proposal was originally called "hygienic arguments"; easily confusable with a mostly-unrelated hygienic templates proposal. The latter are now "balanced" templates and we're now calling this proposal "heredoc arguments".)

As described in my Wikimania 2015 talk (starting at slide 31), the existing template argument syntax sets a number of traps for the unwary (for example, around characters such as = and |). As a result, it is difficult to move large blocks of text into templates.

Instead, we often have constructs such as:

{{tablestart|class="shiny"}}
| Hello || wiki = world
{{tableend}}

which pose a number of issues:

  1. There is no mechanism to ensure {{tablestart}} and {{tableend}} are properly matched
  2. Both {{tablestart}} and {{tableend}} emit unbalanced HTML, which complicates work on efficiently updating the parser cache after template changes.
  3. Due to the tag matching issues, this whole block is uneditable by Visual Editor.

If we were to try to write a {{table}} template which accepted the contents as an argument, it would have to look like:

{{table|class="shiny"|
{{!}} Hello {{!}}{{!}} wiki = world
}}

Our arguments needed to be transformed in two different ways to prevent | and = from being mangled when we shoehorned them into a template argument. (You could also use {{=}}... but not {{|}}.)

This would also create dirty diffs inside the argument if all you wanted to do was wrap existing wikitext into a template parameter.

Consider also:

{{sectionstart}}
==heading==
{{sectionend}}

You can't just use <nowiki>=</nowiki> around = characters, if you want that to work.

Heredoc arguments provide a new form of template invocation which avoids these issues.

The above examples would be written as:

{{table|class="shiny"|<<<
| Hello || wiki = world
>>>}}
{{section|<<<
==heading==
>>>}}

Named arguments (like class in this example) can be passed using name=<<<...>>>. The new Template:Table can now emit properly balanced HTML, with both <table> and </table> generated by the same template (instead of by two separate templates). Visual Editor can now edit this block as a single template invocation, invoking itself recursively to edit the template arguments as it does now.

The only special character sequence in the argument when expressed this way is >>>. We'll support nesting <<<....>>>, since that's the common case where >>> would appear, and you could also use <nowiki>>>></nowiki>. However, we'll also provide a "tag" mechanism to ensure that *any* wikitext can be wrapped into a template argument with *zero dirty diffs inside the template argument*:

{{wrapper|arg=123<<<
This text can contain >>> it's fine!
>>>123}}

The tag before the <<< can be any number. (In principle it could be an alphanumeric tag, but Latin alphabetic characters might play havoc in RTL contexts; if we restrict tags to numbers, we are guaranteed that RTL will look good.) It is always possible to choose a number N such that >>>N never appears in the argument (for example, N could have more digits than the argument has characters), which means it is always possible to wrap arbitrary wikitext without making any internal changes to the wikitext. Note that you also need to use the tag mechanism in the special case where the argument's last character is >.
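The existence argument above can be turned into a tiny search. The following is a minimal Python sketch, not part of any implementation; `choose_tag` and `wrap_tagged` are hypothetical names. It uses the same condition discussed later in this task (neither "N<<<" nor ">>>N" may appear in the text) and simply tries N = 1, 2, 3, ... until one is unused.

```python
def choose_tag(text: str) -> str:
    """Return the smallest positive integer tag N (as a string) such that
    neither "N<<<" nor ">>>N" occurs in text.

    The loop always terminates: once N has more digits than text has
    characters, neither string can possibly occur in text.
    """
    n = 1
    while f"{n}<<<" in text or f">>>{n}" in text:
        n += 1
    return str(n)


def wrap_tagged(text: str) -> str:
    """Wrap arbitrary wikitext as a tagged heredoc argument, unchanged inside."""
    tag = choose_tag(text)
    return f"{tag}<<<{text}>>>{tag}"
```

For example, an argument whose last character is > gets the tagged form: `wrap_tagged("x>")` yields `1<<<x>>>>1`, and the closing `>>>1` cannot be confused with the trailing > of the content.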

There are no further special characters or odd escape rules in the argument when expressed this way. We don't need special {{!}}, {{=}}, etc escapes; we won't have dirty diffs or require careful search-and-replace when wrapping wikitext.

Another example, from @brion's talk on citations:

{{cite|id=“32412”|<<<
First person plural pronouns in Isthmus-Mecayapan Nahuat:

:''nejamēn'' ({{IPA|[nehameːn]}}) "We, but not you" (= me & them)
:''tejamēn'' ({{IPA|[tehameːn]}}) "We along with you" (= me & you & them)
>>>}}

Note that it was easy to wrap the entire text covered by the citation in the {{cite}} template, since I didn't need to worry about the fact that the text included the special character =.

Visual Editor use
When escaping template parameters, Visual Editor would wrap the parameter with <<<...>>> if the input wikitext contained | or = (instead of encoding | as {{!}}, etc). If the parameter also contained >>>, it would generate N<<<...>>>N, picking some N such that >>>N doesn't appear in the input. That would ensure clean diffs when the only change was wrapping existing wikitext into a template.
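The escaping policy described above could look roughly like the following Python sketch. `serialize_param` is a hypothetical name, not a Visual Editor API. Two of its conditions are assumptions drawn from elsewhere in this task rather than from this paragraph: a value that ends with > also gets a tag (per the tag discussion above), and so does a value containing "<<<" (since nested quotes would otherwise swallow the close).

```python
def serialize_param(value: str) -> str:
    """Sketch of the heredoc escaping policy for one template parameter."""
    # No | or =: the value can be emitted as-is, exactly as today.
    if "|" not in value and "=" not in value:
        return value
    # Plain <<<...>>> is safe only if nothing inside can open or close a
    # quote early, and the value does not end with ">" (assumption).
    if "<<<" not in value and ">>>" not in value and not value.endswith(">"):
        return f"<<<{value}>>>"
    # Otherwise pick the smallest numeric tag N such that neither "N<<<"
    # nor ">>>N" appears in the value, and use the tagged form.
    n = 1
    while f"{n}<<<" in value or f">>>{n}" in value:
        n += 1
    return f"{n}<<<{value}>>>{n}"
```

Either way, the wikitext inside the quotes is byte-for-byte the original value, which is what keeps the diff clean.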

We might also need to eventually add a flag to the data maintained by Extension:TemplateData to indicate that a given parameter should always be escaped with <<<, if that becomes the editors' preference for certain parameters.

More general use
In the initial implementation, <<< will be recognized as a quote character only for template arguments; that is, only immediately after = or | inside double braces. We could eventually allow <<< as a general mechanism, for example:

* <<< a
multi
line
>>> list item

We'll treat that as a separate task iff it proves interesting. (And it might: T230654.) We could allow the open angle brackets anywhere, but constraining them to appear only immediately after a syntax element (in particular a start-of-line list bullet) is probably a safe start, and allows something like T230658.

Strict start-of-line constraints
For ease of parsing (and reading) we can enforce start-of-line context on the result to avoid the T14974/T2529 hacks and make behavior consistent. There might be other restrictions that would prove useful. (More thought welcome here.)

Mailing list discussion: https://lists.wikimedia.org/pipermail/wikitech-l/2015-October/083448.html

Note: task description has been edited per parsing team meeting notes below; the original syntax proposal was {{>Foo}}...{{<Foo}}

// Note: task description further edited to adopt @Alsee's syntax proposal below, with a few tweaks.

Event Timeline

Elitre renamed this task from [RFC] Heredoc arguments for templates (aka "hygenic" or "long" arguments) to [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments). Oct 20 2017, 10:15 AM
Elitre updated the task description. Oct 20 2017, 10:17 AM

Edge cases that come to mind:

{{foo |<<<span>>>| is "span", you need |<<<<span>>>>| for "<span>". It may prove easy to lose an angle-bracket.}}

{{foo |<<<You know someone's going to do >>> this at some point accidentally. What happens?}}

{{foo | <<<What about spaces before or after quoting-syntax in a positional parameter?>>> }}

{{foo | bar = <<<Or a named parameter?>>> }}

{{foo |
<<<Newlines too are often seen before or after the value in a positional parameter. Someone might try to format it like this.>>>
}}

{{foo | bar =
<<<And the same for a named parameter.>>>
}}

{{foo
|<<<Especially with vertical layout of multiple parameters, even if otherwise done correctly,>>>
|bar=<<<for both positional and named parameters.>>>
|<<<Sometimes a parameter even has multiple newlines after>>>

|<<<and/or comments that are intended for the following parameter>>>

<!-- (like this). -->
|<<<You can get both newlines and spaces if>>>
 |<<< the next parameter is supposed to be indented or>>>
|<<<someone doesn't trim line-ending whitespace.>>>      
|<<<And note there may be a newline at the end too, not just when another parameter follows.>>>
}}

{{foo|<<<
If there's some long-running text, and someone happens to type a "<<<" inside, does that screw up the matching of the closing quote-syntax?
>>>}}

{{foo|<<<
And what happens if someone {{screws|<<<up>>}} a template invocation?
>>>|<<<
And what happens if someone {{screws|<<up>>>}} a template invocation?
>>>}}

{{foo|<<< Don't forget to test with comments <!-- like this: >>> -->! >>>}}

I'm not saying what the correct behavior should be in any of these cases, just pointing them out as cases where behavior might be surprising.

Alsee added a comment. Oct 20 2017, 5:48 PM

I believe the proper behavior (meaning the easiest to understand and most expected behavior) is to just protect the wrapped content then behave as if the <<< and >>> don't exist. This does a good job of covering most of the listed edge cases. For example:

{{foo |<<<You know someone's going to do >>> this at some point accidentally. What happens?}}

would be identical to

{{foo |You know someone's going to do  this at some point accidentally. What happens?}}

Interesting detail: Notice that this example put two spaces between "do" and "this".

P.S. The new task description appears to ask whether multi-line list items are interesting. The answer is yes, mostly on talk pages. My first impression is that it feels odd to use <<< >>> for multi-line list items, but it might be preferable to the awkward <br> method I sometimes use.

The question of what to do about >>> that isn't immediately preceding | or }} is an interesting one. From the discussion we had at the offsite, I believe that although we want to think through how we might eventually support something like @Alsee's suggestion above (<<< ... >>> as a generic quoting construct), we'd prefer to be conservative in the first implementation: the <<< will be treated literally (ie, not as a quote character) unless it's inside a template and immediately follows a | or = with no whitespace *and* the >>> is immediately followed by a | or }}. That will let us get some experience with the new construct with the tightest possible syntax, and then we can later loosen things up to allow whitespace & allow it to be used outside template arguments, if/when we've got a better idea what we want the behavior there to be (whitespace stripping, etc).
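To make the conservative rule concrete, here is a minimal Python sketch of the recognition check. `match_conservative` is a hypothetical helper. It assumes the caller has already determined that position `i` lies inside a template invocation, and, as a simplifying assumption, it only looks at the first >>> after the opener (tags and nesting are ignored). If that >>> is not immediately followed by | or }}, the <<< is treated as literal text.

```python
from typing import Optional, Tuple


def match_conservative(src: str, i: int) -> Optional[Tuple[int, int]]:
    """Return (content_start, content_end) if a heredoc argument opens at
    src[i] under the conservative first-implementation rule, else None
    (meaning the "<<<" should be treated as literal text)."""
    # The "<<<" must immediately follow "|" or "=", with no whitespace.
    if not src.startswith("<<<", i) or i == 0 or src[i - 1] not in "|=":
        return None
    start = i + 3
    # Assumption: only the first ">>>" is considered (no tags, no nesting).
    end = src.find(">>>", start)
    if end == -1:
        return None
    # The ">>>" must be immediately followed by "|" or "}}".
    after = src[end + 3:]
    if not (after.startswith("|") or after.startswith("}}")):
        return None
    return start, end
```

Under this rule `{{foo|<<<a=b|c>>>}}` passes a=b|c as one argument, while `{{foo| <<<a>>>}}` (leading space) and `{{foo|<<<do >>> this}}` fall back to today's parsing.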

{{foo|<<< Don't forget to test with comments <!-- like this: >>> -->! >>>}}

Embedded comments are interesting. I think the most consistent thing to do is to protect them, ie the example above is invalid (argument closes at the first >>>) while
{{echo|bar=<<< no <!-- comment stripping>>>}}
is valid (properly closed). Otherwise we'd have to invent some new special escape syntax in case we really did want to embed the literal characters <!-- in an argument, and the whole point of this syntax is that you should be able to surround literally anything with no additional escaping needed (although sometimes you need to quote using a numeric tag, of course). (When it comes time to implement this, I might regret this statement, since I believe comment stripping in the PHP parser happens quite early.)

{{foo|<<<
If there's some long-running text, and someone happens to type a "<<<" inside, does that screw up the matching of the closing quote-syntax?
>>>}}

Another interesting case. I'd expect that you'd need to use the tagged quote syntax for this to work, ie:

{{foo|1<<<
If there's some long-running text, and someone happens to type a "<<<" inside, does that screw up the matching of the closing quote-syntax?
>>>1}}

Probably this can be handled with priority: a matching tagged close-quote (>>>N where N matches something currently open) should take priority over any other open quotes and implicitly close them all, while a normal close-quote (>>>) would just close the topmost untagged open quote (if there is one). That should allow us to attain our goal of quoting *anything* with no additional escapes needed, although sometimes you have to choose an unused N and use the tagged quote syntax to make that work.
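The priority rule can be sketched as a small stack-based scanner. This is a minimal Python illustration under stated assumptions, not the preprocessor's actual algorithm. `find_close` and `TOKEN` are hypothetical names. Any run of digits directly before <<< is read as a tag. A plain >>> is taken to close the innermost quote only when that quote is untagged; inside a tagged quote it is literal text. That is one reading of "close the topmost untagged open quote".

```python
import re
from typing import Optional

# An opener with an optional numeric tag, or a closer with an optional tag.
TOKEN = re.compile(r"(\d*)<<<|>>>(\d*)")


def find_close(src: str, start: int, outer_tag: str = "") -> Optional[int]:
    """Return the index where the quote whose content begins at src[start]
    ends, or None if it is never closed. outer_tag is "" if untagged."""
    stack = [outer_tag]  # currently open quotes, innermost last
    pos = start
    while True:
        m = TOKEN.search(src, pos)
        if m is None:
            return None  # unterminated: a visibly broken argument
        if m.group(1) is not None:
            # Opening quote (possibly tagged): it nests.
            stack.append(m.group(1))
            pos = m.end()
            continue
        tag = m.group(2)
        if tag and tag in stack:
            # A matching tagged close takes priority: it closes its quote
            # and implicitly closes every quote opened after it.
            idx = len(stack) - 1 - stack[::-1].index(tag)
            del stack[idx:]
            pos = m.end()
        elif stack[-1] == "":
            # A plain ">>>" closes the innermost quote if it is untagged;
            # any digits after it are left as ordinary text.
            stack.pop()
            pos = m.start() + 3
        else:
            # A plain ">>>" inside a tagged quote is just literal text.
            pos = m.start() + 3
            continue
        if not stack:
            return m.start()
```

With the tagged quote `1<<<...>>>1`, a stray "<<<" inside the text opens a nested quote that the final `>>>1` then closes implicitly, so the whole text reaches the template intact.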

This isn't fool-proof for human editors, but the mistake should be very visible (a broken argument). VE or the improved wikitext editor could recognize this case and assign an appropriate numeric tag automatically. (Just keep incrementing N until neither of the strings N<<< or >>>N appear in the text you want to quote.)

And note that in the initial implementation the special <<<...>>> quotes would only work in template arguments, not alone in text, so in order to break quoting this example would have to be:

{{foo|<<<
If there's some long-running text, and someone happens to type a {{echo|<<< inside, does that screw up the matching of the closing quote-syntax?
>>>}}

I'm perfectly comfortable with requiring a numeric tag on the outer {{foo}} template if you really want to pass a template fragment as an argument. Thinking through the fully general case is worthwhile if we eventually expect <<<...>>> to become a general quoting construct outside template arguments... although I'm not 100% sure we do want that.

and the whole point of this syntax is that you should be able to surround literally anything with no additional escaping needed

Doesn't that statement conflict with allowing expansion of templates inside a quoted-argument?

Independent of the edge-case discussion going on, given that '<<<' and '>>>' are effectively a quoting construct, we could potentially bikeshed on the specific syntactic choice. Other choices that we talked about include:

  • %ESC .... ESC% or ESC% ... %ESC where ESC can be any arbitrary string that is chosen at the use-site. Of course, the % could be < or << or <<< or some other substring that is pre-determined as part of the implementation. We arrived at <<< since we figured it lets editors use the default without needing to use custom ESC strings and also makes the syntax less confusing by eliminating arbitrary variations.

Clearly, it has to be something that won't be commonly encountered in wikis and hence won't need to be escaped when we introduce this new syntax. It also needs to be RTL-friendly. The choice of a custom escape string (in the standard heredoc form) would let editors get around some of the pesky edge cases with escaping which Scott also considered above with N<<<.

So, if anyone wants to bikeshed on the syntactic choice, have at it. When it comes time to implement / prototype this, we'll pick the best option.

Tgr added a comment. Oct 27 2017, 3:32 AM

Otherwise we'd have to invent some new special escape syntax in case we really did want to embed the literal characters <!-- in an argument

You need special syntax to use <!-- in wikitext anyway; otherwise you risk major breakage if someone adds --> in some other part of the document later. That special syntax is pretty straightforward: just use &lt;!-- (unless you want to insert an actual wiki comment in multiple parts, which is heavily Don't Do That Then territory).

Alsee added a comment. Nov 1 2017, 11:02 AM

inside a template and immediately follows a | or = with no whitespace *and* the >>> is immediately followed by a | or }}

Even if you want to be conservative in the initial version, template arguments need to accept surrounding whitespace. If you don't, people will be very confused about why it's broken. There's a strong expectation that we can use spaces and newlines to format template parameters. A parameter may be surrounded by spaces, and parameters are often split onto individual lines.

He7d3r added a subscriber: He7d3r. Dec 29 2017, 11:49 AM
Arlolra claimed this task. Jan 9 2018, 6:44 PM
kchapman moved this task from Old to Under discussion on the TechCom-RFC board. Mar 8 2018, 10:23 PM
kchapman added a subscriber: kchapman.

@cscott TechCom would like to schedule an IRC meeting in the coming weeks for this RFC

Change 418198 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/core@master] [WIP] Heredoc

https://gerrit.wikimedia.org/r/418198

and the whole point of this syntax is that you should be able to surround literally anything with no additional escaping needed

Doesn't that statement conflict with allowing expansion of templates inside a quoted-argument?

This is interesting. Presumably @Anomie is concerned with:

{{foo|<<< {{bar}} >>>}}

where {{bar}} could itself expand to >>>. I *think* that the way the Preprocessor works ensures this isn't a problem; that is, we match braces and parse arguments before any of the wikitext expansion takes place, so by the time the >>> shows up it is harmless. But it's certainly worth a test case to ensure that this works correctly.

inside a template and immediately follows a | or = with no whitespace *and* the >>> is immediately followed by a | or }}

Even if you want to be conservative in the initial version, template arguments need to accept surrounding whitespace. If you don't, people will be very confused about why it's broken. There's a strong expectation that we can use spaces and newlines to format template parameters. A parameter may be surrounded by spaces, and parameters are often split onto individual lines.

This is a legit concern. The way that TemplateData formats options probably requires some whitespace here by default as well. We could probably reasonably restrict the whitespace though, so long as (something like) the inline and block formats of https://github.com/wikimedia/mediawiki-extensions-TemplateData/blob/master/Specification.md#316-format are allowed. We can grep through our wikitext archives to see how often <<< and >>> might appear in "confusing" contexts in existing wikitext.

and the whole point of this syntax is that you should be able to surround literally anything with no additional escaping needed

Doesn't that statement conflict with allowing expansion of templates inside a quoted-argument?

This is interesting. Presumably @Anomie is concerned with:

{{foo|<<< {{bar}} >>>}}

where {{bar}} could itself expand to >>>.

I don't think that's what I was concerned with, although it has been so long that I've forgotten. I may have been questioning whether "literally anything with no escaping" means that foo gets passed literal {{bar}} rather than the transclusion of Template:bar, i.e. whether braces still need manual escaping to achieve the former result.

cscott added a comment. Edited Sep 14 2018, 9:41 AM

@Anomie ah, yes. Current template semantics expand the arguments before evaluating the template. I'd been thinking that would be maintained, but for general use it might be worth thinking through alternatives. For example, if you want to use heredoc syntax for parser functions (and I do: T204283, T204307) then you need to pass the parser function the raw unexpanded text. I think I can deal with *not* expanding the heredoc string, in terms of defining the semantics of the syntax, and treating "standard wikitext template expansion" as a bit of a special case (eager expansion of arguments). That is, in theory {{Foo|{{bar}}}} could be sugar for {{#expand|Template:Foo|<<<{{bar}}>>>}}, where the implementation of the #expand parser function does an explicit eager expansion of its arguments before substituting them into Template:Foo.

Current template semantics expand the arguments before evaluating the template.

That's not completely accurate, at least for the PHP parser, and I think the difference may be important.

The actual time when the argument is expanded is when it's accessed in the wikitext (via {{{1}}} or the like). At the PHP level, that's when PPFrame::getArgument() is called (directly or via getArguments()/getNumberedArguments()/getNamedArguments()).

You can see this behavior in wikitext by looking at syntax that has side effects. For example, consider enwiki's Template:Void. An invocation like {{void | {{DEFAULTSORT:Foo}} }} won't set the default sorting because {{{1}}} is never expanded, while if arguments were expanded before evaluating the template then it would do so.

For example, if you want to use heredoc syntax for parser functions (and I do: T204283, T204307) then you need to pass the parser function the raw unexpanded text.

There's nothing really stopping us (in PHP at least) from adding a version of PPFrame::getArgument() that returns the raw PPNode for the parser function's implementation to use as it sees fit, or that expands it using PPFrame::RECOVER_ORIG to (try to) return wikitext without templates and such expanded.

cscott added a comment. Edited Sep 14 2018, 4:05 PM

@Anomie is the expansion memoized? Presumably if I include {{{1}}} twice I'm guaranteed the same contents? (I should check this myself but I'm on mobile at the moment.)

It may be that using the <<<...>>> syntax will opt you in to slightly different argument expansion semantics, if we need to clean things up here for consistency.

@Anomie is the expansion memoized? Presumably if I include {{{1}}} twice I'm guaranteed the same contents? (I should check this myself but I'm on mobile at the moment.)

It is. Both PPTemplateFrame_DOM's and PPTemplateFrame_Hash's implementations cache the expansion so it only needs to be done once.
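The lazy-plus-memoized behaviour described in the last few comments can be modelled in a few lines. This is a toy Python sketch, not MediaWiki's PPFrame code; `LazyFrame`, `get_argument` and the `expand` callback are hypothetical stand-ins. Arguments are stored unexpanded, expanded only when first read (like {{{1}}} being accessed), and then served from a cache. The recording `expand` function stands in for side effects such as {{DEFAULTSORT:Foo}}.

```python
from typing import Callable, Dict, List, Optional


class LazyFrame:
    """Toy model of lazy, memoized template-argument expansion."""

    def __init__(self, raw_args: Dict[str, str], expand: Callable[[str], str]) -> None:
        self._raw = raw_args    # unexpanded argument wikitext, by name
        self._expand = expand   # stands in for the preprocessor's expansion
        self._cache: Dict[str, str] = {}

    def get_argument(self, name: str) -> Optional[str]:
        """Expand an argument the first time it is read; reuse the result after."""
        if name not in self._raw:
            return None
        if name not in self._cache:
            self._cache[name] = self._expand(self._raw[name])
        return self._cache[name]


# Record every expansion, standing in for side effects of expanding wikitext.
expansions: List[str] = []


def expand(wikitext: str) -> str:
    expansions.append(wikitext)
    return wikitext
```

A template like Template:Void that never reads {{{1}}} never triggers the expansion. A template that reads {{{1}}} twice triggers it exactly once.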

While thinking through the details of argument expansion, it's probably important to figure out how to pass heredoc-quoted arguments through to child templates safely. That is, if Template:Foo is:

{{SomeOtherTemplate|{{{1}}}}}

and I invoke it like:

{{Foo|<<<bar=bat>>>}}

I probably want to select two different behaviors: (a) deliberately unquoted, so SomeOtherTemplate is given the named argument bar, and (b) deliberately quoted, so SomeOtherTemplate is given a single unnamed argument with the literal value bar=bat.

I *think* behavior (a) is what I wrote above, and if I wanted behavior (b) I'd write Template:Foo like:

{{SomeOtherTemplate|<<<{{{1}}}>>>}}

...but it's worth thinking this through carefully (and writing test cases). Consider also:

{{SomeOtherTemplate|<<<<nowiki>{{{1}}}</nowiki>>>>}}

While thinking through the details of argument expansion, it's probably important to figure out how to pass heredoc-quoted arguments through to child templates safely. That is, if Template:Foo is:

{{SomeOtherTemplate|{{{1}}}}}

and I invoke it like:

{{Foo|<<<bar=bat>>>}}

I probably want to select two different behaviors: (a) deliberately unquoted, so SomeOtherTemplate is given the named argument bar, and (b) deliberately quoted, so SomeOtherTemplate is given a single unnamed argument with the literal value bar=bat.

You can already run into that situation with something like Template:= and {{Foo|bar{{=}}bat}}, or with {{Foo|{{baz}}}} where Template:Baz contains bar=bat. Behavior is not currently selectable, and I'd recommend against trying to make it so because you'll probably wind up breaking things that rely on the current behavior.

I *think* behavior (a) is what I wrote above,

Before 2008 MediaWiki did your option (a), but in 2008 @tstarling changed it to option (b). See https://meta.wikimedia.org/wiki/Migration_to_the_new_preprocessor for some details.

Consider also:

{{SomeOtherTemplate|<<<<nowiki>{{{1}}}</nowiki>>>>}}

That certainly is something else to consider. If template expansion works inside <<< >>>, then we'll likely also need <nowiki> and other parser tags to behave as the tag rather than having them produce literal text. Doing otherwise would be horribly confusing.

My thought is that "interpret argument as literal text" or "interpret argument as wikitext" is a decision to be made by the template author. It's an implicit (or with TemplateData, perhaps explicit) type annotation on the arguments as they are used. The job of the heredoc parser is just to get the raw text through to the template author (or scribunto module author, or parser function, etc) intact; then they get to decide whether to interpret it as wikitext or not.

It's comforting to know that option (b) is the intended behavior. I've seen plenty of image-related templates where {{small}} expands to something like thumb|200px and it is used like [[Foo.jpg|{{small}}]] and the | is very much intended to be interpreted *not* literally (ie, option (a)). I think it is still worth thinking through those types of use cases to ensure that the template author has full control of the interpretation of magic characters like | and = in the contents of the argument. I agree that "fully escaped literal text" is a sound default though!

My thought is that "interpret argument as literal text" or "interpret argument as wikitext" is a decision to be made by the template author.

This could be done by adding some new syntax that works like {{{foo}}} but doesn't expand the wikitext. Personally I'm skeptical that making wikitext even more confusing in that way would be a good idea, versus leaving such cases to be done via Scribunto where code can be better structured and have comments, but either way it doesn't need heredoc syntax to be able to do it.

It's an implicit (or with TemplateData, perhaps explicit) type annotation on the arguments as they are used.

Implicit as it's used, sure. TemplateData, maybe not. How would TemplateData handle annotating parameter 1 of en:Template:Demo, that currently uses <nowiki> and Scribunto to show both unexpanded and expanded versions of the parameter?

The job of the heredoc parser is just to get the raw text through to the template author (or scribunto module author, or parser function, etc) intact; then they get to decide whether to interpret it as wikitext or not.

It could be just that simple. Where it gets complex is if you try to make {{{foo}}} do one thing for {{template|foo={{bar}} <nowiki>{{baz}}</nowiki>}} and another for {{template|foo=<<<{{bar}} <nowiki>{{baz}}</nowiki>>>>}}.

It's comforting to know that option (b) is the intended behavior. I've seen plenty of image-related templates where {{small}} expands to something like thumb|200px and it is used like [[Foo.jpg|{{small}}]] and the | is very much intended to be interpreted *not* literally (ie, option (a)).

I note the image syntax may not be following the same rules. :/

I think it is still worth thinking through those types of use cases to ensure that the template author has full control of the interpretation of magic characters like | and = in the contents of the argument. I agree that "fully escaped literal text" is a sound default though!

Let's not. Making it so that {{foo|bar{{=}}baz}} and various other methods of doing the same sort of thing might be interpreted as parameter 1 being "bar=baz" or parameter bar being "baz" based on how Template:Foo is coded seems like a surefire recipe for confusion.

I think having the heredoc syntax disable the usual interpretation of | and = when parsing the wikitext string is probably as far as it should go. I don't think preventing expansion of wikitext is likely to be generally useful, and we already have <nowiki> for that when it's needed.

There should be no way for the template/parser function/Scribunto module to know whether the wikitext calling it used heredoc syntax or not.[1]

Access to the unexpanded wikitext of an argument, if we want to provide that, should be a separate feature completely unrelated to whether heredoc syntax was used. At the PHP level that would likely be a new method on PPFrame or a parameter to PPFrame::getArgument(); whether and how to expose it in Scribunto or wikitext could be figured out elsewhere.

[1]: Beyond guessing based on the presence of | in the value. Although that could have come from a Scribunto frame::expandTemplate() call.

@Anomie I generally agree, but:

I note the image syntax may not be following the same rules. :/

My eventual goal is to allow the use of something like {{#media|Foo.jpg|...}} if you need heredoc quoting. Given that, it's worth trying to figure out whether or not it would be possible to write a template like {{small}} using that syntax. Of course I'm persuadable if you think the answer is that you should write it as {{small|Foo.jpg}} and that should expand to {{#media|{{{1}}}|thumb|200px}} avoiding the whole issue of trying to expand an argument to more than one option.

I'll note that other template systems solve this problem with some sort of "varargs" type syntax, where you explicitly say that {{{1}}} is a "list of arguments". Something like {{{...1...}}} (just to make up syntax). Ideally we'd have a proper key-value map to back that with, not just a string...

I think having the heredoc syntax disable the usual interpretation of | and = when parsing the wikitext string is probably as far as it should go. I don't think preventing expansion of wikitext is likely to be generally useful, and we already have <nowiki> for that when it's needed.

There should be no way for the template/parser function/Scribunto module to know whether the wikitext calling it used heredoc syntax or not.[1]

Mostly agree, in that it shouldn't be exposed as part of the macro insertion API or something like that. But on the other hand, we've mooted the idea of using the opt-in syntax to also opt-in to "better" template expansion semantics (where different people have different definitions of "better"). I'm not opposed to that in principle. We should use the opportunity we have to allow people to opt-in.

Access to the unexpanded wikitext of an argument, if we want to provide that, should be a separate feature completely unrelated to whether heredoc syntax was used. At the PHP level that would likely be a new method on PPFrame or a parameter to PPFrame::getArgument(); whether and how to expose it in Scribunto or wikitext could be figured out elsewhere.

See T203293: {{Row numbers}} completely fails on the Android app / VisualEditor. We very much want to avoid hacks like that. If we can do that by switching to lazy evaluation of arguments (see the above point about opt-in changes) then I think we should do so. I really don't want to expose the strip state!

But on the other hand, we've mooted the idea of using the opt-in syntax to also opt-in to "better" template expansion semantics (where different people have different definitions of "better"). I'm not opposed to that in principle. We should use the opportunity we have to allow people to opt-in.

Let's not confuse a new syntax for invoking templates with opting in to some new way of writing templates. Let's not even confuse a new way of passing an argument to a template with opting in to some unrelated feature of template expansion.

See T203293: {{Row numbers}} completely fails on the Android app / VisualEditor. We very much want to avoid hacks like that. If we can do that by switching to lazy evaluation of arguments (see the above point about opt-in changes) then I think we should do so. I really don't want to expose the strip state!

We already have lazy evaluation of arguments, I've said that at least three times now. What we don't have is a way for anything (wikitext, parser functions, or Scribunto) to get the value without evaluating it, and that's what T203293 wants.

Ok, we're (almost) totally agreed then. :)

(I think our only difference is that I would *like* to leverage heredoc syntax to opt you in to balanced expansion of the template at the use-site -- but that's not going to happen for path-dependence reasons, so that makes us totally agreed I think.)

cscott added a comment. Edited Nov 2 2018, 8:31 PM

Copied from a discussion at https://gerrit.wikimedia.org/r/#/c/mediawiki/services/parsoid/+/467531/6/lib/wt2html/tt/LinkHandler.js@1014 wrt how [[File:Foo.jpg|{{sometemplate}}]] gets parsed:

In PHP-land, the preprocessor does the [[...]] matching and the |-splitting before any other tokenization gets done... but then we do template expansion and the parser *re-does* the [[...]] matching and |-splitting afterward, allowing a second chance for templates to generate brackets (always bad) and vertical bars (a feature, I guess). (See T172306: Broken wikilinks can be parsed as wikilinks after preprocessing)

Parsoid is sort of doing the same: iff templates are involved, we expand them and then do a hacky reparse to give a second chance for the template to generate vertical bars.

Both of these have weird corner cases, because we're treading the fine line between literal text and wikitext tokens.
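The "second chance" split described above can be illustrated with a toy model. This Python sketch does not reproduce either parser. `two_pass_split` and the `templates` lookup table are hypothetical, and only argument-less {{name}} calls are handled. It splits on | before expansion, expands, then re-joins and splits again, so pipes produced by a template become separators.

```python
import re
from typing import Dict, List, Tuple

# Toy template call: {{name}} with no arguments and no nesting.
SIMPLE_TEMPLATE = re.compile(r"\{\{([^{}|]+)\}\}")


def two_pass_split(inner: str, templates: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split the inside of [[...]] on "|" before and after template expansion."""
    # Pass 1: split before anything is expanded (template braces still intact).
    first = inner.split("|")

    def expand(part: str) -> str:
        # Replace each known {{name}} with its expansion; leave others as-is.
        return SIMPLE_TEMPLATE.sub(lambda m: templates.get(m.group(1), m.group(0)), part)

    # Pass 2: re-join and split again, so any "|" produced by a template
    # now acts as a separator too.
    second = "|".join(expand(p) for p in first).split("|")
    return first, second
```

With {{small}} expanding to thumb|200px, the second pass yields separate "thumb" and "200px" options (the intended feature). With an alt text template expanding to http://test.com|123, the same mechanism wrongly splits the alt text, which is the kind of corner case heredoc quoting is meant to avoid.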

If you use the heredoc syntax for files:

{{#file:Foo.jpg|alt=<<<http://test.com|123>>>}}

You at least have an unambiguous way to state that you don't want vertical bars treated as tokens. (Of course <nowiki> would work as well.) But that doesn't generalize if the parameter is coming from a template or a template argument; what you want is something like:

{{#file:Foo.jpg|alt=<<<{{{1}}}>>>}}

but that pushes the responsibility onto the parser function itself to do brace expansion on its argument...or not. (Compare to {{#file:Foo.jpg|alt=<nowiki>{{{1}}}</nowiki>}} which of course just gives you literal text.)
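
Here is a rough sketch of why `alt=<<<http://test.com|123>>>` is unambiguous. It splits an invocation body on `|` only at heredoc depth 0. The model is deliberately simplified: only bare `<<<`/`>>>` delimiters are recognized (no numeric prefixes), `{{...}}` and `[[...]]` nesting is ignored, and `split_args`/`name_and_value` are hypothetical helpers rather than preprocessor functions.

```python
from typing import List, Optional, Tuple

OPEN, CLOSE = "<<<", ">>>"


def split_args(body: str) -> List[str]:
    """Split on '|' only when not inside a bare <<<...>>> heredoc."""
    args: List[str] = []
    start = depth = i = 0
    while i < len(body):
        if body.startswith(OPEN, i):
            depth += 1                    # heredocs nest (balanced matching)
            i += len(OPEN)
        elif depth and body.startswith(CLOSE, i):
            depth -= 1
            i += len(CLOSE)
        elif depth == 0 and body[i] == "|":
            args.append(body[start:i])    # top-level separator
            i += 1
            start = i
        else:
            i += 1
    args.append(body[start:])             # an unclosed <<< swallows the rest
    return args


def name_and_value(arg: str) -> Tuple[Optional[str], str]:
    """Return (name, value); heredoc bodies come back verbatim, '=' and '|' included."""
    name, sep, rest = arg.partition("=")
    if sep and OPEN not in name and rest.startswith(OPEN) and rest.endswith(CLOSE):
        return name, rest[len(OPEN):-len(CLOSE)]
    if arg.startswith(OPEN) and arg.endswith(CLOSE):
        return None, arg[len(OPEN):-len(CLOSE)]
    return (name, rest) if sep else (None, arg)


for a in split_args("#file:Foo.jpg|alt=<<<http://test.com|123>>>"):
    print(name_and_value(a))
# (None, '#file:Foo.jpg')
# ('alt', 'http://test.com|123')
```

In this sketch an unclosed `<<<` swallows the rest of the body, which amounts to treating end of input as an implicit close.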

And then if you *did* want to return multiple named arguments, you'd want something like splat syntax:

{{#file:Foo.jpg|{{{...1...}}}}}

and then the template should return a proper key-value map (hopefully as a structured object, but at least as an unambiguous string, say JSON).
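
Here is a minimal sketch of the consuming side. It assumes the template's result is a JSON object mapping string keys to string values. The function name `splat_named_args` and the rule that splatted keys override explicit ones are assumptions of this sketch, not part of the proposal. Because the values are decoded from JSON rather than re-split, a `|` inside a value never becomes a separator.

```python
import json
from typing import Dict


def splat_named_args(explicit: Dict[str, str], template_output: str) -> Dict[str, str]:
    """Merge a template's JSON-object result into an argument map (hypothetical)."""
    data = json.loads(template_output)
    if not isinstance(data, dict):
        raise ValueError("splatted template output must be a JSON object")
    merged = dict(explicit)
    for key, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"value for {key!r} must be a string")
        merged[key] = value  # later keys win, as with a repeated named argument
    return merged


args = splat_named_args({"link": "Foo"}, '{"alt": "http://test.com|123", "upright": "1.2"}')
print(args["alt"])  # http://test.com|123
```

Letting later keys win mirrors how a repeated named argument behaves in an ordinary template call, where the last value is the one used.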

All of which is a really long way to say that it sucks that both PHP and Parsoid have some really unintuitive behaviors here caused by trying to reparse the string output of templates, and they both break in unexpected ways when you push on them too hard.

Heredoc arguments give you a way to pass "exactly this text" to a template/parser function, but the job isn't quite done. We might need something like the "quoted heredoc" as well which allows you to expand {{{1}}} or {{random-template}} inside it and be sure that pipe characters in the result will be escaped and won't be interpreted as argument separators.
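
A toy model shows the difference between reparsing expanded output and the "quoted heredoc" idea. Here `expand` only substitutes `{{{name}}}` from a dict (no defaults, no nested templates), arguments are assumed to contain no literal pipes of their own, and both `args_*` names are hypothetical. `args_reparsed` expands first and splits afterward, so a `|` produced by the expansion becomes a separator. `args_quoted` splits first and expands each argument in place, so the pipe stays literal.

```python
import re
from typing import Dict, List

PARAM = re.compile(r"\{\{\{(\w+)\}\}\}")


def expand(text: str, params: Dict[str, str]) -> str:
    """Toy expansion: replace {{{name}}} from params, leave unknown refs alone."""
    return PARAM.sub(lambda m: params.get(m.group(1), m.group(0)), text)


def args_reparsed(arg_text: str, params: Dict[str, str]) -> List[str]:
    """Expand first, then split: pipes produced by expansion become separators."""
    return expand(arg_text, params).split("|")


def args_quoted(arg_text: str, params: Dict[str, str]) -> List[str]:
    """Split first, then expand each argument: expanded pipes stay literal."""
    return [expand(arg, params) for arg in arg_text.split("|")]


params = {"1": "http://test.com|123"}
print(args_reparsed("Foo.jpg|alt={{{1}}}", params))  # ['Foo.jpg', 'alt=http://test.com', '123']
print(args_quoted("Foo.jpg|alt={{{1}}}", params))    # ['Foo.jpg', 'alt=http://test.com|123']
```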

See some further discussion of argument lists and splat syntax in https://phabricator.wikimedia.org/T196440#5341715

@cscott this RFC seems close to an agreement. Implementation is probably blocked on php-parsoid, but the new syntax and semantics could already be approved. Do you want to move this forward? TechCom could put it on last call, or schedule an IRC meeting, if one is still needed.

ssastry added a comment (edited), Jul 22 2019, 4:57 PM

@cscott this RFC seems close to an agreement. Implementation is probably blocked on php-parsoid, but the new syntax and semantics could already be approved.

Unless we want to do this in both Parsoid/PHP and the existing core parser, this is probably blocked on Parsoid/PHP becoming the default. But maybe most of the work is in the preprocessor, with some in Parsoid's PEG tokenizer, in which case it is still doable without waiting on that integration.

@Arlolra has some WIP patches as well so maybe he has some insight into this question.

@Arlolra has some WIP patches as well so maybe he has some insight into this question.

The question is whether it makes sense to move forward with approving the syntax and semantics, regardless of the state of implementation. And, yes, I'm in favour of that.

Thinking only of the short term for the moment: any consensus we could reach before Wikimania could be communicated in the various "what's next for wikitext" talks we're on tap to give, or, if we don't quite have consensus yet, we could use Wikimania to help build it.

My last hangup on this proposal was my unease about *always* protecting | and =, since there are templates which want to splice argument lists. But the conversation around T196440 helped convince me that arg list processing is orthogonal (and likely has to do with the "result type" of the template), so I'm pretty confident that's not a blocker for heredoc arguments any longer.

I agree with @Arlolra that we can finalize the proposal syntax and semantics, even if we haven't figured out where implementation belongs on the Parsing-Team roadmap yet.

daniel moved this task from Under discussion to P1: Define on the TechCom-RFC board (edited), Jul 22 2019, 6:58 PM

Dropping this into the rfc inbox for techcom review. Do I read this correctly that everyone here thinks this can go on last call?

daniel moved this task from P1: Define to P5: Last Call on the TechCom-RFC board, Jul 26 2019, 11:08 AM

Per the TechCom meeting on July 24, this RFC goes on Last Call for being approved. If no objections remain unaddressed by August 7, the RFC will be approved as proposed and amended.

If you care about this RFC, please comment - in support, or raising concerns. The Last Call period is not just for raising objections, but also for confirming consensus.

This RFC has been approved as proposed per the TechCom meeting on 2019-08-07.

It is noted that implementation will likely have to wait for Parsoid-PHP to land.

cscott updated the task description, Aug 17 2019, 10:05 PM
cscott added a comment, Jan 8 2020, 5:38 PM

Some additional comments:

  • For nesting I'd originally specified only numeric prefix tags, and that should probably be the first implementation. It probably wouldn't be harmful to allow Unicode \w as well, though, and it might be more readable to human editors. But that might open Pandora's box with TemplateData: folks would want to specify the exact tag that was used by default, etc.
  • The proposal left unspecified what happens if the matching close-tag isn't found. Two obvious choices: (1) treat the end of the document as an implicit close, and (2) backtrack and treat the open tag as literal text.

(1) is consistent with many existing preprocessor constructs and tag-open behavior.
(2) is probably more editor-friendly; it prevents "breaking the whole page" when you forget a close tag. The backtracking can be computationally expensive, though; a naive implementation will take O(N^2) time to parse 1<<< 2<<< 3<<< 4<<< 5<<< 6<<< ....

We'd probably want to take the time to do a careful (i.e., O(N)) implementation of (2).
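
Here is a minimal sketch of a single-pass matcher under a simplified grammar. An open tag is an optional numeric prefix followed by `<<<`, and a close tag is `>>>` followed by the same digits. Only numeric prefixes are handled, and so is not content that itself ends in `>`. The function name and the `eof_closes` flag are hypothetical; the flag switches between choice (1) and choice (2). The sketch does not rescan the rest of the input whenever an open tag fails to find its close. Instead, pending opens live on a stack, and each is pushed and popped at most once. The delimiter scan and the digit walks are also linear, so the whole pass is O(N).

```python
import re
from typing import Dict, List, Tuple

DELIM = re.compile(r"<<<|>>>")          # fixed strings, so finditer is linear here
DIGITS = frozenset("0123456789")


def match_heredocs(text: str, eof_closes: bool = False) -> List[Tuple[str, int, int]]:
    """Return (tag, content_start, content_end) for each matched heredoc.

    eof_closes=True models choice (1): end of input closes every pending open.
    eof_closes=False models choice (2): unclosed opens are left as literal text.
    """
    stack: List[Tuple[str, int]] = []   # pending opens: (tag, content_start)
    pending: Dict[str, int] = {}        # tag -> number of pending opens using it
    matched: List[Tuple[str, int, int]] = []
    for m in DELIM.finditer(text):
        if m.group() == "<<<":
            i = m.start()
            while i > 0 and text[i - 1] in DIGITS:    # optional numeric prefix
                i -= 1
            tag = text[i:m.start()]
            stack.append((tag, m.end()))
            pending[tag] = pending.get(tag, 0) + 1
            continue
        j = m.end()
        while j < len(text) and text[j] in DIGITS:    # optional numeric suffix
            j += 1
        tag = text[m.end():j]
        if tag and not pending.get(tag) and pending.get(""):
            tag = ""                    # "<<<x>>>1": the digits are ordinary text
        if not pending.get(tag):
            continue                    # nothing to close: this ">>>" is literal
        # Pop down to the nearest pending open with this tag. Opens above it lie
        # inside its content and were never closed, so they are just text.
        while True:
            open_tag, start = stack.pop()
            pending[open_tag] -= 1
            if open_tag == tag:
                matched.append((tag, start, m.start()))
                break
    if eof_closes:
        while stack:
            tag, start = stack.pop()
            matched.append((tag, start, len(text)))
    return matched


text = "1<<< 2<<< 3<<< "
print(match_heredocs(text))                        # []
print(len(match_heredocs(text, eof_closes=True)))  # 3
```

With `eof_closes=False`, the pathological `1<<< 2<<< 3<<< ...` input yields no heredocs after one linear scan. The quadratic cost only appears if each failed open triggers its own rescan.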

cscott added a comment (edited), Jan 10 2020, 6:13 PM

@Anomie brings up some interesting corner cases over in T230683: New syntax for multiline list items / talk page comments for extensions of this syntax:

Links, like [[Foo|<<< Some text with ]] in it >>>]]? Or [[What|<<<would [[this]] do?>>>]]?

Those seem fine. For the second case, you can already embed <a> inside <a> with various wikitext tricks today (as well as in literal HTML, obviously); they get cleaned up by Remex the way the HTML standard says they should. (I forget exactly what that is; I *think* the inner link closes the outer one and then re-opens it when it is done, but I could be misremembering.)

<ref><<<Can we do subrefs like <ref>this</ref> now?>>></ref>?

T204370 proposes:

{{#tag:ref|<<<We can do subrefs like <ref>this</ref> now!>>>}}

I think that more-or-less "just works" once heredoc arguments are in place.

But if you'll indulge me, let's look at <ref><<<Can we do subrefs like <ref>this</ref> now?>>></ref>. This is an interesting case to be sure.

The "desirable" property of extension tag processing in wikitext is that we completely ignore anything after the open tag, looking only for the literal text matching the close tag. This makes it a robust escape mechanism: there's only one "special sequence" in extension content, and it's at least four characters long. However, that one special sequence is fixed. If you want to embed content that happens to contain the one special sequence your extension cannot contain, you are out of luck. (I suppose you could hack around this by abusing localization to use one of the localized versions of the tag instead... I wonder why no one's done that?)

Wikitext 2.0 proposed <ref#someid>we can do embedded refs <ref#otherid>this way</ref#otherid></ref#someid> which would allow optional hashes on extension tags which are completely ignored other than for matching. That would allow extension tag processing to "completely ignore the contents, just look for the matching close tag" as it does today, while letting extensions actually contain any content, not just "any content except the fixed-close-tag sequence". (If your content matches the close-tag sequence, pick another hash. You're guaranteed to be able to pick one that doesn't match anything in your content.)
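
Here is a small sketch of "ignore everything, look only for the literal close tag," extended with the optional `#id` suffix from the Wikitext 2.0 proposal. `extension_body` is a hypothetical name. Attributes on the open tag (such as `name="..."`) are not handled, and neither is recovery from an unclosed tag. With a plain `<ref>`, the body is cut off at the first `</ref>`. With distinct hashes, the inner ref survives intact.

```python
import re
from typing import Optional, Tuple

# <name> or <name#id>; the #id suffix is the Wikitext 2.0 idea, not current syntax.
OPEN_TAG = re.compile(r"<(\w+)(#[\w-]+)?>")


def extension_body(text: str, pos: int = 0) -> Optional[Tuple[str, str, int]]:
    """Return (name, body, end) for an extension tag opening at text[pos]."""
    m = OPEN_TAG.match(text, pos)
    if m is None:
        return None
    close = f"</{m.group(1)}{m.group(2) or ''}>"
    end = text.find(close, m.end())  # literal search: the body is never tokenized
    if end == -1:
        return None                  # unclosed; recovery rules are out of scope here
    return m.group(1), text[m.end():end], end + len(close)


print(extension_body("<ref>x <ref>y</ref> z</ref>"))
# ('ref', 'x <ref>y', 19)
print(extension_body("<ref#a>x <ref#b>y</ref#b> z</ref#a>"))
# ('ref', 'x <ref#b>y</ref#b> z', 35)
```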

The prefix-tag mechanism of heredocs is a similar loophole for exactly the same purpose. Compare this to a balance/nesting approach, where you have to instead provide a different escape mechanism so you can "undo" any unmatched open-nest constructs embedded in the content... and then of course the escape mechanism has to be able to escape itself, in case the desired content contains the escape mechanism, etc. (Eg, string quoting uses balanced " characters, but then you need to add \ so that you can include a literal ", and then you also need to define \\ so you can add a literal \, etc.) Usually these balanced characters are fixed (although PHP regex termination is interesting) and so things get complicated quickly if you want to embed/escape text which already uses the same escape mechanism you need to use.

The heredoc proposal above is actually a hybrid, where you do balanced matching *as well* as allowing a tag prefix. For "clean" content, balanced matching looks good and works well. But if the text you wanted to escape contained an unbalanced set of <<< ... >>>, you'd switch to using a unique tag prefix that doesn't appear anywhere in the content. That avoids the need to come up with a different escape mechanism to handle unbalanced content.

(The case where the desired embedded content ends in > is a weird corner case. If you can't add a space to break the token, you have to use a prefix, e.g. escape --> as: arg=1<<<-->>>>1. But it all still works.)
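
On the serialization side (for example, when an editor writes an argument back out to wikitext), the hybrid scheme can be sketched as follows. This is a hypothetical serializer, not Parsoid code. `wrap_heredoc` uses the bare form when the content contains no `<<<` or `>>>` and neither starts with `<` nor ends with `>`. Otherwise it picks the smallest numeric prefix whose open and close tags do not occur in the content. Such a prefix always exists, because a tag longer than the content cannot appear in it. `unwrap_heredoc` assumes a prefixed body ends at the first matching `>>>N`.

```python
from itertools import count


def wrap_heredoc(content: str) -> str:
    """Serialize content as a heredoc argument (hypothetical serializer)."""
    bare_ok = ("<<<" not in content and ">>>" not in content
               and not content.startswith("<") and not content.endswith(">"))
    if bare_ok:
        return f"<<<{content}>>>"
    # Smallest numeric prefix whose tags do not occur in the content.
    for n in count(1):
        if f"{n}<<<" not in content and f">>>{n}" not in content:
            return f"{n}<<<{content}>>>{n}"
    raise AssertionError("unreachable")


def unwrap_heredoc(wrapped: str) -> str:
    """Inverse of wrap_heredoc, assuming a prefixed body ends at the first >>>N."""
    prefix = wrapped[: wrapped.index("<<<")]
    start = len(prefix) + 3
    if not prefix:
        return wrapped[start:-3]
    return wrapped[start: wrapped.index(f">>>{prefix}", start)]


print(wrap_heredoc("| Hello || wiki = world"))  # <<<| Hello || wiki = world>>>
print(wrap_heredoc("-->"))                      # 1<<<-->>>>1
```

The second call reproduces the `arg=1<<<-->>>>1` escape from the comment above. The trailing `>` in the content does not confuse the close search, because only `>>>1` ends the body.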

cscott updated the task description, Jan 10 2020, 6:24 PM

@Anomie brings up some interesting corner cases over in T230683: New syntax for multiline list items / talk page comments for extensions of this syntax:
...

Let us not complicate matters unnecessarily. If there were syntactical limitations that prevented certain things from being expressed, and this new usage allows it, then, that is good (ex: multi-line constructs in list items, etc.). If there were semantic reasons that prevented certain things from being expressed, they won't be expressible even if the syntax allows it (ex: links-in-links).

The question, however, is: what kind of enforcement against bad constructs is available? Broken rendering, or failed parsing that leads to broken constructs being rendered as literal text? Those are details that can be worked out, but in either case, editors will realize the GIGO principle applies, and we can always use linting to flag brokenness. Overall, I think we can skip such ultra-detailed implementation notes and leave them to gerrit patches and code review. Let us keep the focus on the important higher-level issues that need clarity.

Aklapper removed Arlolra as the assignee of this task, Fri, Jun 19, 4:29 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips how to manage individual work in Phabricator (noisy notifications, lists of task, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Arlolra claimed this task, Fri, Jun 19, 5:28 PM