
New syntax for block list items
Open, Needs Triage, Public

Description

This was forked from discussion in T230658: Syntax for list item attributes, T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments), and discussion at Wikimania 2018. The phab task is being created belatedly, but I wanted to have a task ID to associate with this idea.

The proposal made at Wikimania 2019 was to use syntax similar to that proposed for "long arguments" (T114432) for block list items:

:: <<< This is a list item.

This is a new paragraph, but still part of the same list item. >>>
:: This is a new list item.
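Here is a minimal Python sketch of how a tokenizer might read the example above, to make the proposed rule concrete. It rests on assumptions that are not part of any existing parser: a list line starts with one or more : markers, an item whose text begins with <<< runs across newlines until the first >>>, and continuation lines carry no prefix. parse_block_items is a hypothetical name, and nesting is deliberately not handled.

```python
import re

# Hypothetical: a list line starts with one or more ':' markers.
ITEM_START = re.compile(r"^(:+)\s*(.*)$")


def parse_block_items(wikitext: str) -> list[tuple[str, str]]:
    """Split wikitext into (prefix, content) pairs.

    Sketch of the proposed rule: an item whose text starts with '<<<'
    continues across newlines until the first '>>>'. Continuation lines
    need no ':' prefix. Nested blocks are not handled.
    """
    items: list[tuple[str, str]] = []
    lines = wikitext.split("\n")
    i = 0
    while i < len(lines):
        m = ITEM_START.match(lines[i])
        if not m:
            i += 1  # not a list line; this sketch ignores it
            continue
        prefix, rest = m.group(1), m.group(2)
        if rest.startswith("<<<"):
            body = [rest[3:]]
            # Consume lines until one contains the closing marker.
            while ">>>" not in body[-1] and i + 1 < len(lines):
                i += 1
                body.append(lines[i])
            # Anything after the first '>>>' on its line is discarded here.
            content = "\n".join(body).split(">>>", 1)[0]
            items.append((prefix, content.strip()))
        else:
            items.append((prefix, rest))  # ordinary one-line item
        i += 1
    return items


example = """:: <<< This is a list item.

This is a new paragraph, but still part of the same list item. >>>
:: This is a new list item."""

for prefix, content in parse_block_items(example):
    print(prefix, repr(content))
```

In this sketch the first >>> always wins. Content that itself contains >>> would therefore close the item early.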

In T230658: Syntax for list item attributes various alternatives for list item attributes are proposed, but one meshes well with this "long argument" syntax:

:: id=my-comment class=important <<< This is my comment! >>>

which also works for "block headings":

== id=my-heading <<< This is my heading >>> ==

I'm not merging this with T230658: Syntax for list item attributes though, since we could decide to go a different way with attributes, and although pleasing from an orthogonality perspective, there is no pressing need for block headings. This task is just for block list item syntax, either the above proposal or an alternative.
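The next sketch shows how the attribute prefix could be split off the content for both the list item and heading variants. It assumes attributes are bare key=value pairs, separated by whitespace and placed before <<<. parse_item_with_attrs and this attribute grammar are hypothetical. Quoted attribute values, as in HTML, are not handled.

```python
import re

# Hypothetical attribute grammar: bare key=value pairs, no quoting.
ATTR = re.compile(r"([A-Za-z][\w-]*)=(\S+)")


def parse_item_with_attrs(rest: str) -> tuple[dict[str, str], str]:
    """Parse the text after a list or heading marker, e.g.
    'id=my-comment class=important <<< This is my comment! >>>'.

    Returns (attributes, content). Text before '<<<' that is not a
    key=value pair is silently ignored in this sketch.
    """
    head, sep, tail = rest.partition("<<<")
    if not sep:
        return {}, rest.strip()  # no block form: everything is content
    body, end, _ = tail.partition(">>>")
    if not end:
        raise ValueError("unterminated <<< block")
    return dict(ATTR.findall(head)), body.strip()


attrs, text = parse_item_with_attrs("id=my-comment class=important <<< This is my comment! >>>")
print(attrs, text)

# Heading variant: strip the surrounding '==' markers first.
heading = "== id=my-heading <<< This is my heading >>> =="
print(parse_item_with_attrs(heading[2:-2]))
```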

Event Timeline

Speaking from a purely DiscussionTools perspective, we're interested in making sure that we wind up with a syntax that's "pleasant looking" to power-editors who're still responding to discussions via full-page source editing.

As such, I'm particularly interested in tweaks to the format that'd let us preserve apparent indentation within the list item's wikitext, while not interfering with multiline content that people want to include...

e.g.

:: <<< This is a list item.
::
:: This is a new paragraph, but still part of the same list item.
::
:: {{Multiline template|
:: arg1=This is also in
:: |arg2=the same list item}}
:: >>>
:: This is a new list item.

The problem with that specific suggestion, of course, is that it would then be challenging to nest another list within the extended list item, particularly if the indentation-preserving :: is optional.

I don't particularly like that. However, there are heredoc implementations that have a somewhat similar feature, for example:

The closing identifier may be indented by space or tab, in which case the indentation will be stripped from all lines in the doc string. Prior to PHP 7.3.0, the closing identifier must begin in the first column of the line.

https://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc

You could perhaps generalize this to: /any/ characters preceding the final >>> on the line /and present after each newline/ will be stripped /if there is at least one newline/. The final clause handles the single-line case : <<< foo >>>, where you don't want to do any stripping. This handles your example, although I personally wouldn't write my long argument that way, since I don't think it is more readable. The more conventional whitespace stripping (as PHP does it) would allow:

:: <<< This is a list item.
  
   This is a new paragraph, but still part of the same list item.
  
   {{Multiline template|
   arg1=This is also in
   |arg2=the same list item}}
   >>>
:: This is a new list item.

and that still allows you to indent while preserving readability. Both of these should allow further nesting:

: <<<
: Example1
: : <<< This is an embedded list
: : This should still work
: : >>>
: But it makes my eyes bleed.
: >>>
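Here is a minimal sketch of the generalized stripping rule described above. It assumes the raw text between <<< and the final >>> has already been isolated; finding that final >>> is the hard part and is not attempted here. strip_block is a hypothetical name. On the nested example, one pass strips the outer : prefix. That leaves the inner item as an ordinary block one level shallower, which a second pass can strip the same way.

```python
def strip_block(raw: str) -> str:
    """Apply the generalized stripping rule to raw, the text between '<<<'
    and the final '>>>'.

    Whatever precedes the final '>>>' on its line (the text after the last
    newline in raw) is the prefix; it is removed after every newline where
    present. With no newline at all (single-line form), nothing is stripped.
    """
    if "\n" not in raw:
        return raw
    prefix = raw[raw.rfind("\n") + 1:]
    first, *rest = raw.split("\n")
    # Lines that do not start with the full prefix (e.g. a bare '::' when the
    # prefix is ':: ') are kept unchanged; a real spec would have to decide.
    rest = [line[len(prefix):] if line.startswith(prefix) else line for line in rest]
    return "\n".join([first, *rest])


nested = (
    "\n"
    ": Example1\n"
    ": : <<< This is an embedded list\n"
    ": : This should still work\n"
    ": : >>>\n"
    ": But it makes my eyes bleed.\n"
    ": "
)
print(strip_block(nested))
```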

Indent-pre (a line that starts with a space is rendered as preformatted text) is the only reason you can't just have extra whitespace on each line without any special stripping. There are other contexts in which we suppress indent-pre, and this might be worth considering as another one.

That suggestion sounds good to me.

I think we might always need to strip a final newline in the argument, or else there are strings we can't encode. For example, if we actually want to encode \n: as the argument, we can't use:

: <<<
: >>>

because everything before the final >>> is stripped. We'd have to encode this as:

: <<<
: 
>>>

and rely on stripping a final newline there. If you *actually* wanted \n: \n as the argument you need to double up the final newline:

: <<<
:

>>>

This annoys me a little bit: it complicates editors' understanding of how <<< / >>> quoting works and adds some weird corner cases. It shouldn't complicate machine-generated quotations, though. You just add "\n>>>" at the end of the desired list contents and then str_replace("\n", "\n$PREFIX", ...). The one special case is that if the desired list contents don't contain \n, you can just add >>> at the end and be done (single-line form), although in that case you'd probably use the single-line : form rather than the block : form.
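The recipe above, combined with the always-strip-a-final-newline rule, can be sketched as an encoder/decoder pair that round-trips. This is Python rather than PHP, so str.replace stands in for str_replace. encode_item and decode_item are hypothetical names, and decode_item assumes it receives the complete <<<...>>> string.

```python
def encode_item(content: str, prefix: str) -> str:
    """Quote content as a block list item; continuation lines get prefix."""
    if "\n" not in content:
        # Single-line form: nothing will be stripped, so no extra newline.
        return "<<<" + content + ">>>"
    # Append the sacrificial "\n>>>", then put prefix after every newline
    # (the equivalent of str_replace("\n", "\n$PREFIX", ...)).
    return "<<<" + (content + "\n>>>").replace("\n", "\n" + prefix)


def decode_item(quoted: str) -> str:
    """Inverse of encode_item: strip the prefix, then one final newline."""
    if not (quoted.startswith("<<<") and quoted.endswith(">>>")):
        raise ValueError("not a <<< ... >>> item")
    raw = quoted[3:-3]
    if "\n" not in raw:
        return raw  # single-line form
    prefix = raw[raw.rfind("\n") + 1:]  # text before the final '>>>'
    first, *rest = raw.split("\n")
    rest = [line[len(prefix):] if line.startswith(prefix) else line for line in rest]
    text = "\n".join([first, *rest])
    return text[:-1]  # the block form always ends with the added newline


print(repr(encode_item("\n:", ": ")))
```

Neither function escapes >>> inside the content. decode_item gets away with this only because it slices off the last three characters. A real scanner that stops at the first >>> would close such an item early.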

This is a duplicate of T230683: New syntax for multiline list items / talk page comments, where you have already received quite a lot of feedback about this markup, both in general and specifically for talk page comments. Why is there a new task?

It feels worth pointing out that the proposed "heredoc" syntax, despite being saddled with that name, diverges from traditional heredoc implementations (UNIX shell, PHP, etc.) in one significant way: The terminating marker.

In "real" heredocs, the terminating string isn't fixed; it's specified immediately after the opening marker. The heredoc then terminates only when the same string is encountered again, marking the end of the enclosed range. So with "traditional" heredocs, instead of this:

: <<<
: Example1
: : <<< This is an embedded list
: : This should still work
: : >>>
: But it makes my eyes bleed.
: >>>

You might have something more like this:

: <<<__END__
: Example1
: : <<< LISTEND
: : This is an embedded list
: : This should still work
: : LISTEND
: But it makes my eyes bleed.
: __END__

The __END__ and LISTEND strings can be literally anything, as long as it's repeated to terminate the heredoc.

The reason for that is simple: So that the heredoc can safely contain ANY text, without restriction, the sole exception being the specific string you choose for the terminator. There's no way to write a heredoc containing its own terminating marker, but because that marker isn't fixed there's also never any need to. If a heredoc needs to contain the terminating string, just change the terminating string.
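The following sketch illustrates why a movable terminator removes the need for escaping. It picks a terminator that does not appear as a line of the body, then reads the heredoc back. It follows shell-style semantics, where the terminator must occupy a whole line, rather than any proposed wikitext rule. make_heredoc, read_heredoc, and the END base name are hypothetical.

```python
def choose_terminator(body: str, base: str = "END") -> str:
    """Pick a terminator that never appears as a whole line of body."""
    lines = set(body.split("\n"))
    candidate, n = base, 0
    while candidate in lines:
        n += 1
        candidate = f"{base}{n}"
    return candidate


def make_heredoc(body: str) -> str:
    term = choose_terminator(body)
    return f"<<<{term}\n{body}\n{term}"


def read_heredoc(text: str) -> str:
    """Read a heredoc produced by make_heredoc; no escaping is ever needed."""
    first, _, rest = text.partition("\n")
    term = first[3:]  # the terminator chosen after '<<<'
    body = []
    for line in rest.split("\n"):
        if line == term:
            return "\n".join(body)
        body.append(line)
    raise ValueError("unterminated heredoc")


doc = make_heredoc("line one\nEND\nline three")
print(doc)
```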

That level of flexibility makes the syntax easier to use — and to parse, as it turns out, since you don't have to deal with issues of escaping terminating marks or tracking nesting. ("A heredoc in a heredoc" isn't really a sensible construct in traditional uses, since a heredoc is effectively a multi-line quoted string; it doesn't contain arbitrary code.)

It's the same reason most s///-style regular expression systems allow you to replace the forward slashes with any other character, in case you need to use them inside the expression. Because this:

s/\/home\/user\//\/tmp\//

is a lot uglier than this:

s|/home/user/|/tmp/|

Both say exactly the same thing to sed, vi, and so on.

But in terms of WikiText, if "heredocs" are to be used exclusively for list items, and indenting marks are to precede each line of the contents, then it seems like the terminating mark could actually be optional. We already have an indicator of when the heredoc ends: it ends when the list item ends. In other words, this shouldn't be any more ambiguous to parse correctly than the original example:

: <<<
: Example1
: : <<< This is an embedded list
: : This should still work
: But it makes my eyes bleed.

The only problem will be if someone comes along and adds another :-indented line below:

: <<<
: Example1
: : <<< This is an embedded list
: : This should still work
: But it makes my eyes bleed.
: I'm not a part of this!

So, for that reason, it might be prudent to include an optional terminating marker for the outer heredoc:

: <<<
: Example1
: : <<< This is an embedded list
: : This should still work
: But it makes my eyes bleed.
: >>> 
: I'm not a part of this!

But the termination of the inner heredoc is still unambiguously implicit in the end of that level of indentation.
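Here is a sketch of the implicit-termination idea. It assumes depth is the number of : markers in a line's leading run of colons and spaces, and that an explicit >>> closes a block only at that block's own depth. block_extent is a hypothetical helper. On the two examples above, the inner block ends where the indentation drops. Without a terminator, the outer block swallows the unrelated last line.

```python
import re

LEADING = re.compile(r"^[: ]*")  # leading run of ':' markers and spaces


def split_prefix(line: str) -> tuple[int, str]:
    """Return (depth, rest); depth counts the ':' markers in the prefix."""
    prefix = LEADING.match(line).group(0)
    return prefix.count(":"), line[len(prefix):]


def block_extent(lines: list[str], start: int) -> tuple[int, bool]:
    """For a '<<<' item opened at lines[start], return (end, explicit).

    end is one past the block's last line. The block closes implicitly at
    the first line of smaller depth, or explicitly at a '>>>' line of the
    same depth (explicit=True). End of input also closes it.
    """
    depth, _ = split_prefix(lines[start])
    for i in range(start + 1, len(lines)):
        d, rest = split_prefix(lines[i])
        if d < depth:
            return i, False
        if d == depth and rest.strip() == ">>>":
            return i + 1, True
    return len(lines), False


with_end = [
    ": <<<",
    ": Example1",
    ": : <<< This is an embedded list",
    ": : This should still work",
    ": But it makes my eyes bleed.",
    ": >>> ",
    ": I'm not a part of this!",
]
without_end = with_end[:5] + with_end[6:]

print(block_extent(with_end, 0), block_extent(with_end, 2), block_extent(without_end, 0))
```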

I don't see where something like this could ever be valid, so the need to require an explicit terminator doesn't seem that great:

: <<<
: Example1
: <<< This is an embedded heredoc
: Within the first one.
: >>>
: >>>

If heredocs are supposed to be found only at the start of list items, then the second : <<< is invalid because a heredoc is already open for that item. Or, if we don't want to throw parse errors at people, then opening the second heredoc should implicitly terminate the first one, making this:

: <<<
: Example1
: <<< This is no longer an embedded heredoc
: Within the first one.

actually contain two separate list items, equivalent to this:

: <<<
: Example1
: >>>
: <<< This is no longer an embedded heredoc
: Within the first one.
: >>>
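A sketch of the "a new <<< closes the open block" rule for items at a single depth. It shows that the implicit and explicit versions above produce the same items. split_items is hypothetical and, for brevity, strips leading and trailing whitespace from each line.

```python
def split_items(lines: list[str]) -> list[list[str]]:
    """Group ':' lines of a single depth into items.

    Hypothetical rule: '<<<' opens a block item, '>>>' closes it, and a new
    '<<<' while a block is still open implicitly closes the old one.
    """
    items: list[list[str]] = []
    open_block = False
    for line in lines:
        content = (line[1:] if line.startswith(":") else line).strip()
        if content.startswith("<<<"):
            head = content[3:].strip()
            items.append([head] if head else [])  # implicitly closes any open block
            open_block = True
        elif content == ">>>":
            open_block = False
        elif open_block:
            items[-1].append(content)
        else:
            items.append([content])  # ordinary single-line item
    return items


implicit = [": <<<", ": Example1", ": <<< This is no longer an embedded heredoc", ": Within the first one."]
explicit = [": <<<", ": Example1", ": >>>", ": <<< This is no longer an embedded heredoc", ": Within the first one.", ": >>>"]
print(split_items(implicit) == split_items(explicit))
```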

One of the most important parts of the new syntax is that on continuation lines, you shouldn’t have to write any prefix, so that in case the continuation line turns out not to be a real new line (but, rather, a line break within a template transclusion, extension tag etc.), the stray colons don’t cause issues. And with that, the end of the string isn’t obvious anymore, so we do need an end marker.

(By the way, even if it was obvious, I’d recommend against introducing optional parts – nowadays, with Parsoid, an important goal is that wikitext roundtrips. If there are two ways to generate the same HTML from wikitext, the HTML-to-wikitext conversion will either not know which one to use, or wikitext-to-HTML will need to insert extra information in the HTML that HTML-to-wikitext can later use.)