
Syntax for list item attributes
Open, Needs Triage, Public

Description

This is a placeholder task, because I'm not sure exactly what syntax is best here.

Currently in order to add class/id/etc attributes on section headings or list items (common elements in talk pages) we need to switch from wikitext to HTML syntax. That is, <h2 class="foo">....</h2> instead of == .... == and <dl><dd class="foo">....</dd></dl> instead of : ....

That's unfortunate: this syntax looks ugly, which means it is hard to use attributes to record additional information about comments, for example comment outdents (class="outdent" or data-talk="outdent") or human-readable ids on individual comments (id="foo"). See T230659: Automatically-assigned id attributes for list items for more information on how these list item attributes could be useful.

Tables, table rows, table cells, and table captions already have wikitext syntax for attributes, which may or may not be a good model here.

Additionally, talk page list items are expected to use heredoc syntax when the contents get 'complicated', so we might imagine tying attribute syntax to heredoc syntax in order to minimize backward compatibility concerns.

Existing table syntax:

{| class="wikitable"
|+ class="bar" | caption
|- style="foo"
! class="bar" | cell
|- style="foo"
| style="foo" | cell
|}

Some options for lists and headings (for discussion only; I'm not actually endorsing any of these at this point):

:::<attr id=foo class=bar/> xyz ("magic extension")
:::{{#attr|id=foo|class=bar}} ("magic parser function")
:::|id=foo|class=bar| ("like table syntax")
::: id=foo class=bar | ("more like table syntax")
:::[id=foo][class=bar]  ("like CSS syntax")
::: id=foo class=bar <<< xyz >>> ("requires use of heredoc syntax")
::<dd id=foo class=bar> ("explicit tag, but don't require dl wrapper, etc")

===<attr id=foo> foo ===
=== id=foo class=bar <<< heading >>> ===

Event Timeline

cscott created this task. Aug 17 2019, 2:07 PM
Restricted Application added subscribers: Liuxinyu970226, Aklapper. Aug 17 2019, 2:07 PM
cscott updated the task description. Aug 17 2019, 2:17 PM
Anomie added a subscriber: Anomie. Aug 19 2019, 5:33 PM

Tables, table rows, table cells, and table captions already have wikitext syntax for attributes, which may or may not be a good model here.

Probably not a very good model. I can't recall anyone ever actually liking wikitext table syntax beyond that it saves a few keystrokes for simple tables.

:::<attr id=foo class=bar/> xyz ("magic extension")
:::{{#attr|id=foo|class=bar}} ("magic parser function")
:::|id=foo|class=bar| ("like table syntax")
:::[id=foo][class=bar]  ("like CSS syntax")
::: id=foo class=bar <<< xyz >>> ("requires use of heredoc syntax")
===<attr id=foo> foo ===
=== id=foo class=bar <<< heading >>> ===

IMO all of these are pretty awful, in that they're complicated syntax subject to user confusion and accidental breaking.

Another idea would be almost like the first:

:::<dd id=foo class=bar> xyz

Difference is that it's not a self-closing tag and the tag matches the list element.

That actually more or less works already since Remex does almost the right thing to the invalid HTML that Parser.php outputs for that wikitext (namely <dd><dd id=foo class=bar> xyz</dd><dd></dd><dd id=foo class=bar> xyz</dd>), and for <li>-based lists we somehow wind up with <li class="mw-empty-elt"></li> which does a right-ish-looking thing too.

Still rather confusing, but a bit more logical if you're already used to the HTML equivalence for the same reason it more or less works with Remex.

cscott updated the task description. Aug 22 2019, 5:59 PM

That's a reasonable alternative; I've added it to the list in the task description. There are some weird corner cases w/r/t properly closing the list; I think we want some sort of multiline list syntax anyway (T230683: New syntax for multiline list items / talk page comments), so it might make sense to tie the attribute syntax to that. But there are multiple proposals for multiline lists, too.

Anyway, early days. Interested to hear continuing thoughts. I personally am starting to favor the heredoc syntax both for this and for T230683 because it seems to unify the proposals, but it's fair to say we're nowhere near consensus yet.

Izno added a subscriber: Izno. Edited Oct 5 2019, 7:41 PM

Syntax for attributes is something I made a task for at T202083: First-class wikitext support for ordered list item value, which I closed as a duplicate since this task covers the problem I was looking to solve in that context. The general gist of what I suggested was similar to the table syntax, namely: # id=A class=X | BCD where BCD is the list item content. I can see users going for that syntax. I don't really understand the proposed "table" syntax above (:::|id=foo|class=bar| ("like table syntax")) since that's not how table syntax works. "like", but definitely not.

stjn added a subscriber: stjn. Oct 10 2019, 4:00 PM

I can’t comment on syntax itself, but we really shouldn’t use (or add the ability to use) definition lists for this (: syntax). It produces extremely broken and inaccessible HTML, which is something to avoid in a tool written by WMF. See explanation here:
https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Accessibility#Indentation

I agree, but I think changing the DOM tag is out of scope for this ticket, as doing so would probably break thousands of on-wiki style rules and gadgets.

stjn added a comment. Oct 16 2019, 4:33 PM

Wouldn’t syntax like this be used only by newer talk pages anyway? So if they will use the bad tags, it is concerning. Maybe I should’ve written this in T230659, though.

Jc86035 added a subscriber: Jc86035. Edited Nov 5 2019, 6:56 PM

How is the syntax to get into the page source? Is it going to be saved directly or is something like an extension tag going to generate the HTML?

I imagine it is almost certain that some users (particularly experienced users) will continue to reply to comments by editing the whole page or section; I assumed throughout phase 2 that this choice would be left open, and this was the way that the phase 2 proposed direction was presented. Given the way the options in the task description are presented (I'm assuming that the attributes are going to be saved into the page source), the software would have to add something like id=foo class=bar on page save for every new comment, so the start and end of each comment would need to be automatically detected in order for a "real" comment to be generated. The syntax could also pose problems e.g. with actual wikitext lists being used at the start of a comment.

I suggested during the community consultation that signatures could be modified to insert extension tags within or after the signature HTML. This would essentially create delimiters between comments, which would allow the software to distinguish different comments without matching timestamps. The id=foo etc. would be inside the extension tag, so the workflow would not change in this regard for users adding new comments through the section/page editing interfaces, and presumably the software could build all the styling and extra interface buttons around the delimited comments (e.g. replacing certain list item syntax while retaining list item syntax that isn't being used to delimit comments). It would still be possible to create dummy signatures without the extension tag by signing ~~~ ~~~~~.

I don't know how the underlying code works, so it wouldn't be up to me to determine what approach is the most feasible, but none of the presented options seem appealing to me just from a feasibility POV; users would presumably be quite annoyed if they'd have to substitute in a revision ID for every comment written in the 2010 editor.

Perhaps both a legacy syntax and a new syntax could be supported at the same time (if the new syntax is going to be sufficiently different to the original syntax), but this would be another can of worms and it's probably not worth discussing it further unless it's going to be seriously considered.

Jc86035 added a comment. Edited Nov 5 2019, 7:11 PM

More or less, an experienced user is going to want to be able to save this sort of comment through any of the available editing interfaces, regardless of the level that the comment is nested at, using the syntax that they learned years ago (or something very close to that).

* Text<ref>text</ref>
*; Text : text<ref>text</ref>
*; Text {{Smiley}}{{#invoke:Bananas|hello}}
*;: text<ref>test</ref>
[[File:Example.svg|20px]]
Lorem ipsum;

lorem ipsum;

lorem ipsum.

{| class="wikitable"
! A !! B
|-
| A || B
|}

# This
# That
# The other<ref>text</ref>
Thus, lorem ipsum. ~~~~
{{reflist-talk}}

Note that {{reflist-talk}} here is placed after the signature. I have done this before, several times; there isn't really a convention for doing so, but ideally this entire comment would be indented correctly without any issues and would not display badly. Of course, it's very possible that it might be infeasible to get the software to this point, but there are a lot of things that could be broken by a syntax change.

(Ideally, experienced users will also need to be able to get their comment indentation wrong and not have to fix it, because a lot of experienced users do get indentation wrong, especially if the discussion is one where the first-level nesting is done using bullet points; this is very common in e.g. RFCs, but editors tend to habitually do it in arbitrary discussions if stating their own opinions in succession. Right now, this is fine because it doesn't result in major visual hiccups, and it could plausibly be worked around by e.g. mandating semicolons for indentation for new comments, but I imagine a lot of older discussions may display very incorrectly if this is not taken into account. Additionally, community processes that use numbered lists, such as RFAs and other confirmation votes, will need to be taken into account in some way.)

Jc86035 added a comment. Edited Nov 5 2019, 7:42 PM

The syntax that I suggested back in February (during the phase 1 consultation) was:

>4*
[arbitrary wikitext] ~~~~

The number(s) would indicate the indentation level, and the list item after the number(s) would indicate some sort of comment styling (as opposed to being directly analogous to the current list item markers). Use of the new syntax would perhaps be optional (i.e. >4* could be omitted in favor of **** or :::*). The metadata would be provided entirely by attributes within an extension tag in the signature (as proposed in T230653).

Note that there is an (optional) newline after the >4*. The obvious technical fault with most/all of the proposed syntax suggested in the task description is that it would be impossible to start comments with wikitext list items (or at least that would have to be changed), as those are only recognized as list items if preceded by a newline and optionally other list item marker characters.
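To make the shape of this proposal concrete, here is a minimal Python sketch of how a marker line such as >4* might be separated from the comment body. Everything in it is an assumption for illustration: the function names, the set of accepted styling characters, and the restriction to the form where the marker sits on its own line (the proposal also allows the newline to be omitted, which this sketch does not handle).

```python
import re

# Hypothetical marker grammar: '>' + indentation level + one list-style character.
MARKER_RE = re.compile(r"^>(\d+)([*#:;])$")


def parse_comment_marker(line):
    """Return (indent_level, style_char) for a marker line, or None."""
    match = MARKER_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def split_comment(source):
    """Split a comment into its marker and its body.

    Only the first line is inspected, so the body is free to begin
    with ordinary wikitext list items.
    """
    first_line, _, rest = source.partition("\n")
    marker = parse_comment_marker(first_line)
    if marker is None:
        # No marker: treat the whole text as a legacy-style comment.
        return None, source
    return marker, rest
```

Because the marker is consumed before the body is looked at, a body that starts with * or # still begins on a fresh line, which is the property the paragraph above is concerned with.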

From a usability perspective, this sort of syntax style is what I would prefer if I were to force myself to use the existing source editor to write comments.

cscott updated the task description. Wed, Jan 8, 5:47 PM
cscott added a comment. Wed, Jan 8, 5:49 PM

Syntax for attributes is something I made a task for at T202083: First-class wikitext support for ordered list item value, which I closed as a duplicate since this task covers the problem I was looking to solve in that context. The general gist of what I suggested was similar to the table syntax, namely: # id=A class=X | BCD where BCD is the list item content. I can see users going for that syntax. I don't really understand the proposed "table" syntax above (:::|id=foo|class=bar| ("like table syntax")) since that's not how table syntax works. "like", but definitely not.

Thanks, I've added this as another option. The way table syntax works is a little problematic, as we've got potentially unlimited lookahead to find a vertical bar |, which, if found, retroactively changes how everything before the vertical bar is parsed. I'd added a leading vertical bar in order to avoid the need for the lookahead and to make it less likely that a vertical bar late in the item content triggers this by accident. But both variants are worth considering.
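The difference between the two variants can be sketched in a few lines of Python. This is a deliberately naive illustration, not how Parser.php or Parsoid tokenizes anything: the function names and the attribute regex are hypothetical, and neither splitter knows about links, templates, or quoting rules.

```python
import re

# Hypothetical, simplified attribute pattern: name=value or name="value".
ATTR_RE = re.compile(r'(\w[\w-]*)=("[^"]*"|\S+)')
# Leading-bar form: '|' followed by one or more 'name=value|' segments.
LEADING_RE = re.compile(r"\|((?:[\w-]+=[^|]*\|)+)(.*)", re.S)


def parse_attrs(text):
    return {name: value.strip('"') for name, value in ATTR_RE.findall(text)}


def split_table_like(item):
    """'id=A class=X | BCD': the whole item must be scanned for a '|'."""
    head, bar, tail = item.partition("|")
    if not bar:
        return {}, item.strip()
    # Everything before the first bar is reinterpreted as attributes.
    return parse_attrs(head), tail.strip()


def split_leading_bar(item):
    """'|id=foo|class=bar| xyz': decided by how the item starts."""
    match = LEADING_RE.match(item.lstrip())
    if match is None:
        return {}, item.strip()
    segments = match.group(1).rstrip("|").split("|")
    attrs = dict(segment.split("=", 1) for segment in segments)
    return attrs, match.group(2).strip()
```

With these definitions, split_table_like("pick a | b") treats "pick a" as an (empty) attribute section and returns only "b" as content, while split_leading_bar("pick a | b") leaves the text untouched because it does not begin with a vertical bar.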

cscott added a comment. Edited Wed, Jan 8, 5:56 PM

To narrow down the options some, I'd like to put forward

::: {{#attr|id=bar|class=x}}

as a general "attach these attributes to the containing HTML tag" mechanism. It would work in templates, so you could localize it/come up with useful shortcuts. (EDIT: You could use the normal magic word localization mechanism; you don't need to embed it in templates.)

I'd like this even better if we could use a meaningful symbol instead of attr so the function name didn't contain English text. But that ends up looking like line noise: {{#@|id=...}}.

But as a benefit, this "magic parser function" would work in all sorts of different contexts, and wouldn't require context-specific syntax. You could even use it to add attributes to table cells, for example, as a uniform alternative to the ad-hoc existing syntax.

Issues:

  1. We'd need a policy for dealing with conflicts, where multiple {{#attr}} invocations or existing syntax propose different values for the same attribute. I suggest a general policy of "the value specified last in the wikitext" (ie, associated with the largest character index in the source) wins.
  2. There is a complicated interaction with p-wrapping. If your list item contained a double-newline, then the {{#attr}} would be p-wrapped and the "containing html element" would change, for example with the attributes being applied to the specific <p> tag and not the list item. I suggest that {{#attr}} always ignore <p> nodes; use a <div> wrapper if you want to annotate a specific paragraph. (We might even consider ignoring a broader set of tags, like <span>/<b>/<i> etc, to avoid the case where an unclosed tag/bold/italic affected an {{#attr}} placed at the end of a block; you can always use html literal tags if you want to add attributes to <span>/<b>/<i>/<p>.)
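
The p-wrapping rule in the second issue amounts to walking up from the node that directly contains the {{#attr}} until a tag outside the ignored set is found. A minimal Python sketch follows; the Node class, the attr_target name, and the default ignored set are assumptions made for illustration and do not correspond to any existing MediaWiki or Parsoid API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    parent: Optional["Node"] = None
    attrs: dict = field(default_factory=dict)


def attr_target(container, ignored=frozenset({"p"})):
    """Return the element an {{#attr}} inside `container` should annotate.

    Elements whose tag is in `ignored` are skipped, so a <p> created by
    p-wrapping never captures the attributes.
    """
    node = container
    while node is not None and node.tag in ignored:
        node = node.parent
    return node


# A list item whose content got p-wrapped: <dd><p>{{#attr|...}}</p></dd>
dd = Node("dd")
p = Node("p", parent=dd)
target = attr_target(p)  # the <dd>, not the <p>
```

Passing a broader set such as {"p", "span", "b", "i"} gives the more aggressive variant mentioned in the parenthetical, while an explicit <div> wrapper is never skipped and so still receives the attributes.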

Anomie added a comment. Wed, Jan 8, 6:14 PM

Wouldn’t syntax like this be used only by newer talk pages anyway? So if they will use the bad tags, it is concerning. Maybe I should’ve written this in T230659, though.

I've been pushing on some of the related tasks for introducing a new syntax specifically for comments, that can have all the features comments need (e.g. T230659). Then for list markup, including definition-lists via :, we can consider only what actual lists need without bringing in what is only needed for talk page comments (mis)using list markup. I don't know if that changes your concerns here.

I think most of the discussion of new talk page comment markup is on T230683, which unfortunately is titled to conflate it with list markup.

To narrow down the options some, I'd like to put forward

::: {{#attr|id=bar|class=x}}

as a general "attach these attributes to the containing HTML tag" mechanism. It would work in templates, so you could localize it/come up with useful shortcuts.

I note that a few years back we were encouraged to avoid non-locality like that because it made something-or-other with Parsoid unhappy. That's why T67258: Information can be passed between #invoke's (tracking) exists. Has that policy changed?

cscott added a comment. Wed, Jan 8, 8:36 PM

TL;DR: you're right, I should probably have proposed "{{#attr}} doesn't escape templates to affect content in the surrounding page" as the base case, although you could support that (keep reading...)

I did consider whether it was possible to do incremental parsing even if {{#attr}} escapes templates. It complicates things, but because the desired behavior is tightly bound to the DOM tree, I think it's still fine. Briefly, we'd like to consider template expansion as a DOM subtree insertion; nonlocal effects outside of that subtree (like language converter rules, which apply "to the rest of the page") are discouraged. Global effects (like "change the displaytitle" or "add a category") can be accommodated without too much trouble. This proposed {{#attr}} behavior would be something in-between: the effects are still tightly scoped to a DOM subtree, it's just the parent node of the inserted subtree which can be affected. So you'd need to keep around a little more metadata to merge attributes properly in that parent node, but nothing show-stopping. The buck stops there.

Consider:

<div id=parent class=foo> {{1x|{{#attrib|class=bar}}}} {{1x|{{#attrib|class=baz}}}} {{#attrib|class=bat}}</div>

If the definition of [[Template:1x]] were to change, we'd re-render and, in addition to inserting the new rendered contents, we'd also have to do an attribute merge for the div#parent. The inputs would be the class=foo coming from the wikitext tag, whatever {{#attrib}} resulted from the new Template:1x expansions, and the class=bat coming from the final {{#attrib}}. We'd merge and resolve based on source order, so the end result would be that we'd set class=bat in div#parent. (We'll assume these templates are all balanced so Template:1x can't evaluate to </div><div> for example.)

Presumably {{#attrib}} would be implemented as a post-processing pass over the DOM, similar to several other postprocessing steps on the DOM, like ref numbering, header id unique-ification, etc. You'd store the attributes keyed by source position as you come across them, and then order, resolve, and apply them at the end.
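
As a rough illustration of that post-processing step, the following Python sketch collects (source offset, name, value) records for one element and resolves them so that the value with the largest source offset wins. The function name and the offsets are hypothetical; the records mirror the div#parent example above.

```python
def resolve_attributes(recorded):
    """Resolve attribute records for a single element.

    `recorded` is an iterable of (source_offset, name, value) tuples in
    any order. Records are applied in source order, so for each name the
    value specified last in the wikitext overwrites earlier ones.
    """
    resolved = {}
    for _offset, name, value in sorted(recorded, key=lambda record: record[0]):
        resolved[name] = value
    return resolved


# Records for div#parent, with made-up offsets, listed out of order on purpose.
records = [
    (85, "class", "bat"),   # final {{#attrib|class=bat}}
    (15, "class", "foo"),   # class=foo on the literal <div>
    (5, "id", "parent"),    # id=parent on the literal <div>
    (60, "class", "baz"),   # second {{1x|...}} expansion
    (30, "class", "bar"),   # first {{1x|...}} expansion
]
merged = resolve_attributes(records)  # {'id': 'parent', 'class': 'bat'}
```

Keeping the offsets alongside the values is what lets a re-render of Template:1x replace only its own records and then rerun the merge for the parent node.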

It certainly makes things easier if you don't allow {{#attrib}} to escape a template. That would mean that something like [[Template:Yes]] would have to generate an entire <td> node, since any {{#attrib}} couldn't escape to affect the containing <td>... but that's probably a good thing. (And there's no way an {{#attrib}} can affect a node generated by a later template -- the sort of between-template communication frowned upon in T67258.)

Note that these proposals add attributes to list *items* but not to the parent <ol>/<dl>/<ul> node. As discussed in T11996#5793656 there are various ways you can work around this if you need to target the list container. Fundamentally the list container doesn't exist as wikitext syntax.