
Automatically-assigned id attributes for list items
Open, Needs Triage, Public

Description

For talk pages, it would be useful to have an id attribute on each list item (talk page comments using : are <dd> list items) so that individual comments can be referenced. Perhaps other lists would benefit from being able to be referenced by default (as headings are).

The strawman proposal is to automatically assign hierarchical ids (so that adding new subitems, which are replies in a talk page context, does not shift existing ids), something like:

wikitext:

== Thread 1 ==
: This is comment 1
:: This is a reply to comment 1
: This is comment 2

which turns into the following output (note that this is also valid wikitext for this list, but the point of this task is that authors are *not required* to write this; instead the IDs are automatically generated):

<h2 id="thread_1">Thread 1</h2>
<dl><dd id="thread_1-1">This is comment 1
  <dl><dd id="thread_1-1-1">This is a reply to comment 1</dd></dl>
</dd><dd id="thread_1-2">This is comment 2
</dd></dl>
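
The hierarchical numbering above can be sketched in a few lines. This is only an illustration of the strawman, not how the parser does or would do it: the function name `assign_hierarchical_ids`, the simplified heading-to-id rule, and the handling of skipped indent levels are all assumptions.

```python
import re


def assign_hierarchical_ids(wikitext: str) -> list[tuple[str, str]]:
    """Return (id, text) pairs for each ':' list item below a '== Heading =='."""
    results = []
    prefix = ""    # id derived from the enclosing section heading
    counters = []  # counters[d] is the position of the current item at depth d + 1
    for line in wikitext.splitlines():
        heading = re.fullmatch(r"==\s*(.+?)\s*==", line.strip())
        if heading:
            # Simplified stand-in for real section-id generation (assumption).
            prefix = heading.group(1).lower().replace(" ", "_")
            counters = []
            continue
        item = re.match(r"(:+)\s*(.*)", line)
        if not item:
            continue
        depth = len(item.group(1))
        # Forget counters of deeper levels; siblings at this depth keep counting.
        del counters[depth:]
        # If an indent level was skipped, pad it with 0 (arbitrary choice).
        while len(counters) < depth:
            counters.append(0)
        counters[depth - 1] += 1
        path = "-".join(str(c) for c in counters)
        results.append((f"{prefix}-{path}" if prefix else path, item.group(2)))
    return results


example = """== Thread 1 ==
: This is comment 1
:: This is a reply to comment 1
: This is comment 2"""

for item_id, text in assign_hierarchical_ids(example):
    print(item_id, "|", text)
```

With this scheme, appending another reply under comment 1 yields `thread_1-1-2` and leaves `thread_1-2` untouched. Inserting an item before an existing sibling would still renumber the later siblings.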

Alternatively/in addition, a "permanent" or "more readable" id could be manually assigned, which would override (or supplement?) the automatically-assigned id. This would get expressed either using explicit <dd id="..."> syntax (sigh), with new cleaner syntax (for example, as proposed in T230658: Syntax for list item attributes), or via some other mechanism (see the note at the bottom of this task).

These assigned list IDs are scoped to an individual article or talk page, so they don't need to be globally unique, which should greatly help make them human-friendly.

In fact, instead of hierarchically-assigned IDs, we could generate them based on the list item/comment content, like we do for section ids. We'd want to truncate to a certain length, and we'd probably want a manual override to preserve the original ID if/when the list item/comment is edited.
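
A rough sketch of content-derived IDs with truncation and deduplication follows. The function name `content_id`, the 24-character limit, the `item` fallback, and the `_2`, `_3` suffix scheme are assumptions for illustration; real section-id generation has more rules than this.

```python
import re


def content_id(text: str, used: set[str], max_len: int = 24) -> str:
    """Build an id from the item's text, truncate it, and make it unique in `used`."""
    # Lowercase and collapse every run of non-alphanumerics into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
    # Truncate, then drop a trailing underscore left by the cut.
    slug = slug[:max_len].rstrip("_") or "item"
    candidate, n = slug, 1
    while candidate in used:
        n += 1
        candidate = f"{slug}_{n}"
    used.add(candidate)
    return candidate


seen: set[str] = set()
print(content_id("Approved.", seen))
print(content_id("Approved!", seen))
print(content_id("This is a reply to comment 1", seen, max_len=10))
```

Because the id depends on the text, editing the comment changes its id, which is why a manual override would be needed to keep old links working.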

As a strawman, using the timestamp as an ID prefix helps ensure that a bunch of comments which all start with "approved" don't need a lot of automatic deduplication:

== Thread 1 ==
: This is comment 1 {{#~|UserA|20190818 10:23:45}}
:: This is a reply to comment 1 {{#~|UserB|20190818 10:23:45.1}}
: This is comment 2 {{#~|UserC|20190818 10:23:45.15}}

<h2 id="thread_1">Thread 1</h2>
<dl><dd id="20190818T1213-this_is">This is comment 1
  <dl><dd id="20190818T1214-this_is">This is a reply to comment 1</dd></dl>
</dd><dd id="20190818T1215-this_originally_said">This is comment 2
</dd></dl>

(That last item might serialize to wikitext as: : id=20190818T1215-this_originally_said <<< This is comment 2 >>>.)

However, this can look ugly if it appears in wikitext -- even as a link: [[Talk:Foo#20190818T1214-this_is]]. Readability needs to be traded off against being 'unique by default' (as opposed to requiring a postprocessing pass to make IDs unique, as section headings currently do).

As a point of comparison, Phabricator discussions currently include an opaque numeric tag, but one which is substantially shorter than a full human-readable timestamp. Since [[phab:T230659#5431727]] is considered acceptable, that could be considered a reasonable proxy for the acceptable length of an opaque comment identifier. Note that, unlike in some previous comment-linking proposals, we are still preserving the context of the discussion -- in the Phabricator case via the T230659 prefix, although as an opaque number itself that's not great, and Phabricator will expand it with hovertext in most of its UI. In the proposed talk page context the prefix would be the title of the talk page: [[Talk:Foo#5431727]] or something like that.

As a special case of lists, talk pages aren't as often edited (we try to leave existing comments alone!), but they are usually archived after a period of time, which could break existing links. We'd need some way to update references to point to the archived page instead. You could either do this with a bot that could just comb through looking for old references, or more "cleverly" with something like:

[[Special:Talk/PageName#comment-id]]

where that special page would redirect to the appropriate archive page for the given comment id.
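
One way such a redirect could avoid scanning archive pages is a lookup table that the archiving process updates whenever it moves comments. The sketch below is purely hypothetical: `record_archive`, `resolve_comment`, and the "Talk:Foo/Archive 1" title are illustrative names, and nothing like this table exists in the proposal itself.

```python
def record_archive(index: dict, page_name: str, archive_title: str, comment_ids: list) -> None:
    """Remember that these comments of Talk:<page_name> now live on archive_title."""
    for comment_id in comment_ids:
        index[(page_name, comment_id)] = archive_title


def resolve_comment(index: dict, page_name: str, comment_id: str) -> str:
    """Return the link target for a comment, falling back to the live talk page."""
    title = index.get((page_name, comment_id), f"Talk:{page_name}")
    return f"{title}#{comment_id}"


index: dict = {}
# Hypothetical archive run: comment 5431727 moves from Talk:Foo to an archive page.
record_archive(index, "Foo", "Talk:Foo/Archive 1", ["5431727"])

print(resolve_comment(index, "Foo", "5431727"))  # archived comment
print(resolve_comment(index, "Foo", "5431999"))  # still on the live page
```

A table like this makes each lookup a single dictionary access, at the cost of requiring whatever does the archiving to keep it up to date.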

Note: '''This task is just for automatically-assigning default list item ids.''' If you want to make list item ids persistent across edits to the wikitext, then make a proposal in a separate task for how to do so. T230658: Syntax for list item attributes is one such proposal, where the comment ID would appear as an explicit ID attribute in wikitext. You could also imagine persisting list item IDs by using a separate database table of some sort that would record assigned IDs and perhaps update this table when wikitext is edited. I don't have a good idea for how to do that, but feel free to make a proposal.

Event Timeline

This presupposes that we want to continue using wikitext list markup for talk pages. Ideally we'd do T230683: New syntax for multiline list items / talk page comments instead.

(That last item might serialize to wikitext as: : id=20190818T1215-this_originally_said <<< This is comment 2 >>>.)

That's awful. To the point that I foresee a conspiracy theory about making the wikitext so ugly to try to force everyone to not use wikitext anymore.

where that special page would redirect to the appropriate archive page for the given comment id.

Interesting idea, but I wonder about the performance of having to check potentially hundreds of archive pages looking for the ID, and it probably also requires a specific archive naming format.

If you do new syntax, then you still have to figure out how to add attributes (at least outdent information, and ideally an ID as well) to that new syntax.

Note that the timestamps in the IDs were a partial addition that Ed and I were working on and they were left in a weird in-between state. I think the id is likely to be much more compact, which would make the explicit ID version a little prettier:

: id=short-string <<< long comment here >>>

And of course I'd expect editors would appreciate the abilities to add class, id, and data attributes to list items outside the talk page context.

If you do new syntax, then you still have to figure out how to add attributes (at least outdent information, and ideally an ID as well) to that new syntax.

Ideally outdenting in the wikitext should go away entirely. The wikitext should represent the semantic nesting of the comments and any visual outdenting should be applied by MediaWiki itself.

It would probably also help if the comment-nesting didn't use as much whitespace as the existing colon/asterisk indenting does.

Note that the timestamps in the IDs were a partial addition that Ed and I were working on and they were left in a weird in-between state. I think the id is likely to be much more compact, which would make the explicit ID version a little prettier:

: id=short-string <<< long comment here >>>

Not enough prettier, IMO. I'm skeptical that use cases for all this (maybe highlighting the targeted item like the reflist does?) are enough to be worth the added complexity to wikitext, versus those few use cases just using HTML-style markup.

And of course I'd expect editors would appreciate the abilities to add class, id, and data attributes to list items outside the talk page context.

In my experience IDs are typically added wherever needed using templates that produce <span id="..."></span>, as the first thing inside the list item if the ID is needed to point to a list item.

As for the rest, the only thing I recall seeing anyone actually ask for is the ability to do <li value="N"> to affect the numbering generated by a # list.

If I understand this correctly, it appears to be a non-starter. One of the clear outcomes of the Talk Page Consultation was not to disrupt normal editing of talk pages, and that any new tools be optional. It appears either impossible or unreasonable for experienced editors to type these IDs, and surely you don't expect new users to look at the wikipage and figure out how to comment like this.

Edit to clarify: If this were expected for all or most comments I would consider it a non-starter. If this is some rare special purpose feature, equivalent to how on rare occasion we put an anchor link on a section, then I withdraw that concern.

Without getting into the specifics of this task, a point of clarification at a meta level. All these are really preliminary ideas for exploration and @cscott put them out on Phabricator for that purpose. Whether any of these will see the light of day in terms of implementation depends on whether they make sense and fit into the broader picture (for wikitext as a markup language and for the talk pages project). With that clarification in mind, please continue the conversation.

Edit to clarify: If this were expected for all or most comments I would consider it a non-starter. If this is some rare special purpose feature, equivalent to how on rare occasion we put an anchor link on a section, then I withdraw that concern.

This is the case. I edited the task description to attempt to clarify this, since I realized belatedly that the *output HTML* I used in my example could be misinterpreted as a proposal for *input wikitext*. It was ambiguous because HTML tags are acceptable wikitext, and in fact are the only current way to express ID attributes on list items. I apologize for the misunderstanding.

The proposal is that list items would get automatically assigned IDs like heading tags do. This task is mostly for discussing (a) whether that's a good idea, (b) what its use cases would be (not just on talk pages), and (c) what format the automatically assigned IDs should have. The "rare special purpose feature" would be where you'd want to override these automatically-assigned IDs, for example if the automatically-assigned ID is computed based on content and you wanted to preserve the original ID after editing the content. But the question of how that "rare special purpose" use would appear in wikitext isn't actually what this task is about. The "status quo" mechanism would be to require explicit HTML tags in the wikitext, either <dd id=...> or : <span id=....>. An alternative mechanism is proposed in T230658: Syntax for list item attributes, but that's largely orthogonal to this task.

(And for the record, Parsoid in fact already assigns unique IDs to every HTML element in the output, but not in a persistent way that would allow users to reliably link to a given element. So it's the "reliable/repeatable" IDs part which is actually novel/worth discussing.)

Anyway, I've attempted to clarify the task description; let me know if this helps.

cscott renamed this task from id attributes for list items to Automatically-assigned id attributes for list items. Aug 22 2019, 6:10 PM

For completeness, another proposal is to automatically scan for a trailing signature (perhaps using the {{#~|user|date}} syntax from T230653) and use the timestamp from this as part of the automatically-generated ID. I'm not a huge fan of this proposal because (a) it requires non-local effects on list item markup, and (b) it seems to be too talk-page specific, but it's certainly worth mentioning for discussion that automatic ID generation based on content doesn't have to be quite as simplistic as the ID generation for headings is.
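
To make the signature-scanning idea concrete, here is a minimal sketch. It assumes the `{{#~|user|YYYYMMDD HH:MM:SS}}` shape used in the task description's example; the output format (date, `T`, hours and minutes, then the first two words of the comment) and the name `id_from_signature` are assumptions, not a settled design.

```python
import re
from typing import Optional

# Trailing signature in the strawman form {{#~|User|YYYYMMDD HH:MM:SS[.fraction]}}.
SIGNATURE = re.compile(
    r"\{\{#~\|(?P<user>[^|}]+)\|(?P<date>\d{8}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):\d{2}(?:\.\d+)?\}\}\s*$"
)


def id_from_signature(item_text: str, words: int = 2) -> Optional[str]:
    """Derive an id from a trailing signature; return None for unsigned items."""
    match = SIGNATURE.search(item_text)
    if match is None:
        return None
    body = item_text[: match.start()]
    # Short readable suffix taken from the first few words of the comment.
    slug = "_".join(re.findall(r"[a-z0-9]+", body.lower())[:words])
    stamp = f"{match['date']}T{match['h']}{match['m']}"
    return f"{stamp}-{slug}" if slug else stamp


print(id_from_signature("This is comment 1 {{#~|UserA|20190818 10:23:45}}"))
print(id_from_signature("An unsigned comment"))
```

At minute resolution, two comments that start with the same words and are posted in the same minute would still collide, so this sketch would need either finer timestamps or a deduplication pass.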

In T230683#5432585 @Anomie proposes using the revision ID of the edit that creates the comment as its persistent identifier.

In T230683#5432585 @Anomie proposes using the revision ID of the edit that creates the comment as its persistent identifier.

I observed that links to individual talk page comments today often use diffs, which has the advantage of showing exactly what was added, when it was added, and who added it without relying on the (editable) wikitext.

I doubt that using the revision ID would make sense for list items, which is what this task is supposed to be about. It would be convenient for talk page comments, but as I said originally I think we'd do far better by not continuing to conflate the two.

In T230683#5432585 @Anomie proposes using the revision ID of the edit that creates the comment as its persistent identifier.

I'm not sure how this would be evaluated at parse time, unless it was injected into the wikitext with the edit.

In researching the existing gadgets, convenient-discussions generates IDs for each comment that are a concatenation of timestamp and username, which in conjunction with parser-function signatures would be very easy to extract.

As a concrete proposal, I'd like to tie this in with the strawman in T230658#5786980 and propose that the magic parser function {{#~}} (see T230653: Use a parser function to encapsulate signatures) "conceptually" include {{#attr|id=<something>}} in its expansion. (See the discussion there about p-wrapping and how unclosed tags could affect {{#attr}} placed in tail position.)

This would re-use the semantics/implementation of "attach an attribute to the containing node" needed to implement {{#attr}}. Unsigned comments wouldn't have permalinks, but the signature bot could take care of that.