
Decide on HTML format for machine-readable signatures
Open, Needs Triage · Public

Description

We'd like some sort of <span> tagging to separate the various elements in a machine-readable signature. This would be the HTML output of the wikitext parser function proposed in T230653: Use a parser function to encapsulate signatures. (T230653 simplifies parsing signature information from wikitext; this task would simplify parsing signature information from rendered HTML.)

I'd suggest using web standards wherever possible. As a starting point, we could wrap the comment time in a <time> element, with appropriate microdata tagging to indicate that this is a commentTime. There are similar schema.org attributes for other comment-related data (see, for example, UserComments and Comment), although we probably shouldn't get *too* carried away. The semantic anchor for a user can be either their user page URL (e.g., https://en.wikipedia.org/wiki/User:cscott) or their stable user id (e.g., https://en.wikipedia.org/wiki/Special:Redirect/user/173490).
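
To make the two anchor options concrete, here is a minimal Python sketch that builds each URL form. The helper names and the default base URL are illustrative assumptions, not part of any MediaWiki API; only the two URL shapes come from the examples above.

```python
from urllib.parse import quote

# Assumption: the base URL and both helper names are hypothetical,
# chosen only to mirror the two example URLs in this task.
BASE = "https://en.wikipedia.org"


def user_page_url(username: str, base: str = BASE) -> str:
    """Anchor by user name: <base>/wiki/User:<name>, spaces written as underscores."""
    title = "User:" + username.replace(" ", "_")
    # Keep ':' and '/' readable; percent-encode anything else that needs it.
    return f"{base}/wiki/{quote(title, safe=':/')}"


def stable_user_url(user_id: int, base: str = BASE) -> str:
    """Anchor by numeric user id: <base>/wiki/Special:Redirect/user/<id>."""
    return f"{base}/wiki/Special:Redirect/user/{user_id}"
```

Either string could then be used as the value of whatever attribute ends up carrying the creator; which of the two is preferable is exactly the open question here.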

As a base case, the current HTML emitted by a comment ending in ~~~~ in wikitext (with a sample comment text preceding) is:

<dd>
    The logo was never really non-free, as far as I can tell...
    <a href="/wiki/User:Cananian" class="mw-redirect" title="User:Cananian">C. Scott Ananian</a>
    18:07, 29 March 2007 (UTC)
</dd>

So something like:

<span vocab="http://schema.org" typeof="Comment" class="mw-signature">
    <a href="/wiki/User:cscott" property="creator">C. Scott Ananian</a>
    <time property="dateCreated" datetime="2007-03-29T18:07Z">18:07, 29 March 2007 (UTC)</time>
</span>
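
As a rough illustration of how a consumer might read this markup, the sketch below uses Python's standard `html.parser` to pull the creator link and timestamp out of any `mw-signature` span. The class and function names are hypothetical, and it assumes the attribute names are exactly those in the snippet above; it is not a proposal for how Parsoid or any existing tool should do it.

```python
from html.parser import HTMLParser


class SignatureParser(HTMLParser):
    """Collects creator href and dateCreated from spans with class "mw-signature"."""

    def __init__(self):
        super().__init__()
        self.signatures = []  # one dict per signature found
        self._depth = 0       # <span> nesting depth inside the current signature

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "span":
            if self._depth:
                self._depth += 1  # nested span inside a signature
            elif "mw-signature" in (a.get("class") or "").split():
                self._depth = 1
                self.signatures.append({"creator": None, "dateCreated": None})
            return
        if not self._depth:
            return  # ignore everything outside a signature
        sig = self.signatures[-1]
        if tag == "a" and a.get("property") == "creator":
            sig["creator"] = a.get("href")
        elif tag == "time" and a.get("property") == "dateCreated":
            sig["dateCreated"] = a.get("datetime")

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1


def parse_signatures(html: str) -> list:
    parser = SignatureParser()
    parser.feed(html)
    parser.close()
    return parser.signatures


SAMPLE = """<span vocab="http://schema.org" typeof="Comment" class="mw-signature">
    <a href="/wiki/User:cscott" property="creator">C. Scott Ananian</a>
    <time property="dateCreated" datetime="2007-03-29T18:07Z">18:07, 29 March 2007 (UTC)</time>
</span>"""

signatures = parse_signatures(SAMPLE)
```

The point of the sketch is that the reader never has to interpret the localized timestamp text or guess which link is the user link; both come from attributes.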

We should also consider the issues raised in T120409: RESTBase should honor wiki-wide deletion/suppression of users as well. Parsoid's (former) semantic tagging about users is visible in 89f0eedf78173a0dcf343d5ab223c3da223c9201.

Event Timeline

cscott created this task. · Oct 8 2019, 4:52 PM
Restricted Application added a subscriber: Aklapper. · Oct 8 2019, 4:52 PM
cscott updated the task description. · Oct 8 2019, 5:03 PM
cscott updated the task description.
cscott updated the task description. · Oct 8 2019, 5:06 PM
cscott updated the task description.
Anomie added a subscriber: Anomie. · Oct 8 2019, 6:16 PM

As a base case, the current HTML emitted by a comment ending in ~~~~ in wikitext (with a sample comment text preceding) is:

<dd>The logo was never really non-free, as far as I can tell, it's been replaced by an identical SVG, which wouldn't change the non-free situation in any case.  But SVG is better than PNG, so go ahead, delete away. <a href="/wiki/User:Cananian" class="mw-redirect" title="User:Cananian">C. Scott Ananian</a> 18:07, 29 March 2007 (UTC)</dd>

The typical signature would also have a link to the talk page, like:

<dd>The logo was never really non-free, as far as I can tell, it's been replaced by an identical SVG, which wouldn't change the non-free situation in any case.  But SVG is better than PNG, so go ahead, delete away. <a href="/wiki/User:Cananian" class="mw-redirect" title="User:Cananian">C. Scott Ananian</a> (<a href="/wiki/User_talk:Cananian" title="User talk:Cananian">talk</a>) 18:07, 29 March 2007 (UTC)</dd>

Given how free-form the signature is, I think we should put the creator descriptor around the whole signature part, rather than try to add it to a specific link. There could be multiple links to user/talk/contribs pages, or none (although this would be disallowed by policy on WMF wikis), so something with this structure:

<span class="mw-signature">
    <span creatorAttribute="Cananian">
        <!-- user link --> (<!-- talk page link -->)
    </span>
    <time ...>
        <!-- timestamp -->
    </time>
</span>
Esanders updated the task description. · Oct 9 2019, 1:05 PM

So...

<span vocab="http://schema.org" typeof="Comment" class="mw-signature">
    <span rel="creator" resource="/wiki/User:cscott"> <!-- machine-readable link to username in span attributes -->
        <!-- But note that everything in this <span> is customizable text; don't try to parse inside the tag -->
        <a href="/wiki/User:cscott" title="User:cscott">C. Scott Ananian</a> (<a href="/wiki/User_talk:cscott" title="User talk:cscott">talk</a>)
    </span>
    <time property="dateCreated" datetime="2007-03-29T18:07Z">18:07, 29 March 2007 (UTC)</time>
</span>

?
(You could also add the userid as an attribute to the <span> if that was thought useful.)
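
For what it's worth, here is a hedged Python sketch of how a reader of this variant might behave: it takes the creator only from the wrapper span's `resource` attribute and treats everything inside that span as opaque, as the comment in the markup asks. The class name is hypothetical, and the red inner span in the sample is an invented custom signature used only to show that nested markup is skipped.

```python
from html.parser import HTMLParser


class WrappedSignatureParser(HTMLParser):
    """Reads creator from the wrapper <span rel="creator">, never from its contents."""

    def __init__(self):
        super().__init__()
        self.signatures = []
        self._depth = 0  # <span> nesting depth inside the current signature

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "span":
            if self._depth:
                self._depth += 1
                # Only a direct child of the signature span may name the creator.
                if self._depth == 2 and a.get("rel") == "creator":
                    self.signatures[-1]["creator"] = a.get("resource")
            elif "mw-signature" in (a.get("class") or "").split():
                self._depth = 1
                self.signatures.append({"creator": None, "dateCreated": None})
        elif tag == "time" and self._depth == 1 and a.get("property") == "dateCreated":
            # Only a <time> directly under the signature span counts;
            # a <time> inside the customizable part is ignored.
            self.signatures[-1]["dateCreated"] = a.get("datetime")

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1


SAMPLE = """<span vocab="http://schema.org" typeof="Comment" class="mw-signature">
    <span rel="creator" resource="/wiki/User:cscott">
        <a href="/wiki/User:Someone_else"><span style="color:red">C. Scott Ananian</span></a>
        (<a href="/wiki/User_talk:cscott" title="User talk:cscott">talk</a>)
    </span>
    <time property="dateCreated" datetime="2007-03-29T18:07Z">18:07, 29 March 2007 (UTC)</time>
</span>"""

parser = WrappedSignatureParser()
parser.feed(SAMPLE)
parser.close()
result = parser.signatures
```

Note that the misleading inner link to another user page has no effect on the result, which is the main advantage of putting the descriptor on the wrapper rather than on a specific link.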

Would this be generated by an extension tag or parser function, or would that HTML be directly saved into the page by the editor? The former approach would presumably be cleaner in terms of the wikitext source code.