Parsoid: References should be wrapped in a <sup>, not a <span>
Closed, Resolved (Public)

Description

Canonical HTML (from PHP parser):

<sup id="cite_ref-Forty_47-0" class="reference">
  <a href="#cite_note-Forty-47"><span>[</span>47<span>]</span></a>
</sup>

Output from Parsoid:

<span id="cite_ref-Forty-46-0" class="reference" about="#mwt408" typeof="mw:Object/Ext/Cite" data-parsoid="…">
  <a href="#cite_note-Forty-46">[47]</a>
</span>

I have no idea if the inner spans have any value, but the outer span should be a sup element instead.

(Also, the cites are indexed from 0 in the ids but from 1 in the label, 46 vs. 47, whereas the PHP parser indexes them from 1 in both; not sure if this is a problem.)
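
To make the indexing discrepancy concrete, here is a minimal sketch. The helper names `phpRefId`, `parsoidRefId`, and `refLabel` are hypothetical, and the id shapes are copied only from the two snippets above, not from the actual Cite or Parsoid source.

```typescript
// Hypothetical helpers: the id shapes mirror the snippets in this report only.

// PHP parser style: name and counter joined by "_", counter is 1-based.
function phpRefId(name: string, oneBasedIndex: number, use: number): string {
  return `cite_ref-${name}_${oneBasedIndex}-${use}`;
}

// Parsoid style as reported here: joined by "-", counter is 0-based.
function parsoidRefId(name: string, oneBasedIndex: number, use: number): string {
  return `cite_ref-${name}-${oneBasedIndex - 1}-${use}`;
}

// The visible label is 1-based in both outputs.
function refLabel(oneBasedIndex: number): string {
  return `[${oneBasedIndex}]`;
}

console.log(phpRefId("Forty", 47, 0));     // cite_ref-Forty_47-0
console.log(parsoidRefId("Forty", 47, 0)); // cite_ref-Forty-46-0
console.log(refLabel(47));                 // [47]
```

The label matches in both outputs; only the counter embedded in the id differs.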

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 12:44 AM
bzimport set Reference to bz43094.

About sup and sub in the HTML5 spec:

"These elements must be used only to mark up typographical conventions with specific meanings, not for typographical presentation for presentation's sake. For example, it would be inappropriate for the sub and sup elements to be used in the name of the LaTeX document preparation system. In general, authors should use these elements only if the absence of those elements would change the meaning of the content."

This does not seem to be the case here, and styling is more flexible with spans. Since we will require some CSS adjustments anyway, we might as well clean this one up too.

Closing as WONTFIX for that reason.

kaldari subscribed.

I have to disagree with @GWicke's comment above. Using <sup> for reference numbers is semantically correct. Reference numbers are considered a standard type of superscript notation (like asterisks and daggers). Even if we didn't present them as such, they should still be defined as superscript content. In fact, I can't really think of any use of <sup> that is more semantically correct than using it for footnote numbers (which is probably why this use is given as an example of proper use of the <sup> tag at places like w3schools.com).[1]

  1. http://www.w3schools.com/tags/tag_sup.asp

matmarex set Security to None.

"Even if we didn't present them as such, they should still be defined as superscript content."

This seems to agree with the notion that it would *not* change the meaning of the content if they were not rendered as superscript.

Either way, I think this is a borderline case with both options being somewhat sub-(or sup?)optimal. I wish something like <cite> had semantics closer to what we are looking for.

I think <sup> is reasonable; you can always use CSS to remove the superscript-ness, but using <sup> means that naïve content consumers (i.e., those with the default HTML stylesheet) will display things in a reasonable manner.

@Esanders, @santhosh, @Catrope, @mattflaschen any concerns / thoughts about making this change from your perspectives? Whenever we do this, it will be an HTML version bump on the Parsoid end.

Superscript for references appears to be a style guide recommendation that varies from place to place, rather than an agreed notation, like CO<sub>2</sub>, or E=mc<sup>2</sup> - there doesn't even seem to be agreement on square vs round brackets.

"@Esanders, @santhosh, @Catrope, @mattflaschen any concerns / thoughts about making this change from your perspectives? Whenever we do this, it will be an HTML version bump on the Parsoid end."

If VE supports both the old and the new format, it will be okay. I don't think we do anything with Parsoid versions yet.

@Jdforrester-WMF Tackling T114256 makes me think maybe I should do this as well. Will this change break VE? Anything to coordinate?

Hmm. Right now Cite's VE code will match anything with mw:Extension/ref (https://github.com/wikimedia/mediawiki-extensions-Cite/blob/master/modules/ve-cite/ve.dm.MWReferenceNode.js#L45) but will emit a <span> (https://github.com/wikimedia/mediawiki-extensions-Cite/blob/master/modules/ve-cite/ve.dm.MWReferenceNode.js#L107). If Parsoid will cope with spans and convert them to sups, it should work fine, and then we can convert Cite over to emit only sups once production Parsoid is fully using the new node.
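
The compatibility argument can be sketched roughly as follows: the consumer matches on the RDFa `typeof` value rather than on the tag name, so either wrapper is accepted, while the emitted tag is a separate setting. This is a simplified illustration on a plain object model; `SimpleElement`, `isReference`, and `emitReference` are hypothetical names, not the Cite or Parsoid API.

```typescript
// Hypothetical, simplified element model (not a real DOM node).
interface SimpleElement {
  tag: string;
  attrs: Record<string, string>;
}

const REF_TYPE = "mw:Extension/ref";

// Match on the RDFa type only; typeof may hold several space-separated values.
function isReference(el: SimpleElement): boolean {
  const types = (el.attrs["typeof"] || "").split(/\s+/);
  return types.indexOf(REF_TYPE) !== -1;
}

// Emit a wrapper; the tag is a parameter so the default can be switched later.
function emitReference(id: string, tag: "span" | "sup" = "sup"): SimpleElement {
  return { tag: tag, attrs: { "id": id, "typeof": REF_TYPE } };
}

// Both the old <span> wrapper and the new <sup> wrapper are recognised.
const oldStyle: SimpleElement = { tag: "span", attrs: { "typeof": REF_TYPE } };
const newStyle = emitReference("cite_ref-Forty_47-0");
console.log(isReference(oldStyle), isReference(newStyle), newStyle.tag);
```

Because matching never inspects `tag`, the producer and the consumer can be switched over independently, which is the property the migration relies on.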

Change 398616 had a related patch set uploaded (by Sbailey; owner: Sbailey):
[mediawiki/services/parsoid@master] T45094 span to sup change

https://gerrit.wikimedia.org/r/398616

Change 401371 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Support references wrapped in <sup> instead of <span>

https://gerrit.wikimedia.org/r/401371

Change 401501 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/Cite@master] Allow mw-ref rules apply to any element

https://gerrit.wikimedia.org/r/401501

It looks like Parsoid already ingests any tag with the correct RDFa attribute, so we can switch over VE default output immediately.

Change 401503 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/Cite@master] MWReferenceNode: Generate 'sup' tags by default

https://gerrit.wikimedia.org/r/401503

Change 401501 merged by jenkins-bot:
[mediawiki/extensions/Cite@master] Allow mw-ref rules apply to any element

https://gerrit.wikimedia.org/r/401501

Change 401371 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Support references wrapped in <sup> instead of <span>

https://gerrit.wikimedia.org/r/401371

Change 401503 merged by jenkins-bot:
[mediawiki/extensions/Cite@master] MWReferenceNode: Generate 'sup' tags by default

https://gerrit.wikimedia.org/r/401503

Change 398616 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Fix for T45094 replaces <span> with <sup> for references

https://gerrit.wikimedia.org/r/398616