<references/> list item must not wrap the text in <span>
Open, Low, Public

Description

Currently, the markup for a reference item in the reference list is:

<li id="cite_note-1">
  <span class="mw-cite-backlink"><a href="#cite_ref-1">↑</a></span>
  <span class="reference-text">Lorem ipsum.</span>
</li>

This is wrong, because it disallows block elements in references. (In fact, if one puts block elements there, Tidy converts them in a very unpredictable way.)

The easiest solution is to remove the <span class="reference-text"> wrapper.


Whiteboard: usability

Event Timeline

Change 276183 had a related patch set uploaded (by Aashaka):
Use div element instead of span around reference text

https://gerrit.wikimedia.org/r/276183

Jdforrester-WMF subscribed.

Not allowing block elements in references is absolutely intentional; I'm Declining this.

Tagging major functional changes like this as "Easy" is pretty disrespectful to people, given that it's a huge change and reversing a very long-standing intentional choice.

Sorry that I didn't see this until now, especially to @Aashaka.

Change 276183 abandoned by Jforrester:
Use div element instead of span around reference text

Reason:
Sorry you spent time on this; the task shouldn't have been tagged as "Easy", and I've now Declined it.

https://gerrit.wikimedia.org/r/276183

The problem is that cite does allow block elements in references, and then breaks the output by wrapping them in a span.

I said in the task that was merged into here, that cite should either support block elements or not, not just break.

jayvdb subscribed.

Not allowing block elements in references is absolutely intentional; I'm Declining this.

What the ... ?
Block elements have been allowed in Cite's ref tag, and it is quite common for this to be used in 'Notes' on Wikipedia, and of course on Wikisource we do not have a say in when a published book chooses to use a blocky reference.

Here is an example of where a real academic work dares to use block elements in references:
https://en.wikisource.org/wiki/The_Descent_of_Man_(Darwin)/Chapter_II#cite_note-13
https://en.wikisource.org/wiki/Page%3ADescent_of_Man_1875.djvu/45

In that example, the wikitext is quite sophisticated, and the block elements are moved outside the <span class="reference-text">...</span>.
There are many simpler examples available on Wikisource if we want to see how the html renders.

Tagging major functional changes like this as "Easy" is pretty disrespectful to people, given that it's a huge change and reversing a very long-standing intentional choice.

Well, the code does look quite easy...

For an example on en.wp, see Barack Obama#Notes 11, 13, 14, etc. This use is increasingly common.

Change 276183 restored by Legoktm:
Use div element instead of span around reference text

Reason:
Per Danny B's request

https://gerrit.wikimedia.org/r/276183

Per discussion above I've requested restoring of the patch for the time being.

I think that the current user-level behavior will block moving to a strict HTML 5 parsing model, though there may be a more appropriate parent task. I don't much care how this task gets fixed (whether by some other work around or by allowing the suggested patch to move forward), but I expect that Bad Things Will Happen if the user-level behavior of including block tags inside of <ref> is disturbed.

What blocks the current patch besides "it was deliberately designed that way"? I seriously doubt that statement--the majority of changes I've seen made to Cite indicate it has not been a carefully designed extension.

This does need to be fixed; people *all the time* use block elements in reference citations, most commonly paragraph breaks and short lists, though I've also encountered reasonable uses of small tables, block quotations, divs that do block indentation of material that is not a direct quotation, and so on. This is not a mistake or bad user behavior, it is editors doing what they need to do in references and non-reference footnotes (which also pass through the <ref> system, with templates (at en.Wikipedia) like {{efn}}, etc.).

The two most obvious solutions to me are a) move the span class(es) to the surrounding li, since the span(s) only seem to exist as carriers of the class(es); or b) convert the span(s) to div(s) styled to not induce extraneous linebreaks, but only take this route if there's an important reason to retain these containers that are presently spans.

Once Tidy is replaced with RemexHtml (T89331: Replace HTML4 Tidy in MW parser with an equivalent HTML5 based tool), I expect this will be fixed. Let us revisit after that happens.

Once Tidy is replaced with RemexHtml (T89331: Replace HTML4 Tidy in MW parser with an equivalent HTML5 based tool), I expect this will be fixed. Let us revisit after that happens.

I would recommend against closing it even then, regardless, until each list item supports an internal block structure, given that we have ample examples of users using blocks inside the list items. (That might be removing the problematic span, or that might be changing the span to a div.)

Once Tidy is replaced with RemexHtml (T89331: Replace HTML4 Tidy in MW parser with an equivalent HTML5 based tool), I expect this will be fixed. Let us revisit after that happens.

The output of ref 11 in my Barack example is:

<span class="mw-cite-backlink">...</span> 
<span class="reference-text">Obama (1995, 2004), pp. 9–10.
    <ul>
        <li>Scott (2011), pp. 80–86.</li>
        <li>Jacobs (2011), pp. 115–118.</li>
        <li>Maraniss (2012), pp. 154–160.</li>
    </ul>
</span>

<span> has "phrasing content" as its content model, whereas <ul> is "flow content" that is not phrasing content. This means that this is not conformant HTML, but at least the entire list is now wrapped.

I've added some other testing on my user page; in general, I don't think any of them are the expected behavior, which is that the list element be presented next to the backlink, but I guess that's probably because the sublist is a block itself and isn't floating next to the backlink span (or the backlink span floating left).

Keep the existing functionality and have <footnote> <footnote follow=> </footnote> instead to handle the block level cases?

I'm also wondering if <footnotes> collation should use CSS counters and the ::before and ::after selectors in a style sheet on a .reference class block instead of the current formatted list arrangement. However, I would have concerns about that approach given that ::before selectors can be used to insert unwanted sequences.

In respect of the discussion at English Wikisource this is also slightly related to: T226395

Another example of where wrapping references in a span is not a good thing-
https://en.wikisource.org/wiki/Robert_the_Bruce_and_the_struggle_for_Scottish_independence/The_Making_of_Scotland

This didn't even generate a warning from linthint when the underlying pages were examined more closely (this suggests a new linter is needed that is aware of what goes inside REF tags.)

From the parsing perspective, I have no problem with making the wrapper a <div> and using CSS to style this as inline (or whatever the desired appearance is). Whatever CSS styles you use is out of my hands.

The alternatives (solely from the parsing perspective) are to either (a) "magically" figure out whether the contents are inline or block and use the appropriate wrapper (this seems to be the expected behavior of some language converter constructs), or (b) declare the contents as one type or the other (say, inline) and enforce this by stripping out bad stuff (say, block constructs). A variant of (b) declares the contents as block, which requires less stripping, but presumably more styling. I don't know whether CSS can handle the "sometimes inline, sometimes block" nature of the contents any better than the HTML parser can.
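
As a rough illustration of alternative (a), here is a minimal sketch that picks the wrapper element by looking at the reference content. It is not how Cite or the parser work. The function names and the list of block-level tags are assumptions made up for this example, and the tag list is a small hand-picked subset rather than the full HTML definition.

```javascript
// Hand-picked subset of elements that are flow content but not phrasing
// content. This list is an illustrative assumption, not the full HTML spec.
const BLOCK_TAGS = new Set([
  "blockquote", "div", "dl", "ol", "p", "pre", "table", "ul",
]);

// Return the distinct block-level tag names opened anywhere in `html`.
function blockTagsIn(html) {
  const found = new Set();
  // Matches opening tags only, e.g. "<ul>" or "<div class='x'>".
  const openTag = /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  for (const match of html.matchAll(openTag)) {
    const name = match[1].toLowerCase();
    if (BLOCK_TAGS.has(name)) found.add(name);
  }
  return [...found];
}

// Option (a): choose the wrapper element from the content itself.
function chooseWrapperTag(html) {
  return blockTagsIn(html).length > 0 ? "div" : "span";
}

// Hypothetical helper: wrap the reference text in the chosen element.
function wrapReferenceText(html) {
  const tag = chooseWrapperTag(html);
  return `<${tag} class="reference-text">${html}</${tag}>`;
}
```

A regex scan like this only sees literal tags in the string, so it is a sketch of the idea and not a replacement for a real HTML tree walk.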

That said, I'm not weighing in on whether "no block content in refs" is actually a Deliberate Feature or an Accidental Bug. That's for others to decide.

So... why are we not simply replacing the span with a div? We've shown several use cases where span causes problems, and I can't even think of any where div causes problems

So... why are we not simply replacing the span with a div? We've shown several use cases where span causes problems, and I can't even think of any where div causes problems

The problem is that the patch that does that is blocked by a big fat veto (-2) in Gerrit referencing the reason «Not allowing block elements in references is absolutely intentional» above. I have no idea how to resolve that since the mentioned decision is not documented anywhere that I can find, and thus neither are the concerns playing into that decision. Consequently it is impossible to attempt to find a solution that would satisfy those concerns. I can think of no fundamental reason why a <div> would not work here, just some "minor-in-the-grand-scheme" pitfalls just when it rolls out.

@thiemowmde You seem to be the one making most changes to Extension:Cite lately. Any ideas on how we can progress this? It's a relatively rare problem on the Wikipedias (in fraction of 6M articles, most things become relatively rare), but for the Wikisources it's a right royal pain.

If, in respect of Wikipedia, the non-provision is intended, then I can see the reasoning. In that case, alternative functionality should be provided, as not all wikis are Wikipedia.

Changing :

<ref>{{block}}</ref>

to

<ref tag=div> {{block}} </ref>

would be straightforward to do on Wikisource; changing the entire formatting of already validated works isn't.

Of course, if pseudo-elements could be styled with CSS, then it would be EVEN more powerful.

Even if my WMDE-TechWish is currently working on the Cite extension as part of the Cite-Extends project, we are in no way in a position to touch anything that can potentially break millions of existing pages.

I feel the situation was a lot different back in 2013. It's 2020. We are not talking about XHTML any more. We use HTML 5, a "living standard". At this point, is this an academic discussion about the semantics of HTML, or does this have real-life consequences? If so, in which browsers? Is there a specific example we can look at? (Ideally with a revision ID and screenshot.)

I feel the situation was a lot different back in 2013. It's 2020. We are not talking about XHTML any more. We use HTML 5, a "living standard". At this point, is this an academic discussion about the semantics of HTML, or does this have real-life consequences? If so, in which browsers? Is there a specific example we can look at? (Ideally with a revision ID and screenshot.)

If it has no real-life consequences, then the change should not be blocked. :)

Either we respect the 'living standard' or we don't; however, these semantics did not change before the move to the 'living standard', during it, or since. (See also T49544#5125976 and related discussion.)

You shouldn't be responsible for unblocking this task nor working on it. Mr. Scott for parsing has already said that this change is basically fine; it seems to be one holdout who has not actually explained why ever since he made the original decline. That person should do his job explaining instead of putting up the stonewall he did.

Not breaking existing handling, which would affect many many wikis (not just English Wikisource) is something I can understand the technical reasoning for.

I noted the DIV embedded in a span issue here - https://en.wikisource.org/wiki/King_Solomon%27s_Mines/Chapter_XVI but it doesn't get reported when examining the individual pages, which are transcluded. This led to a suggestion on English Wikisource that it was the Linter extension that was in error.

Would it be possible to fork the 'modified' handling code into a different extension, (so that it could be implemented as a different tag, something I suggested earlier?) Diverging what is essentially footnote support (vs citation support) would also potentially allow the ongoing Cite extension code to be simplified (removal of follow= logic for example).

(I have my suspicions that the Poem extension code may also be interacting in an unexpected way in respect of the example above... Replacing the Poem tag approach with something that's more compliant with current HTML/CSS practice (I seem to recall there are some CSS3 feature regarding line-feed handling) is a different ticket entirely..)

Oh boy. This is not how you are going to win this Phabricator game. Changing the <span> to a <div> will break stuff. Some we know about, and a lot more we don't know about. On the other hand, demanding a change and at the same time refusing to explain its real-life benefits is doing nothing but wasting everybody's time. Closing this ticket with no action is the only reasonable consequence then.

Insulting @Jdforrester-WMF for not doing his job when what he did is exactly his job is not going to help either.

I feel the situation was a lot different back in 2013. It's 2020. We are not talking about XHTML any more. We use HTML 5, a "living standard". At this point, is this an academic discussion about the semantics of HTML, or does this have real-life consequences? If so, in which browsers? Is there a specific example we can look at? (Ideally with a revision ID and screenshot.)

@thiemowmde There are examples provided in previous comments on this task. HTML5 does not permit block level elements (Flow content) inside inline elements (Phrasing content; it just uses different terminology for the same concept), and consequently the linter complains about them and the parser attempts to correct them (by inserting closing tags and inferring paragraphs and whatever else it's doing; I'm not up to date on that).

This did indeed improve after switching to remex (I'm not aware of any outright breakage now, unlike the Tidy days), but we're still living at the mercy of whatever the parser/linter/adjacent parts of the stack decides to do: we're dependent on error correction. So… partly academic, but in a way that experience shows us tends to have real-world undesirable consequences?

Incidentally, what are the things known to break if this switched to a div? In the 7 years this task has been open, this is the first time that's been mentioned.

I was very specifically asking for an example of bad rendering in a browser.

There are a number of tools, gadgets, user scripts, and whatnot that work with references. If any of these contains a selector that requires the element name span, that tool is going to break without notice.

I was very specifically asking for an example of bad rendering in a browser.

@Danny_B @SMcCandlish @Izno @beleg_tal @ShakespeareFan00 I have not banged my head against this particular problem since the Tidy days (and Tidy behaved very differently than RemexHtml in this regard). Do you have any current example of a modern mainstream web browser that visibly misrenders a ref due to this? Or causes problems for screen readers or other accessibility tools? My own test cases (culled from what I recall to have been problematic historically) do not indicate any actual breakage in current MediaWiki. Is this still a current problem or has it faded into more of an "academic" one?

(NB! that the linter emits warnings for this is not likely to be in itself sufficient reason to make a change of this magnitude)

There are a number of tools, gadgets, user scripts, and whatnot that work with references. If any of these contains a selector that requires the element name span, that tool is going to break without notice.

@thiemowmde Overly specific selectors, particularly ones that include the element name, are bad practice in any case, so if there are any such (which would surprise me), their breaking is to be expected for all sorts of reasons. However, the current output is (by both HTML4 and HTML5 rules) invalid, and the net result is that .reference-text gets a computed geometry of 0px x 0px (if I recall the error handling algorithm in HTML5 correctly, what happens is that since the div is not permitted inside the span it gets fostered to the nearest parent that permits it, which makes the span empty, so it collapses to a point). You can check this on any article (such as w:en:Barack Obama that was mentioned in a previous comment) with $(".reference-text").css("background", "red") (or otherwise inspect the computed geometry; jQuery's .height() and .width()). Empty inline elements behave really unpredictably with regard to size and position (the "living standard" fails to specify these computations in detail for empty inline elements; implementations are therefore just as variable as you might expect).
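
For anyone who wants to repeat that geometry check without eyeballing red backgrounds, a minimal sketch using plain DOM calls (instead of the jQuery one-liner above) could look like this. The function name is made up for this example, and whether any element actually reports a zero-sized box will depend on the page and the browser.

```javascript
// Return the .reference-text elements under `root` whose rendered box has
// zero width or zero height. `root` is anything that offers
// querySelectorAll, e.g. `document` in a browser console.
function findCollapsedReferenceTexts(root) {
  const collapsed = [];
  for (const el of root.querySelectorAll(".reference-text")) {
    // getBoundingClientRect reports the size of the element's rendered box.
    const box = el.getBoundingClientRect();
    if (box.width === 0 || box.height === 0) collapsed.push(el);
  }
  return collapsed;
}
```

In a browser console on an article page, `findCollapsedReferenceTexts(document).length` gives the number of affected wrappers, and the returned elements can be inspected one by one.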

First off, neither Parsoid's output nor Parser.php's output is HTML5 compliant. https://www.mediawiki.org/wiki/Parsing/Notes/HTML5_Compliance has some notes I started compiling when we were replacing Tidy with RemexHtml. As that page makes clear, it is non-trivial to generate HTML5-compliant output.

Let us please not make changes like changing <span> to <div> without thinking through the implications. Scott is right that making changes from the parser / cite end is straightforward. But the problem is not with making that change; it is with handling all the downstream implications of that change (rendering, gadgets, bots, etc.). As a point of comparison, one of the steps along the way of making Parsoid and Parser.php output match up is to change Parser.php output for media to use the <figure> tag instead of <div>. From a parser point of view, it is a trivial change and the patchset for it has existed for a long time ( https://gerrit.wikimedia.org/r/#/c/196532/ ), but it had to go through an RFC ( T118517 ) and the change hasn't been merged yet because we have to handle the breakage from making this change. This is something we are going to deal with this year.

I imagine the situation with Cite's span-vs-div wrapper issue is similar and let us be intentional and careful before making such changes. I am not saying we should not do it, but just saying that we should have a coherent plan for dealing with any downstream breakage, and this possibly might also benefit from an RFC because this is going to be a major change.

Thanks everyone for weighing in.

@Izno: Hi, please take a look at our etiquette and keep conversations respectful. "Criticize ideas, not people", basically. Thanks a lot!

There are a number of tools, gadgets, user scripts, and whatnot that work with references. If any of these contains a selector that requires the element name span, that tool is going to break without notice.

I searched English Wikipedia, Meta, and some other wikis for every instance of "span.reference-text". (result1, result2, result3, result4, result5, & result6) [The prize goes to es.wiki with a whopping 0 results.]

Result: I only found a handful of pages that mention the "span.reference-text" selector. Where it is actively used on-wiki, it seems trivially amendable. The only thing my search would not turn up is external tools that reference that CSS selector (but I doubt there are very many, given the small number of results found on-wiki as a representative sample).
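
For gadget or user script authors who want to check their own code, the same kind of search can be sketched as a small function that scans a source string for element-qualified uses of a class. The function name is made up for this example, and it is a plain text scan, not a CSS or JavaScript parser, so treat hits as candidates to review.

```javascript
// Find selectors in `source` that qualify `className` with an element name,
// such as "span.reference-text". A bare ".reference-text" is not reported.
function findElementQualifiedSelectors(source, className) {
  // Escape regex metacharacters in the class name.
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The lookbehind rejects names that are really the tail of another token,
  // e.g. the "cite" in ".mw-cite.reference-text". The lookahead rejects
  // longer class names such as "reference-text-foo".
  const pattern = new RegExp(
    `(?<![\\w.#-])([a-zA-Z][a-zA-Z0-9]*)\\.${escaped}(?![\\w-])`,
    "g"
  );
  return [...source.matchAll(pattern)].map((m) => m[0]);
}
```

Any hit of the form `span.reference-text` could be changed to `.reference-text`, which would keep working whichever wrapper element is used.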

Let us please not make changes like changing <span> to <div> without thinking through the implications.

@ssastry I don't disagree. What I'm saying is that bad markup should be fixed; so what we're actually discussing is how best to do that and with what priority. "The fix will break lots of other stuff" + "The problem currently has very little practical impact" = probably not worth it in isolation. Bigger practical impact or an absence of undesirable side-effects changes the equation.

So far we have the broken geometry and the linter errors cluttering up maintenance logs on the impact side; and a concern about potential for breaking automated tools on the side-effects side.

I am personally not convinced the probability of significant side-effects is all that high: there's a world of difference between the switch from span to div for the text content of footnotes and the introduction of figure, and nobody should be relying on the element name to begin with (it is bad practice precisely because the element will often change for various reasons). However, if the borked geometry and linter errors are all the impacts there are, the equation still looks kinda lopsided.

Not allowing block elements in references is absolutely intentional; I'm Declining this.

Tagging major functional changes like this as "Easy" is pretty disrespectful to people, given that it's a huge change and reversing a very long-standing intentional choice.

Sorry that I didn't see this until now, especially to @Aashaka.

@Jdforrester-WMF

Do you need examples of Wikisource complex references that we are trying to reproduce?

Also half of this issue is that this produces lint errors, even though the components seem to suitably display. So even excluding these types of errors from the special:lint bits would stop some noise.

[ Apparently I didn't post this at some point back in January when I wrote it ]

<ref tag=div> {{block}} </ref>

@thiemowmde For the Wikisources, having some way to actively request a div instead of span wrapper for .reference-text would probably be sufficient (even if not ideal). Would that be a more feasible approach than changing the default markup output ditto?

I came across this enwiki citation, and I wanted to share. I guess it happens there, too.

@Xover: It's not just a DIV vs span ...

For certain things to work as intended you would need "<div>" and then a 'soft' newline at the start, and a 'soft' newline followed by the closing DIV at the end, for reasons to do with how MediaWiki currently wraps P tags around things. (See also T134469.)

A 'soft' newline would be a '\n' on its own, as opposed to a 'hard' newline (a paragraph break or an explicitly given BR).

Another linter/validator where this bug is showing is with epubcheck on epubs generated with WS Export. Ereaders are sometimes much more picky about markup. This error can happen e.g. due to <ref>{{block center}}</ref> here:

ERROR(RSC-005): The_Kiss_and_its_History.epub/OPS/c7_The_Kiss_and_its_History_Chapter_7.xhtml(193,282): Error while parsing file: element "div" not allowed here; expected the element end-tag, text, element "a", "abbr", "area", "audio", "b", "bdi", "bdo", "br", "button", "canvas", "cite", "code", "data", "datalist", "del", "dfn", "em", "embed", "epub:switch", "i", "iframe", "img", "input", "ins", "kbd", "label", "link", "map", "mark", "meta", "meter", "ns1:math", "ns2:svg", "object", "output", "picture", "progress", "q", "ruby", "s", "samp", "script", "select", "small", "span", "strong", "sub", "sup", "template", "textarea", "time", "u", "var", "video" or "wbr" (with xmlns:ns1="http://www.w3.org/1998/Math/MathML" xmlns:ns2="http://www.w3.org/2000/svg") or an element from another namespace

<li xmlns:epub="http://www.idpf.org/2007/ops" id="cite_note-4-5fd1befa0e51c" epub:type="footnote"><span class="mw-cite-backlink"><a href="#cite_ref-4-5fd1befa0ddb1"></a></span> <span class="reference-text"><div style="display:table; position:relative; margin:0 auto; width:auto;">
<p>Madame, join the dancing throng,<br/>
Listen to their measured song;<br/>
But remember, for the rest,<br/>
You shall kiss whom you love best.<span style="display:inline-block; width:2em;"></span>W. F. H.
</p>
</div></span>
</li>
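
If this were worked around on the export side, one possible approach is to rewrite the wrapper span to a div in the generated XHTML. The sketch below is only an assumption about how that could be done and is not taken from WS Export. The function name is made up, it matches the wrapper's opening tag as an exact string, and it converts every wrapper whether or not it contains block content.

```javascript
const OPEN_WRAPPER = '<span class="reference-text">';

// Rewrite every reference-text span wrapper in `xhtml` to a div. Nested
// spans inside the wrapper are left alone by tracking span nesting depth.
function spanWrapperToDiv(xhtml) {
  let out = "";
  let pos = 0;
  while (true) {
    const start = xhtml.indexOf(OPEN_WRAPPER, pos);
    if (start === -1) break;
    // Walk span open/close tags after the wrapper until depth returns to 0.
    const spanTag = /<span\b[^>]*>|<\/span>/g;
    spanTag.lastIndex = start + OPEN_WRAPPER.length;
    let depth = 1;
    let close = -1;
    let m;
    while ((m = spanTag.exec(xhtml)) !== null) {
      if (m[0] === "</span>") depth -= 1;
      else if (!m[0].endsWith("/>")) depth += 1; // ignore self-closing <span/>
      if (depth === 0) {
        close = m.index;
        break;
      }
    }
    if (close === -1) break; // unbalanced input: leave the rest untouched
    out +=
      xhtml.slice(pos, start) +
      '<div class="reference-text">' +
      xhtml.slice(start + OPEN_WRAPPER.length, close) +
      "</div>";
    pos = close + "</span>".length;
  }
  return out + xhtml.slice(pos);
}
```

Note that a div wrapper is a block by default, so without extra styling the backlink would sit on its own line in the exported file.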

Should we fix this by changing it in wsexport, or is a change likely in MediaWiki? I'm happy to work on adding the <ref tag="div"> idea if that's what's wanted.

The biggest issue with doing this is how the interpolation of, e.g. cite_references_link_one works:

<li id=\"$1\"$4><span class=\"mw-cite-backlink\">[[#$2|↑]]</span> $3</li>

This means that it is very difficult to make $3 a div, and also have it share a line with the mw-cite-backlink element robustly (it can work with display:inline; but not if the reference begins with something like <p>).

So you often see the backlinks on their own lines.
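
To make the interpolation concrete, here is a minimal sketch of how the message above could be filled in, with the reference text wrapped before it is passed as $3. The helper names and the `wrapperTag` parameter are assumptions for illustration only. The real substitution happens inside MediaWiki's message system, not in code like this.

```javascript
// The message text quoted above, with its $1..$4 placeholders.
const CITE_REFERENCES_LINK_ONE =
  '<li id="$1"$4><span class="mw-cite-backlink">[[#$2|↑]]</span> $3</li>';

// Replace $1, $2, ... with the matching entry of `params` (1-based).
// Placeholders without a matching parameter are left as they are.
function interpolate(message, params) {
  return message.replace(/\$(\d)/g, (whole, digit) => {
    const value = params[Number(digit) - 1];
    return value === undefined ? whole : value;
  });
}

// Hypothetical helper: wrap the reference text, then pass it as $3.
function renderReferenceItem(noteId, refId, text, wrapperTag = "span") {
  const wrapped =
    `<${wrapperTag} class="reference-text">${text}</${wrapperTag}>`;
  return interpolate(CITE_REFERENCES_LINK_ONE, [noteId, refId, wrapped, ""]);
}
```

Calling `renderReferenceItem("cite_note-1", "cite_ref-1", "<p>Foo</p>", "div")` shows the situation described above: the backlink span and the block wrapper end up as siblings inside the same li, so it is up to CSS whether they share a line.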

Change 703397 had a related patch set uploaded (by Inductiveload; author: Inductiveload):

[mediawiki/extensions/Cite@master] WIP: Allow configuration of block-based references

https://gerrit.wikimedia.org/r/703397

This means that it is very difficult to make $3 a div, and also have it share a line with the mw-cite-backlink element robustly (it can work with display:inline; but not if the reference begins with something like <p>).

So you often see the backlinks on their own lines.

That couldn't be fixed using display:flex; and adding $3 between <div> tags?

That couldn't be fixed using display:flex; and adding $3 between <div> tags?

It's tricky to do that without causing a ragged left margin to the reference-text blocks:

2021-07-06_141443_636x549_screenshot.png (549×636 px, 117 KB)

This means that it is very difficult to make $3 a div, and also have it share a line with the mw-cite-backlink element robustly (it can work with display:inline; but not if the reference begins with something like <p>).

So you often see the backlinks on their own lines.

That's a very small price to pay. Editors are already habituated to doing

Blah blah blah.<p>Foo bar baz.</p>

or

Blah blah blah.{{pb}}Foo bar baz.

or

Blah blah blah.

Foo bar baz.

Since <p> literally means "begin a new paragraph here", people understand that starting with one is going to cause a paragraph line break there. If they see broken-looking ref citations, the cause of that will be pretty obvious and easy to fix.

Also, the number of refs broken by starting with <p> is going to be dwarfed by the number of refs simply containing <p> or some other block-level item, like <blockquote>, all of which are presently broken by being span-wrapped.