
Display math generates div inside of paragraph (HTML5 violation)
Open, Stalled, Needs Triage, Public

Description

According to HTML 5, the only content permitted inside a p element is phrasing content. However, <math display="block"> generates a div inside a p element, and a div is flow content, not phrasing content.

This mode should remove the p that occurs prior to the div, as well as the closing tag.
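To make the content-model rule concrete, here is a minimal Python sketch (standard library only) that scans source markup and reports block-level elements opened while a <p> is still open. The NON_PHRASING set is a hand-picked, non-exhaustive subset of HTML5 flow-only elements, and the checker does not simulate browser error recovery; it is an illustration of the rule, not a validator.

```python
from html.parser import HTMLParser

# Non-exhaustive subset of HTML5 elements that are flow content but not
# phrasing content. (<math> is deliberately absent: the spec lists it as both.)
NON_PHRASING = {
    "address", "article", "aside", "blockquote", "div", "dl", "figure",
    "footer", "h1", "h2", "h3", "h4", "h5", "h6", "header", "hr",
    "ol", "p", "pre", "section", "table", "ul",
}


class ParagraphContentChecker(HTMLParser):
    """Flag non-phrasing start tags that appear while a <p> is still open
    in the *source* markup (no browser error recovery is simulated)."""

    def __init__(self):
        super().__init__()
        self.open_p = 0        # number of <p> tags opened but not yet closed
        self.violations = []   # (tag, (line, column)) for each offending tag

    def handle_starttag(self, tag, attrs):
        if self.open_p and tag in NON_PHRASING:
            self.violations.append((tag, self.getpos()))
        if tag == "p":
            self.open_p += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.open_p:
            self.open_p -= 1


def check(markup):
    """Return the names of offending tags, in document order."""
    checker = ParagraphContentChecker()
    checker.feed(markup)
    checker.close()
    return [tag for tag, _pos in checker.violations]
```

For example, check('<p>Here is a formula,\n<div class="mwe-math-element">...</div>\nand text.</p>') returns ["div"], while the same sentence with a <span> wrapper instead of the div returns [].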

Event Timeline

Izno updated the task description.
Aklapper renamed this task from "Display math generates div inside of paragraph" to "Display math generates div inside of paragraph (HTML5 violation)". Dec 5 2017, 11:32 AM

Ha, you beat me to this.

Just adding that Tidy catches the attempt to wrap a p around the div, and changes it to a mess of unnecessary tags: <p></p><div> math </div><p></p>.

> This mode should remove the p that occurs prior to the div, as well as the closing tag.

I think it would be better for the Math Extension not to create a div in the first place. There's no technical reason for using a div instead of a span here.

This comment was removed by Izno.

Div meets the intent of block math better. It might even be reasonable to mark it up as a paragraph in HTML 5.

> Div meets the intent of block math better. It might even be reasonable to mark it up as a paragraph in HTML 5.

I emphatically disagree.

While display equations may be block content, that is not always the case. A simple example is display equations that end with a text comma because they are part of a larger sentence structure.

For another example, the HTML5 spec considers MathML both phrasing and flow content, independently of whether it is inline or block.

> Div meets the intent of block math better. It might even be reasonable to mark it up as a paragraph in HTML 5.

> While display equations may be block content, that is not always the case.

That's true. But if you mark up your <math> wikitext with <math display=block>, you as the author of a wikitext document are *way* obviously meaning for the math to display as a block. Are you familiar with that functionality? That's what this task is about. Arguing without that context, or seemingly without it, misses the point of the task.

> A simple example is display equations that end with a text comma because they are part of a larger sentence structure.

See above.

> For another example, the HTML5 spec considers MathML both phrasing and flow content, independently of whether it is inline or block.

"Phrasing" and "flow" are HTML 5's keywords for inline and block (with slightly different semantics), so the fact that it considers MathML both is mostly an artifact indicating that it can appear either with or without a certain kind of parent node, and not much else.

> I emphatically disagree.

You can emphatically disagree, but nothing you have said so far indicates that these shouldn't be wrapped in a div or p. (You could unwrap it entirely and just emit the HTML <img> / <math> without a wrapping div, but those elements would then need a class indicating that the element is a block element, so that it can be styled appropriately.)

@Izno I do not know what the best way of implementing it in the extension is, but from an author's point of view both are treated equally as part of a sentence, and there is no difference between inline and block formulas except for the size of the equation.

In principle you would want to write everything as inline formulas. However, line-breaking formulas is not as easy as line-breaking normal text, so you want to avoid line breaks inside them. That is why you put a manual line break in front of a large formula, which removes the need for a line break within the formula; in other words, it is why you use a block formula instead of an inline one. Now the artificial line break might look like the start of a new paragraph to the reader, so you want the block formula indented or centered to signal that you are not starting a new paragraph. (For example, the default LaTeX style marks a new paragraph with indentation rather than vertical space, and centers block formulas.)

Most browsers add extra vertical space at the start and end of a paragraph <p>, which is not appropriate before or after a block formula, because the whole point of using a block formula is to show readers that you are not starting a new paragraph.

Addendum: another reason for a block formula is, of course, a formula that requires too much vertical space to fit into a normally spaced line.

> But if you mark up your <math> wikitext with <math display=block>, you as the author of a wikitext document are *way* obviously meaning for the math to display as a block. Are you familiar with that functionality?

Yes, I'm aware of it.

display=block has a very specific meaning in equation layout that does not align well with CSS's display: block. In particular, this attribute is primarily meant to inform the layout of the children. For more detail, see my answer at https://stackoverflow.com/questions/41512604/is-there-a-difference-between-display-inline-and-style-display-inline-on/41676041#41676041.

Anyway, all I'm saying is that the use of a div here should be considered a bug in the Math Extension, if only for the pragmatic reason that the actual rendering engine (MathJax, which I worked on for several years) expects a span as a wrapper.

Of course, other workarounds are possible, but I'm still planning to propose changing the div to a span in the Math Extension. Happy to rehash this discussion in more detail then.

I agree, semantically these are part of inline text and it's the semantics, not the appearance, that should drive our choice of div vs span. So there should be no div and no p, if it's possible (and I'm sure it is) to get the correct appearance with a span instead.

<span> with display:block is always an option of course. ;)

Physikerwelt changed the task status from Open to Stalled. Feb 9 2020, 1:34 PM
Physikerwelt removed a project: good first task.
Physikerwelt subscribed.

I am not sure what to do here. But I think it is not "a good first bug" until there are more specific instructions on how to transition from the current to the desired behavior.

I went looking for another extension that has similar requirements on its display. SyntaxHighlight is such an extension. Take a look at lines 404-430 of SyntaxHighlight.php.

@Izno thank you for the hint. I do not feel competent enough to judge whether that is a good approach that should be copied. However, if you think it is, I would prefer to move the functionality to core and have both extensions use the same functionality.

I made a demonstration here, https://en.wikipedia.org/wiki/User:Jacobolus/math_block_example

In my opinion, preserving the semantic grouping of the "paragraph" from the mathematical-text perspective is less important than generating markup which is (a) spec compliant and (b) useful for CSS stylesheets. Human readers are not carefully inspecting the markup and will not be affected by whether the <p> tag corresponds precisely to a mathematical paragraph or not. If one sentence flows across two <p> tags there is no practical problem.

It is a quirk of mathematical writing (shared by e.g. bug report comments including block markup listings, but not by most kinds of documents) that sentences can wrap around block-level formulas. But even so, block math formulas function at a visual level similarly to other block-level elements such as block lists, tables, text-interrupting figures, block quotations, block code listings, and so on. These are nearly always (including in properly typeset mathematics books, etc.) styled entirely separately from running text, with different padding, margins, indentation, width limits, etc. It doesn’t make sense to shoehorn them into the same HTML <p> element which is entirely designed around the display of ordinary prose.

In my opinion, the divs are fine as wrappers for block math, but they should not be wrapped in paragraph tags. I would recommend turning both:

(1)

Here is a formula,
<math display=block>x^2 + y^2 = 1,</math>
and here is text afterward.

and (2)

Here is a formula,

<math display=block>x^2 + y^2 = 1,</math>

and here is text afterward.

into the same generated markup, something like:

<p>Here is a formula,</p>
<div class="mwe-math-element"> ...</div>
<p>and here is text afterward.</p>

The current behavior generated from the unspaced version (1):

<p>Here is a formula,
<div class="mwe-math-element"> ...</div>
and here is text afterward.</p>

is a huge problem, because browsers instead treat it as:

<p>Here is a formula,</p>           <!-- implicitly closed -->
<div class="mwe-math-element"> ...</div>
and here is text afterward.         <!-- bare text node -->
<p></p>                             <!-- implicitly opened -->

This is broken (non-spec-compliant) markup. But more importantly, any CSS which is intended to style top-level paragraphs cannot target the trailing portion of the text, because it is put into a bare text node not part of any top-level element.
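The recovery described above follows two HTML5 tree-construction rules: a start tag such as <div> implicitly closes an open <p>, and a stray </p> with no open p causes an empty <p> to be inserted and then closed. The toy Python model below applies only those two rules to top-level block elements. It assumes block elements are never genuinely nested and it ignores inline tags, so it is a sketch for illustrating this one case; real documents need a real HTML5 parser (a browser, or a library such as html5lib).

```python
from html.parser import HTMLParser

# Start tags that, under the HTML5 "in body" rules, close an open <p> first.
# This is a subset of the spec's list, chosen to avoid void elements.
CLOSES_P = {
    "address", "article", "aside", "blockquote", "div", "dl", "figure",
    "footer", "header", "ol", "p", "pre", "section", "ul",
}


class TopLevelRecovery(HTMLParser):
    """Toy model of browser error recovery for block elements inside <p>.

    Assumptions: block elements are never genuinely nested, and inline tags
    are ignored (their text is kept). Not a substitute for a real parser.
    """

    def __init__(self):
        super().__init__()
        self.nodes = []   # [kind, text] pairs; kind is a tag name or "#text"
        self.stack = []   # currently open block elements

    def _open(self, tag):
        self.nodes.append([tag, ""])
        self.stack.append(tag)

    def handle_starttag(self, tag, attrs):
        if tag not in CLOSES_P:
            return
        if "p" in self.stack:           # implicit </p> before a block start tag
            while self.stack.pop() != "p":
                pass
        self._open(tag)

    def handle_endtag(self, tag):
        if tag not in CLOSES_P:
            return
        if tag == "p" and "p" not in self.stack:
            self._open("p")             # stray </p>: insert an empty <p> first
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.stack:
            self.nodes[-1][1] += data   # text goes into the open block element
        elif data.strip():
            self.nodes.append(["#text", data])  # bare top-level text node
```

Feeding it the unspaced version (1) yields the node kinds ['p', 'div', '#text', 'p']: the trailing sentence ends up as a bare text node and the final paragraph is empty, which is exactly why paragraph-targeted CSS cannot reach that text.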

I hope this can be fixed ASAP, because the only current alternatives are to (a) manually reformat every bit of block math to version (2) with explicit blank lines, a workaround which at least displays correctly even if its generated markup is also incorrect, or (b) revert to using the bare <math> version and indenting using raw HTML, other templates, or :s (i.e. improperly abusing definition lists, still the most common method on Wikipedia).

Change 904610 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[mediawiki/extensions/Math@master] Remove images from native MathML

https://gerrit.wikimedia.org/r/904610

Change 904610 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Remove images from native MathML

https://gerrit.wikimedia.org/r/904610

In a related example, if you explicitly add the following (invalid as written) raw HTML to a wiki page:

<p>Here is a list, <ul><li>one</li><li>two</li></ul> and some trailing text.</p>

Mediawiki turns this into:

<p>Here is a list, </p>
<ul><li>one</li><li>two</li></ul>
<p> and some trailing text.</p>

with two separate paragraphs and the block element as a separate top-level element in between. Browsers then have no trouble rendering this (no longer invalid) markup.

My proposal for a bug fix for block math in a paragraph would be for Mediawiki to do exactly the same thing: emit 2 paragraphs with a block-math div in between.

That is, I recommend turning:

<p>Here is an equation, <math display=block>...</math> and some trailing text.</p>

Into:

<p>Here is an equation, </p>
<div> ... rendered math SVG ... </div>
<p> and some trailing text.</p>
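A minimal sketch of the proposed split, assuming it runs as a post-processing pass over the generated HTML rather than inside the Math extension. The regexes are simplifying assumptions: the wrapper is taken to be <div class="mwe-math-element"> (the class used in the markup quoted above) containing no nested <div>, and paragraphs are plain <p> tags without attributes. Whitespace at the split edges is trimmed and empty paragraphs are dropped; a production version would work on a parsed DOM instead of regular expressions.

```python
import re

# Assumed shape of the block-math wrapper: the class name comes from the markup
# quoted in this task; "no nested <div>" is a simplifying assumption.
BLOCK_MATH = re.compile(r'(<div class="mwe-math-element">.*?</div>)', re.DOTALL)
# Plain, attribute-less paragraphs only (another simplifying assumption).
PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def _split(match):
    """<p>before<div>math</div>after</p> -> <p>before</p><div>math</div><p>after</p>"""
    # With a capturing group, re.split keeps the math divs at odd indices.
    pieces = BLOCK_MATH.split(match.group(1))
    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            out.append(piece)                      # block math becomes top-level
        elif piece.strip():
            out.append(f"<p>{piece.strip()}</p>")  # re-wrap surrounding prose
    return "\n".join(out)


def unwrap_block_math(html):
    """Hypothetical post-processing pass: split every <p> around block math."""
    return PARAGRAPH.sub(_split, html)
```

Applied to '<p>Here is an equation, <div class="mwe-math-element">...</div> and some trailing text.</p>', it produces two paragraphs with the math div between them, matching the markup proposed above.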

This cannot be done from the Math extension, as it does not know the context of the math element. So this cannot be implemented in the near future.

As you can see in the linked patch, in the new mode no elements besides the HTML <math> element will be created.

This bug makes <math display=block>...</math> fundamentally broken, and encourages authors to prefer using :<math>...</math> for indentation even though that generates a definition list which causes problems for some screen readers, or alternately to jump through hoops to instead adopt a template workaround.

If this bug cannot be fixed it is a serious problem for every technical article on Wikipedia.

Wikipedia authors by and large don't care at all about (or want) MathML output per se. The SVG rendered by MathJax or KaTeX is an effective solution used widely around the web. But that SVG should not be wrapped in invalid HTML markup.

@Jacobolus Don't get me wrong. I, and the whole community group math, understand the problem. Formulating consensual goals for math in Wikipedia is precisely the purpose of that group.

However, here

> That is, I recommend turning:
>
> <p>Here is an equation, <math display=block>...</math> and some trailing text.</p>
>
> Into:
>
> <p>Here is an equation, </p>
> <div> ... rendered math SVG ... </div>
> <p> and some trailing text.</p>

you make an actionable proposal. My comment explains that this cannot be implemented within the scope of the Math extension.

> Wikipedia authors by and large don't care at all about (or want) MathML output per se. The SVG rendered by MathJax or KaTeX is an effective solution used widely around the web. But that SVG should not be wrapped in invalid HTML markup.

My impression is that most people don't care as long as it looks good. We claim that the new rendering mode is faster, easier to maintain, looks better, and is much more accessible compared to the previous rendering mode. We will support old browsers with client-side solutions such as MathJax. You are most welcome to challenge that claim, as each challenge helps to improve further.

Is there a place where we can test a wide variety of formulas with the new rendering mode?

Incidentally, an apparent minor side effect of this bug is that when a block formula is formatted in wikimarkup as part of a larger paragraph (which is then broken into pieces in html by this bug), the "readable prose size" script used by Wikipedia's "Did you know" nomination process fails to count the parts of the paragraph after the formula. I think that switching to a span with display:block, as suggested above, would likely fix this.

> Is there a place where we can test a wide variety of formulas with the new rendering mode?

@Stegmujo what do you think of a public demo?

> @Stegmujo what do you think of a public demo?

In the meantime, you can test this on any WMF wiki: go to https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-rendering and select the MathML rendering mode.

By the way, the MathJax option is on hold due to a security issue: https://phabricator.wikimedia.org/T354136