Display math generates div inside of paragraph (HTML5 violation)
Closed, Resolved, Public

Description

According to HTML 5, the closing tag of a p element can be omitted if the element is immediately followed by a div. However, <math display="block"> generates a div regardless of whether it is used inside a paragraph. This means the surrounding p element is implicitly closed, which is not desired.

Goal: The math rendering should only generate elements that can be used within a p element.

  • native MathML rendering with PHP: The new native rendering does not use any surrounding elements and just outputs the plain MathML element with the display attribute. (patch)
  • SVG or MathML via mathoid: proposed solution (demo)
  • LaTeXML rendering with LaTeXML (not wmf-deployed): Uses the same elements as SVG even though we stopped generating fallback images for LaTeXML recently. Thus, this does not need to be treated separately.

Event Timeline

Izno updated the task description.
Aklapper renamed this task from Display math generates div inside of paragraph to Display math generates div inside of paragraph (HTML5 violation). Dec 5 2017, 11:32 AM

Ha, you beat me to this.

Just adding that tidy catches the attempt to wrap a p around the div, and changes it to a mess of unnecessary tags: <p></p><div> math </div><p></p>.

This mode should remove the p that occurs prior to the div, as well as the closing tag.

I think it would be better for the Math Extension not to create a div in the first place. There's no technical reason for using a div instead of a span here.

This comment was removed by Izno.

Div meets the intent of block math better. It might even be reasonable to mark it up as a paragraph in HTML 5.

Div meets the intent of block math better. It might even be reasonable to mark it up as a paragraph in HTML 5.

I emphatically disagree.

While display equations may be block content, that is not always the case. A simple example is a display equation that ends with a text comma because it is part of a larger sentence structure.

For another example, the HTML5 spec considers MathML both phrasing and flow content, independently of whether it is inline or block.

Div meets the intent of block math better. It might even be reasonable to mark it up as a paragraph in HTML 5.

While display equations may be block content that is not always the case.

That's true. But if you mark up your <math> wikitext with <math display=block>, you as the author of a wikitext document are *way* obviously meaning for the math to display as a block. Are you familiar with that functionality? That's what this task is about. Arguing without that context or seemingly without that context seems to miss the point of the task.

A simple example is a display equation that ends with a text comma because it is part of a larger sentence structure.

See above.

For another example, the HTML5 spec considers MathML both phrasing and flow content, independently of whether it is inline or block.

"Phrasing" and "flow" are HTML 5's key words for inline or block (slightly different semantics), so that it considers them both is more an artifact indicating that they can appear either with or without a certain kind of parent node and not much else.

I emphatically disagree.

You can emphatically disagree, but you haven't yet said anything that indicates these shouldn't be div/p wrapped. (You could entirely unwrap it and just put in the HTML <img> / <math> without a wrapping div, but those elements would need to take a class which indicates the element is a block element so that it can be styled as appropriate.)

@Izno I do not know what the best way of implementing it in the extension is, but from an author point of view both are treated equally as part of a sentence and there is no difference between inline and block formula except for the size of the equation.

In principle you would want to put everything as inline formula, however line-breaking formulas is not as easy as line-breaking normal text and you want to avoid linebreaks. This is why you put a manual linebreak in front of large formulas and with that avoid the need of a line-break within the formula. This is why you would use a block formula instead of an inline formula. Now the artificial linebreak might look like a new paragraph to the reader. This is why you want the block formula indented or centered, i.e. to tell the reader that you do not want to start a new paragraph. (e.g. LaTeX default style uses no vertical space but indenting to mark a new paragraph and centering for block formulas).

Most browsers create an additional vertical space at the start or end of a paragraph <p>, which is not appropriate before or after a block formula, because the whole point of using a block formula is to show the readers that you do not want to start a new paragraph.

Addendum: Another reason for a block formula is, of course, that the formula requires too much vertical space to fit into a normally spaced line.

But if you mark up your <math> wikitext with <math display=block>, you as the author of a wikitext document are *way* obviously meaning for the math to display as a block. Are you familiar with that functionality?

Yes, I'm aware of it.

display=block has a very specific meaning in equation layout that does not align much with CSS's display=block. In particular, this property is primarily meant to inform the layout of the children. For some more detail, see my answer at https://stackoverflow.com/questions/41512604/is-there-a-difference-between-display-inline-and-style-display-inline-on/41676041#41676041.

Anyway, all I'm saying is that the use of a div here should be considered a bug in the Math Extension, if only for the pragmatic reason that the actual rendering engine (MathJax, which I worked on for several years) expects a span as a wrapper.

Of course, other workarounds are possible, but I'm still planning to propose changing the div to a span in the Math Extension. Happy to rehash this discussion in more detail then.

I agree, semantically these are part of inline text and it's the semantics, not the appearance, that should drive our choice of div vs span. So there should be no div and no p, if it's possible (and I'm sure it is) to get the correct appearance with a span instead.

<span> with display:block is always an option of course. ;)

Physikerwelt changed the task status from Open to Stalled. Feb 9 2020, 1:34 PM
Physikerwelt removed a project: good first task.
Physikerwelt subscribed.

I am not sure what to do here. But I think it is not "a good first bug" until there are more specific instructions on how to transition from the current to the desired behavior.

I went looking for another extension that has similar requirements on its display. SyntaxHighlight is such an extension. Take a look at lines 404-430 of SyntaxHighlight.php.

@Izno thank you for the hint. I do not feel competent enough to judge whether that is a good approach that should be copied. However, if you think it is, I would prefer to move the functionality to core and make both extensions use the same functionality.

I made a demonstration here, https://en.wikipedia.org/wiki/User:Jacobolus/math_block_example

In my opinion, it is less important to preserve the semantic grouping of the "paragraph" from the mathematical text perspective than it is to generate markup which is (a) spec compliant and (b) useful for CSS stylesheets. Human readers are not carefully inspecting the markup and will not be affected by whether the <p> tag corresponds precisely to a mathematical paragraph or not. If one sentence flows across two <p> tags there is no practical problem.

It is a quirk of mathematical writing (shared by e.g. bug report comments including block markup listings, but not by most kinds of documents) that sentences can wrap around block-level formulas. But even so, block math formulas function at a visual level similarly to other block-level elements such as block lists, tables, text-interrupting figures, block quotations, block code listings, and so on. These are nearly always (including in properly typeset mathematics books, etc.) styled entirely separately from running text, with different padding, margins, indentation, width limits, etc. It doesn’t make sense to shoehorn them into the same HTML <p> element which is entirely designed around the display of ordinary prose.

In my opinion, the divs are fine as wrappers for block math, but they should not be wrapped in paragraph tags. I would recommend turning both:

(1)

Here is a formula,
<math display=block>x^2 + y^2 = 1,</math>
and here is text afterward.

and (2)

Here is a formula,

<math display=block>x^2 + y^2 = 1,</math>

and here is text afterward.

into the same generated markup, something like:

<p>Here is a formula,</p>
<div class="mwe-math-element"> ...</div>
<p>and here is text afterward.</p>

The current behavior generated from the unspaced version (1):

<p>Here is a formula,
<div class="mwe-math-element"> ...</div>
and here is text afterward.</p>

is a huge problem, because browsers instead treat it as:

<p>Here is a formula,</p>           <!-- implicitly closed -->
<div class="mwe-math-element"> ...</div>
and here is text afterward.         <!-- bare text node -->
<p></p>                             <!-- implicitly opened -->

This is broken (non-spec-compliant) markup. But more importantly, any CSS which is intended to style top-level paragraphs cannot target the trailing portion of the text, because it is put into a bare text node not part of any top-level element.
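The implicit closing described above can be reproduced with a small simulation. This is only a sketch, not a full HTML tree builder: the class name TopLevelSketch, the helper top_level_nodes, and the short list of paragraph-closing tags are assumptions made for illustration, and the input is assumed to contain only top-level paragraphs and no void elements such as <img>.

```python
from html.parser import HTMLParser

# Assumption: a hand-picked subset of the start tags that implicitly close an open <p>.
P_CLOSING_TAGS = {"p", "div", "ul", "ol", "table", "blockquote", "pre"}


class TopLevelSketch(HTMLParser):
    """Collects top-level nodes as (kind, text) pairs; kind "text" is a bare text node."""

    def __init__(self):
        super().__init__()
        self.nodes = []
        self.stack = []   # names of the currently open elements
        self.buffer = ""  # text seen since the last top-level boundary

    def _take_text(self):
        text = " ".join(self.buffer.split())
        self.buffer = ""
        return text

    def _flush_bare_text(self):
        text = self._take_text()
        if text:
            self.nodes.append(("text", text))

    def handle_starttag(self, tag, attrs):
        if tag in P_CLOSING_TAGS and "p" in self.stack:
            # Implicit </p>: the open (top-level) paragraph ends before the block starts.
            self.stack.clear()
            self.nodes.append(("p", self._take_text()))
        if not self.stack:
            self._flush_bare_text()
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag == "p" and not self.stack:
            # Stray </p> with no open paragraph: an empty paragraph is inserted.
            self._flush_bare_text()
            self.nodes.append(("p", ""))
        elif tag in self.stack:
            while self.stack.pop() != tag:
                pass
            if not self.stack:
                self.nodes.append((tag, self._take_text()))

    def handle_data(self, data):
        self.buffer += data


def top_level_nodes(html):
    parser = TopLevelSketch()
    parser.feed(html)
    parser.close()
    parser._flush_bare_text()
    return parser.nodes


unspaced = ('<p>Here is a formula,\n'
            '<div class="mwe-math-element">math</div>\n'
            'and here is text afterward.</p>')
print(top_level_nodes(unspaced))
# [('p', 'Here is a formula,'), ('div', 'math'),
#  ('text', 'and here is text afterward.'), ('p', '')]
```

The third entry is the bare text node that a selector for top-level paragraphs cannot reach.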

I hope this can be fixed ASAP, because the only current alternatives are to (a) manually reformat every bit of block math to version (2) with explicit blank lines, a workaround which at least displays correctly even if its generated markup is also incorrect, or (b) revert to using the bare <math> version and indenting using raw HTML, other templates, or :s (i.e. improperly abusing definition lists, still the most common method on Wikipedia).

Change 904610 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[mediawiki/extensions/Math@master] Remove images from native MathML

https://gerrit.wikimedia.org/r/904610

Change 904610 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Remove images from native MathML

https://gerrit.wikimedia.org/r/904610

In a related example, if you explicitly add the following (invalid as written) raw HTML to a wiki page:

<p>Here is a list, <ul><li>one</li><li>two</li></ul> and some trailing text.</p>

Mediawiki turns this into:

<p>Here is a list, </p>
<ul><li>one</li><li>two</li></ul>
<p> and some trailing text.</p>

with two separate paragraphs and the block element as a separate top-level element in between. Browsers then have no trouble rendering this (no longer invalid) markup.

My proposal for a bug fix for block math in a paragraph would be for Mediawiki to do exactly the same thing: emit 2 paragraphs with a block-math div in between.

That is, I recommend turning:

<p>Here is an equation, <math display=block>...</math> and some trailing text.</p>

Into:

<p>Here is an equation, </p>
<div> ... rendered math SVG ... </div>
<p> and some trailing text.</p>
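The proposed transformation can be sketched as a standalone function. This is an illustration only and not how the MediaWiki parser or the Math extension is structured: the function name split_paragraph, the regular expression, and the placeholder renderer are all assumptions, and surrounding whitespace is trimmed for simplicity.

```python
import re

# Assumption: block math is written as <math display=block> or <math display="block">.
BLOCK_MATH = re.compile(r'<math display="?block"?>(.*?)</math>', re.DOTALL)


def split_paragraph(source, render=lambda tex: "..."):
    """Emit one <p> per text run and one <div> per block formula.

    `render` is a hypothetical stand-in for the real math renderer.
    """
    # With one capture group, re.split alternates: text, TeX, text, TeX, ...
    parts = BLOCK_MATH.split(source)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(f'<div class="mwe-math-element">{render(part)}</div>')
        elif part.strip():
            out.append(f"<p>{part.strip()}</p>")
    return "\n".join(out)


print(split_paragraph(
    "Here is a formula,\n"
    "<math display=block>x^2 + y^2 = 1,</math>\n"
    "and here is text afterward."
))
# <p>Here is a formula,</p>
# <div class="mwe-math-element">...</div>
# <p>and here is text afterward.</p>
```

Because empty text runs are skipped, the unspaced and blank-line variants of the wikitext give the same output in this sketch.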

This cannot be done from the math extension as it does not know the context of the math element. So this can not be implemented in the near future.

As you see in the linked patch, in the new mode no elements besides the HTML tag math will be created.

This bug makes <math display=block>...</math> fundamentally broken, and encourages authors to prefer using :<math>...</math> for indentation even though that generates a definition list which causes problems for some screen readers, or alternately to jump through hoops to instead adopt a template workaround.

If this bug cannot be fixed it is a serious problem for every technical article on Wikipedia.

Wikipedia authors by and large don't care at all about (or want) MathML output per se. The SVG rendered by MathJax or KaTeX is an effective solution used widely around the web. But that SVG should not be wrapped in invalid HTML markup.

@Jacobolus Don't get me wrong. I, and the whole community group math, understand the problem. Formulating consensual goals for math in Wikipedia is precisely the purpose of that group.

However, here

That is, I recommend turning:

<p>Here is an equation, <math display=block>...</math> and some trailing text.</p>

Into:

<p>Here is an equation, </p>
<div> ... rendered math SVG ... </div>
<p> and some trailing text.</p>

you make an actionable proposal. My comment explains that this can not be implemented in the scope of the math extension.

Wikipedia authors by and large don't care at all about (or want) MathML output per se. The SVG rendered by MathJax or KaTeX is an effective solution used widely around the web. But that SVG should not be wrapped in invalid HTML markup.

My impression is that most people don't care as long as it looks good. We claim that the new rendering mode is faster, easier to maintain, looks better, and is much more accessible compared to the previous rendering mode. We will support old browsers with client-side solutions such as MathJax. You are most welcome to challenge that claim, as each challenge helps to improve further.

Is there a place where we can test a wide variety of formulas with the new rendering mode?

Incidentally, an apparent minor side effect of this bug is that when a block formula is formatted in wikimarkup as part of a larger paragraph (which is then broken into pieces in html by this bug), the "readable prose size" script used by Wikipedia's "Did you know" nomination process fails to count the parts of the paragraph after the formula. I think that switching to a span with display:block, as suggested above, would likely fix this.

Is there a place where we can test a wide variety of formulas with the new rendering mode?

@Stegmujo what do you think of a public demo?

@Stegmujo what do you think of a public demo?

In the meantime, you can test this on any WMF wiki. Go to https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-rendering and select the MathML rendering mode.

By the way, the MathJax option is on hold due to a security issue: https://phabricator.wikimedia.org/T354136

First of all, using the syntax <math display="block"> was a very bad decision both in terms of terminology (in typesetting, these are not called "block" but just "display" formulas) and usability, as it forces editors to use machine-oriented XML-like markup instead of something human-oriented wiki-like (could be at least just <math display>, like <math chem>, or better <dmath>, akin to TeX's tfrac/dfrac).

Second, display formulas must always be treated as parts of the surrounding paragraph, unless there are explicit paragraph breaks before or after them. This is how it's always been done in typesetting (and, correspondingly, TeX, which is a must-know for everybody trying to design anything showing mathematical expressions). This becomes obvious if you look at any text with paragraph indentation (the usual practice for printed books). See some random example, where clearly some lines after display formulas are not indented because they continue the paragraph, and some lines after formulas are indented because they start a new paragraph.

I don't know whether any Mediawiki websites use paragraph indentation (for online HTMLs or offline PDFs), but the same logic must be followed for marking paragraphs by vertical space instead of indentation. Namely, extra vertical space must not be inserted around display formulas if there's no paragraph break, and it must be inserted if there is a paragraph break.

That is, using <div> is definitely wrong. Silently adding </p> before and <p> after is also definitely wrong. Using a <span> with display:block within the same paragraph should be both semantically and presentationally correct. Blank lines must not be ignored but instead treated regularly as paragraph breaks.

Even though Mediawiki generally uses space for paragraph breaks, the vertical spacing above and below math formulas should be the same irrespective of where they are relative to paragraphs, and ideally should be relatively little. Extra vertical space around block elements like formulas is not obvious enough a signal of a paragraph break, and having inconsistent spacing doesn't look intentional, it just looks like a sloppy bug.

The current Wikipedia <math display=block> spacing is outrageously wide, to readers' detriment. It should be slimmed considerably, comparable to the spacing around bullet lists or comparable elements. (While we're here, the spacing around block quotations is also absurdly wide.)

I did propose <dmath> when this feature was originally developed. However, it was not supported by anyone, thus I had to drop it. We can add dmath as an alias for math display=block if that is useful. However, this is offtopic here and should be a new issue. Keeping the long standard form seems to be a good idea to me, though.

In the new implementation, no div or any other block element will be emitted. So this ticket can be closed once the new rendering is the default.

Using a <span> with display:block within the same paragraph should be both semantically and presentationally correct. Blank lines must not be ignored but instead treated regularly as paragraph breaks.

So you are saying that if we replaced the div with a span, it would look as close to $$ .. $$ in TeX as one can get with SVG images? Do you have a visual demo for that? Code-wise this could be changed without a lot of effort.

@Physikerwelt, that "standard form" with MathML markup is a good demonstration of what I meant by XML-like markup with all these <mrow> <munderover> as opposed to human-oriented TeX and wiki markup... Of course, it can be kept as a backup/compatibility option (like the rest of HTML), but the more difficult it is to type, the fewer editors will use it instead of the old (wrong) :<math>. This is indeed offtopic here, but the word block has already confused Izno in the discussion above.

As for being as close as possible to $$ .. $$ in TeX, yes, this is exactly what I think the goal should be. In what form would you like me to provide a visual demo?

@MikhailRyazanov I suggest we continue the discussion about :<math> at T111712 (T268922 summarizes the current status).

As for being as close as possible to $$ .. $$ in TeX, yes, this is exactly what I think the goal should be. In what form would you like me to provide a visual demo?

A demo for your proposed span solution. Otherwise I might implement a suboptimal solution.

@Physikerwelt, I understand that you want a demo. My question was how should I provide it — I obviously can't change how Wikipedia articles are generated currently, so do you want me to take one of them, pack everything in a stand-alone HTML, manually change there all the relevant divs to spans and upload here? Or maybe there's a simpler way?

Yes. One or two formulae are enough. I used https://jsfiddle.net in the past. You have live preview and it is easy to share the result.

Communication and instructions could be better, but instead of dragging WP stuff into jsfiddle, I've created a demo in my enwiki sandbox: https://en.wikipedia.org/wiki/User:Mikhail_Ryazanov/sandbox/display_math_span, using current styles but with a manually inserted <span> around <math>\displaystyle in place of <math display="block">. Decreasing the current "outrageously wide" spacings can be discussed elsewhere (or adjusted to personal taste by user styles), but that at least will be possible with correctly formed HTML.

Thank you, this is perfect for defining the goal. I would personally prefer if formulae were centered (or indented).

This is how it renders on my phone.

[Screenshot: Screenshot_20240606-094523.jpg]

In the Wikipedia App it is indented.

[Screenshot: Screenshot_20240606-094401.jpg]

On https://en.m.wikipedia.org/wiki/User:Mikhail_Ryazanov/sandbox/display_math_span they also appear without indentation or centering, which is of course not how it should be, but this might be caused by using too many spans with contradictory classes:

<span class="mwe-math-mathml-display">
 <span class="mwe-math-element">
  <span class="mwe-math-mathml-inline ...>...</span>
  <img class="mwe-math-fallback-image-inline ...>
 </span>
</span>

The proper solution should not have that outer span (manually inserted by me) but instead should have ...-display instead of ...-inline classes for the inner span and img:

<span class="mwe-math-element">
 <span class="mwe-math-mathml-display ...>...</span>
 <img class="mwe-math-fallback-image-display ...>
</span>

as I guess it will if you just change div to span, as proposed.
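The intended structure can be sketched as a tiny generator. The function name math_wrapper and its parameters are hypothetical, and the extra classes and attributes elided with "..." in the snippets above are simply left out here; only the three class-name prefixes come from this thread.

```python
from html import escape


def math_wrapper(mathml, img_src, display):
    """Build a span-only wrapper; the class suffix follows the display mode.

    Hypothetical helper for illustration, not the Math extension's actual code.
    """
    suffix = "display" if display else "inline"
    return (
        '<span class="mwe-math-element">'
        f'<span class="mwe-math-mathml-{suffix}">{mathml}</span>'
        f'<img class="mwe-math-fallback-image-{suffix}" '
        f'src="{escape(img_src, quote=True)}">'
        '</span>'
    )


# Display formula: same nesting as the inline case, no extra outer span and no div.
print(math_wrapper("<math>...</math>", "/placeholder.svg", display=True))
```

Whether the formula then renders as a block is left entirely to the stylesheet rules attached to the ...-display classes.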

I suppose that you have access to the source code and test builds, so maybe it will be more productive if you try to implement the change and then share the results here and with those who can test for accessibility issues?

Change #1040625 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[mediawiki/extensions/Math@master] Use spans for displaystyle fallback images

https://gerrit.wikimedia.org/r/1040625

I suppose that you have access to the source code and test builds, so maybe it will be more productive if you try to implement the change and then share the results here and with those who can test for accessibility issues?

@MikhailRyazanov Indeed changing the source is simple as you can see here

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/1040625/2/src/MathMathML.php#495

The hard thing is testing with different setups and browsers.

@MSantos can the Content-Transformer-Team review the proposed solution (demo)?

Note, I just updated the ticket description to align it with the wording of the latest HTML spec.

Physikerwelt changed the task status from Stalled to In Progress. Jun 9 2024, 10:01 AM

@Physikerwelt, thanks! As far as I can tell, the DOM and visual results are now correct. I probably can't help with more testing as is (beta.math.wmflabs.org apparently doesn't even have skins and mobile version), but since now you have a working implementation, I hope that the testing team can work with it...

Seems reasonable. Be careful not to put any HTML block-level content (<p> or <div> tags, eg, which could get hidden inside a caption or other metadata) inside your <span>, but otherwise using CSS to get the block-level display seems to work fine from our perspective.
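That caution can be checked mechanically. The following is a rough sketch, assuming a hand-picked list of block-level tags and hypothetical names (SpanContentChecker, block_tags_inside_span); it is not part of any MediaWiki test suite.

```python
from html.parser import HTMLParser

# Assumption: an illustrative subset of block-level and void elements.
BLOCK_TAGS = {"p", "div", "ul", "ol", "table", "blockquote", "pre"}
VOID_TAGS = {"img", "br", "hr"}


class SpanContentChecker(HTMLParser):
    """Records every block-level start tag that appears inside an open <span>."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.violations = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS and "span" in self.stack:
            self.violations.append(tag)
        if tag not in VOID_TAGS:
            # Void elements have no end tag, so they are never pushed.
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def block_tags_inside_span(html):
    checker = SpanContentChecker()
    checker.feed(html)
    checker.close()
    return checker.violations


print(block_tags_inside_span('<span class="mwe-math-element"><img class="x"></span>'))
# []
print(block_tags_inside_span('<span><div>caption</div></span>'))
# ['div']
```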

Change #1040625 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Use spans for displaystyle fallback images

https://gerrit.wikimedia.org/r/1040625

Thank you, @cscott, for the quick review.

@MikhailRyazanov Further testing can be done in the official beta cluster https://en.wikipedia.beta.wmflabs.org . It will be available on all wikis after Thursday, 20th of June, 2 pm UTC.

Physikerwelt updated the task description.