<math> tags duplicate content in section headers when MathML is used
Closed, ResolvedPublic

Description

When <math> tags are used in section headers, the header text is duplicated without math tags when using MathML.
See https://en.wikipedia.org/wiki/User:Sn1per/MathML_Issue for a live example.

wikimarkup:

Testing math tags
==The fraction is <math>\frac{1}{2}</math> of the whole==
===The letter <math>\mathbb{N}</math> is unique===
====I am writing nonsense sentences with <math>\mathbb{E}</math>s in them====

Event Timeline

Sn1per raised the priority of this task from to Needs Triage.
Sn1per updated the task description.
Sn1per added a subscriber: Sn1per.
Sn1per renamed this task from Bug with <math> tags in section headers? to Bug with <math> tags in section headers? (Google Chrome). Jun 21 2015, 4:22 PM
Sn1per updated the task description.
Sn1per set Security to None.
Sn1per updated the task description.
Sn1per renamed this task from Bug with <math> tags in section headers? (Google Chrome) to Bug with <math> tags in section headers when using MathML. Jun 21 2015, 4:25 PM
Sn1per updated the task description.
Ciencia_Al_Poder renamed this task from Bug with <math> tags in section headers when using MathML to <math> tags duplicate content in section headers when MathML is used. Jun 21 2015, 4:29 PM

IE is not affected because it can't support MathML, so it only renders one :)

HTML output of one of the headings:

<h2><span class="mw-headline" id="The_fraction_is_.0A_.0A_.0A_.0A_1.0A_2.0A_.0A_.0A_.0A_of_the_whole">The fraction is <span><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML">
  (math ML content)
</math></span>
<meta class="mwe-math-fallback-image-inline" style="..." aria-hidden="true" /></span> of the whole</span>The fraction is 1 2 of the whole</h2>

Note that the MathML heading is inside a <span class="mw-headline">, and then, outside the span, the same heading is rendered again as plain text.
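A rough way to check for this symptom automatically is to look for text that sits directly inside the <h2> rather than inside a child element. The following is only a sketch in Python using the standard html.parser module; the class name StrayHeadingText, the helper find_stray_text and the shortened SAMPLE markup are made up for illustration and are not part of MediaWiki or the Math extension.

```python
from html.parser import HTMLParser

# Shortened, hypothetical version of the heading markup quoted above.
SAMPLE = (
    '<h2><span class="mw-headline" id="x">The fraction is '
    '<span><span style="display: none;"><math>...</math></span>'
    '<meta class="mwe-math-fallback-image-inline" aria-hidden="true" /></span>'
    ' of the whole</span>The fraction is 1 2 of the whole</h2>'
)


class StrayHeadingText(HTMLParser):
    """Collect text that is a direct child of <h2>, outside any nested element."""

    def __init__(self):
        super().__init__()
        self.depth = None  # nesting depth inside the current <h2>; None when outside
        self.stray = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and self.depth is None:
            self.depth = 0
        elif self.depth is not None:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> do not change the nesting depth.
        pass

    def handle_endtag(self, tag):
        if self.depth is None:
            return
        if tag == "h2" and self.depth == 0:
            self.depth = None
        else:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.stray.append(data.strip())


def find_stray_text(html):
    parser = StrayHeadingText()
    parser.feed(html)
    return parser.stray


print(find_stray_text(SAMPLE))
```

On a correctly rendered heading the list should be empty, since all visible text lives inside the mw-headline or mw-editsection spans.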

@Aklapper Can you add a second project? From within the Math extension it is not possible to tell whether an element is in a heading or not.

Physikerwelt added a subscriber: GWicke.

I wonder why the TOC is correct only for the MathML mode. If that could be figured out, it would probably be possible to find out why the actual headline is doubled.

@Jdforrester-WMF: Do you have a suggestion for how to direct someone's attention to this bug? The problem is that the Math extension does not even know whether the element it is rendering is a heading element or not.

I don't believe that this is a problem on Firefox either. Can someone who has Firefox check and confirm?

I can confirm:

Hi @Sn1per
What version of Chrome are you using? I am trying to replicate this bug on my Firefox and also on many different virtual browsers from browserstack.com, and I can't seem to replicate the problem.

Note that you have to select the following option in the preferences for math:

  • MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools)

I am using CentOS too, if that makes a difference. Is there a way I can check whether MathML is enabled in my browser? Also, @Ciencia_Al_Poder, what version of Firefox are you using, and on what OS?

Does this task need closing? Is this still a problem, or has it been resolved?

The problem seems to be that the parser also replaces markers in the mw-editsection. If we comment out the actual mathml rendering, we get the following DOM tree:

<h2>
  <span class="mw-headline" id="The_fraction_is_.7FUNIQ--postMath-00000002-QINU.7F_of_the_whole">The fraction is UNIQ--postMath-00000002-QINU of the whole</span>
  <span class="mw-editsection">
    <span class="mw-editsection-bracket">[</span>
    <a href="/w/index.php?title=Main_Page&amp;action=edit&amp;section=1" title="Edit section: The fraction is UNIQ--postMath-00000002-QINU of the whole">edit</a>
    <span class="mw-editsection-bracket">]</span>
  </span>
</h2>

So basically everything is fine here. But if we render the math, we get:

<h2>
  <span class="mw-headline" id="The_fraction_is_.7FUNIQ--postMath-00000002-QINU.7F_of_the_whole">The fraction is
  <!-- some big span generated by mathml -->
  <mw:editsection page="Main Page" section="1">
    The fraction is <!-- the same (?) big span --> of the whole
  </mw:editsection>
</h2>

So something goes wrong when replacing the markers in the mw-editsection. Apparently, the rendering is done only once per formula, so currently the Math extension can't do anything different for the mw-editsection. Maybe the third argument $frame passed to the ParserFirstCallInit hooks could help here... Otherwise, the problem is *somewhere* in the parser...

Hm, even more interesting is what you don't see in this sample. The marker prefix is defined as

const MARKER_PREFIX = "\x7fUNIQ-";

XML validation fixes the id attribute so that the prefix becomes .7FUNIQ. In the title attribute of the link, this escaping does not seem to happen, and the \x7f remains in the attribute.
Therefore, the span is inserted a second (i.e. third) time.
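To illustrate the mechanism described here, the following Python sketch mimics the two steps: encoding the heading text into an id (which turns \x7f into .7F) and replacing markers in the final output. It is not MediaWiki's actual PHP code. The function names legacy_anchor and unstrip are hypothetical, and the marker suffix "-QINU\x7f" is an assumption inferred from the id shown in the DOM tree above.

```python
import re
from urllib.parse import quote

MARKER_PREFIX = "\x7fUNIQ-"
MARKER_SUFFIX = "-QINU\x7f"  # assumption, inferred from the ".7F" at the end of the id


def legacy_anchor(text):
    """Approximate the dot-encoded id: spaces become '_', other bytes become .XX."""
    return quote(text.replace(" ", "_"), safe="").replace("%", ".")


def unstrip(html, replacements):
    """Replace every raw marker found in html with its rendered content."""
    pattern = re.compile(re.escape(MARKER_PREFIX) + r".*?" + re.escape(MARKER_SUFFIX))
    return pattern.sub(lambda m: replacements.get(m.group(0), m.group(0)), html)


marker = MARKER_PREFIX + "-postMath-00000002" + MARKER_SUFFIX
heading = "The fraction is " + marker + " of the whole"

# The id is escaped, so it no longer contains a raw marker ...
anchor = legacy_anchor(heading)

# ... but the headline text and the (unescaped) title attribute both still do.
html = (
    '<span class="mw-headline" id="' + anchor + '">' + heading + "</span>"
    '<a title="Edit section: ' + heading + '">edit</a>'
)

rendered = unstrip(html, {marker: "<!-- big MathML span -->"})
print(anchor)
print(rendered.count("<!-- big MathML span -->"))
```

In this sketch the rendered span ends up in two places (headline and title attribute) while the id keeps its escaped marker, which matches the behaviour described above.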

Change 273033 had a related patch set uploaded (by Physikerwelt):
Use TeX representation for the editsection title

https://gerrit.wikimedia.org/r/273033

Change 273033 merged by Mobrovac:
Use TeX representation for the editsection title

https://gerrit.wikimedia.org/r/273033

Physikerwelt claimed this task.

Given the bug discovered in Math's handling of strip markers (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/655884/8/src/MathHooks.php#355), we should re-investigate whether this is still needed.