MathML with input <math>hello</math> causes tidy to die
Closed, Declined (Public)

Description

If the user selects MathML rendering for texvc, input text such as
<math>hello</math> will generate what tidy considers to be XML with "serious
errors". This causes severe corruption of the page display, on all those
discussion pages where people are using tidy-dependent signatures.


Version: 1.6.x
Severity: normal
URL: http://en.wikipedia.org/w/index.php?title=Wikipedia:Requests_for_adminship/Oleg_Alexandrov&oldid=23500666

bzimport added a project: MediaWiki-Parser. Via Conduit, Nov 21 2014, 8:49 PM
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz3504.
tstarling created this task. Via Legacy, Sep 19 2005, 3:59 AM
tstarling added a comment. Via Conduit, Sep 19 2005, 4:04 AM

The output text for <math>hello</math> is:

<p><math
xmlns='http://www.w3.org/1998/Math/MathML'><mi>h</mi><mi>e</mi><mi>l</mi><mi>l</mi><mi>o</mi></math>
</p><p><br />
</p>
<!-- Tidy found serious XHTML errors -->

bzimport added a comment. Via Conduit, Mar 24 2006, 10:31 AM

j.niesen wrote:

proposed patch, not too nice

An ugly way to resolve this bug is to strip out all the <math> ... </math> tags
before sending the HTML to tidy and to plug them back in afterwards. The
attached patch follows this strategy. I've only tested it with external tidy,
because I couldn't get internal tidy to work.

Attached: patch3504.txt
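
For illustration only, the strip-and-restore strategy described above can be sketched in Python. This is not the content of patch3504.txt, which is not reproduced in this thread; the function names, the `@@MATH-n@@` placeholder format, and the `tidy` callable are all hypothetical, and the sketch assumes <math> elements are not nested.

```python
import re
from typing import Callable, List, Tuple

# Matches one whole <math ...>...</math> element, including line breaks.
# Assumption: <math> elements are never nested inside each other.
MATH_RE = re.compile(r"<math\b[^>]*>.*?</math>", re.DOTALL | re.IGNORECASE)

# Hypothetical placeholder format; it must survive the tidy pass unchanged.
PLACEHOLDER_RE = re.compile(r"@@MATH-(\d+)@@")


def strip_math(html: str) -> Tuple[str, List[str]]:
    """Replace each <math> element with a numbered placeholder."""
    saved: List[str] = []

    def stash(match):
        saved.append(match.group(0))
        return "@@MATH-%d@@" % (len(saved) - 1)

    return MATH_RE.sub(stash, html), saved


def restore_math(html: str, saved: List[str]) -> str:
    """Put the saved <math> elements back in place of their placeholders."""
    return PLACEHOLDER_RE.sub(lambda m: saved[int(m.group(1))], html)


def tidy_with_math_protected(html: str, tidy: Callable[[str], str]) -> str:
    """Run a tidy-like cleaner without letting it see any MathML."""
    stripped, saved = strip_math(html)
    return restore_math(tidy(stripped), saved)
```

Here `tidy` stands for whatever function invokes the cleaner (external or internal). Restoring through a regular expression rather than repeated string replacement avoids confusing placeholder 1 with placeholder 10.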

bzimport added a comment. Via Conduit, Mar 24 2006, 12:45 PM

j.niesen wrote:

Please disregard the patch. I did not test it properly against the CVS version,
and something changed in the code.

bzimport added a comment. Via Conduit, Mar 24 2006, 3:08 PM

j.niesen wrote:

I think the patch in comment #2 works, but it depends on my patch from bug 5344.

bzimport added a comment. Via Conduit, May 4 2006, 10:30 PM

sspecter wrote:

I've made a working (but unstable) implementation with MathML on my wiki here.

Here's an example: http://www.sspecter.com/wiki/index.php/Ajuda:ASCIIMath-Sintaxe
It's in Portuguese, but you can see it working.

I solved that by creating MathML as an extension (probably it is not parsed by
tidy, only sanitizer), and naming the extension tag <asciimath>, to not conflict
with <math> from MathML.

My solution works but it is unstable because the sanitizer likes to generate
malformed tags inside my well-formed MathML, and my XHTML pages crash. It only
happens in some cases, like a wiki list (*) + <asciimath>, or <asciimath> with
blank lines inside it. But I believe these problems will still happen with
comment #2's solution.

bzimport added a comment. Via Conduit, May 5 2006, 1:39 PM

j.niesen wrote:

(In reply to comment #5)

> I solved that by creating MathML as an extension (probably it is not parsed by
> tidy, only sanitizer), and naming the extension tag <asciimath>, to not conflict
> with <math> from MathML.

I think that the sanitizer - specifically, Sanitizer::removeHTMLtags() - does
not touch extension tags and <math> tags, though it is hard to tell anything
from Parser.php. Furthermore, it seems that tidy is not enabled on your site.

> My solution works but it is unstable because the sanitizer likes to generate
> malformed tags inside my well-formed MathML, and my XHTML pages crash. It only
> happens in some cases, like a wiki list (*) + <asciimath>, or <asciimath> with
> blank lines inside it.

As I said, I doubt it is the sanitizer or tidy that generates the malformed
tags. Your problem may be that the MathML which you generate contains newlines.
This confuses the parser. Try replacing all the newline characters with spaces.
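
As a rough sketch of that suggestion (in Python for illustration, not the code of either wiki), an extension could flatten its MathML output before returning it. The function name is hypothetical. The sketch collapses each run of line breaks into one space, which assumes whitespace between MathML elements is insignificant for the generated markup.

```python
import re


def flatten_mathml(mathml: str) -> str:
    """Replace every run of CR/LF characters with a single space.

    Blank lines inside the markup therefore disappear too, so the
    output always sits on one line.
    """
    return re.sub(r"[\r\n]+", " ", mathml)
```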

> But I believe these problems will still happen with comment #2's solution.

Is this just a guess, or do you have an example in which the patch does not
work? It seems to work fine on http://wiki.blahtex.org/ .

brion added a comment. Via Conduit, Jun 3 2006, 11:43 PM

I'm going to WONTFIX this.

We're going to be ditching tidy (with its bugginess, overhead, and
annoying features) once the internal HTML normalizer is fixed up,
which it soon will be (in progress, bug 5497).

With normalization working on our own output, we don't have to
worry about tidy choking on extension output.
