
generate MathML (not PNG) and automatically embed hyperlinks for each symbol
Open, Needs Triage, Public

Description

Author: richardbrucebaxter

Description:
I propose that MediaWiki be upgraded to generate MathML and to automatically embed hyperlinks for each mathematical symbol/operator. A developer on the MediaWiki IRC dev channel suggested I open this Bugzilla report after I showed them my initial request on the meta wiki (http://meta.wikimedia.org/w/index.php?title=Help_talk:Displaying_a_formula#Mediawiki_math_markup_interpretation_upgrade:_generate_MathML_.28not_PNG.29_and_automatically_embed_hyperlinks_for_each_symbol).

The following is an example of the proposed wiki math code:
<math>A[[simple equation#area]] = \pi r[[simple equation#radius]]^2 + q[[some constant]] \cos(z)</math>

  1. LaTeX math [[wiki link]] tags are removed by the Math extension preprocessor
  2. MathML is generated by LaTeXML
  3. user-defined hyperlinks are re-added to the generated MathML by the Math extension postprocessor
  4. the postprocessor automatically adds hyperlinks to all remaining mathematical operators/symbols that reside in its database of existing math symbols/operators (e.g. a plain text file)
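
As a rough illustration of step 1 only, here is a minimal Python sketch (the real extension is PHP). It assumes that an annotation always attaches to the single symbol directly before it, either a LaTeX command such as \gamma or one letter/digit. The names ANNOTATED and strip_wiki_links are hypothetical and not part of the Math extension.

```python
import re

# Assumption: a symbol is a LaTeX command (e.g. \gamma) or a single
# letter/digit, immediately followed by a [[wiki link]] annotation.
ANNOTATED = re.compile(r"(\\[A-Za-z]+|[A-Za-z0-9])\[\[([^\[\]]+)\]\]")

def strip_wiki_links(latex):
    """Remove [[...]] annotations and remember which symbol each one belonged to."""
    links = {}

    def keep_symbol(match):
        symbol, target = match.group(1), match.group(2)
        links.setdefault(symbol, target)  # first annotation for a symbol wins
        return symbol                     # keep the symbol, drop the annotation

    clean = ANNOTATED.sub(keep_symbol, latex)
    return clean, links
```

The cleaned string is what would be handed to LaTeXML in step 2; the returned dictionary is what the postprocessor would need in step 3.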

Relevant wiki hyperlinks are automatically generated for all standard mathematical symbols and operators, e.g. "+", "=", "squared", "cos". If, for example, the user clicks on the hyperlink for z (which has not been explicitly defined by the user and does not reside in the Math extension database of mathematical symbols), the wiki (e.g. Wikipedia) returns "Variable is undefined, would you like to define it by editing this page?"
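
The lookup order described above (user-defined link first, then the symbol database, then an "undefined" page) could be sketched roughly as follows. This is Python for illustration only; the base URL, the three-entry DEFAULT_LINKS table, the UNDEFINED_TITLE page name and the function embed_links are all assumptions, not the extension's actual data or API.

```python
import xml.etree.ElementTree as ET

WIKI = "https://en.wikipedia.org/wiki/"   # assumed base URL
UNDEFINED_TITLE = "Undefined_variable"    # hypothetical "define this variable" page

# Hypothetical excerpt of the symbol/operator database
DEFAULT_LINKS = {"=": "Equals_sign", "+": "Addition", "cos": "Cosine"}

def embed_links(mathml, user_links):
    """Add an href attribute to every <mi>, <mo> and <mn> token element."""
    root = ET.fromstring(mathml)
    for node in root.iter():
        tag = node.tag.rsplit("}", 1)[-1]  # ignore an XML namespace prefix, if any
        if tag not in ("mi", "mo", "mn"):
            continue
        text = (node.text or "").strip()
        if text in user_links:             # 1. link supplied by the editor
            title = user_links[text]
        elif text in DEFAULT_LINKS:        # 2. link from the symbol database
            title = DEFAULT_LINKS[text]
        else:                              # 3. nothing known about this symbol
            title = UNDEFINED_TITLE
        node.set("href", WIKI + title.replace(" ", "_"))
    return ET.tostring(root, encoding="unicode")
```

Only leaf tokens are linked in this sketch; container elements such as <mfrac> or <msqrt> are left untouched.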

Note that Firefox's MathML implementation (I am unsure about MathJax) requires either all math objects to be explicitly hyperlinked or none. The reason is that when hyperlinks are auto-generated for math "sections" (e.g. square root, division), all the child objects in the section are by default linked to the section hyperlink (e.g. "b^2-4ac" in "\sqrt{b^2-4ac\ }"). This would be confusing for the user, so it is better that all math objects be explicitly hyperlinked, even if they must be directed to a new/edit page.

Although I am not a web programmer, I am happy to implement this myself in PHP if necessary. I must, however, report an issue in the existing MediaWiki Math extension that affects the LaTeXML MathML option (it appears to be related to https://gerrit.wikimedia.org/r/#/c/135521). The only way I have been able to get the LaTeXML MathML option working in the current version of the Math extension (11 August 2014) or the last stable version (1.23.2) is to:

  1. first delete the MySQL wiki database (drop database <db_name>;)
  2. then install MediaWiki 1.22.9 (legacy) along with its corresponding version of the Math extension (1.22.9):
     a. make sure to tick "enable image uploads" (to prevent a bug that stops the default PNG-generated formulae from being displayed)
     b. create the Math extension temporary folders
        cd /var/www/html/mediawiki-xxx/images
        mkdir math
        mkdir tmp
        sudo chown -R www-data:www-data *
     c. run maintenance/update.php (to prevent a bug "A database query error has occurred. This may indicate a bug in the software")
     d. install LaTeXML (http://www.formulasearchengine.com/node/3)
     e. test that MathML is working; set $wgUseLaTeXML = true; $wgUseMathJax = true; $wgDefaultUserOptions['math'] = MW_MATH_LATEXML; in LocalSettings.php
  3. then install the current version of MediaWiki (mediawiki-latest.tar.gz/11 August 2014 or MediaWiki 1.23.2) along with its corresponding version of the Math extension (Math.zip/11 August 2014 or 1.23.2).

I have attached a complete installation log for reference (mediaWikiMathExtensionMathMLinstallationLog-11August2014.txt). It would, however, be useful if someone could publish a formal workaround for this issue (at least for 1.23.2 stable), for example a .sql file containing the required MySQL table updates.

I have also attached the MathML code for what I expect a final equation to look like on Wikipedia after the LaTeX is preprocessed, rendered, and postprocessed (mathMLtestQuadraticEquation.html).

Thanks for your help.

Richard


mathMLtestQuadraticEquation.html

<!-- 1. original latex code -->
<!-- <math>x=\frac{-b\pm\sqrt{b^2-4ac\ }}{2a}.</math> -->

<!-- 2. proposed Wikipedia latex code -->
<!-- <math>x=\frac{-b[[quadratic equation#linear coefficient]]\pm\sqrt{b^2-4a[[quadratic equation#quadratic coefficient]]c[[quadratic equation#constant]]\ }}{2a}.</math> -->

<!-- 3. proposed final Wikipedia MathML output -->
<math>
<mrow>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#quadratic_root">x</mi>
<mo href="https://en.wikipedia.org/wiki/Equals_sign">=</mo>
<mfrac href="https://en.wikipedia.org/wiki/fraction">
<mrow>
<mo>&#x2212;</mo>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#linear_coefficient">b</mi>
<mo href="https://en.wikipedia.org/wiki/Plus-minus_sign">&#xB1;</mo>
<msqrt href="https://en.wikipedia.org/wiki/Square_root">
<mrow>
<msup>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#linear_coefficient">b</mi>
<mn href="https://en.wikipedia.org/wiki/Square_number">2</mn>
</msup>
<mo href="https://en.wikipedia.org/wiki/Subtraction">&#x2212;</mo>
<mn href="https://en.wikipedia.org/wiki/number">4</mn>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#quadratic_coefficient">a</mi>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#constant">c</mi>
</mrow>
</msqrt>
</mrow>
<mrow>
<mn href="https://en.wikipedia.org/wiki/number">2</mn>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#quadratic_coefficient">a</mi>
</mrow>
</mfrac>
</mrow>
</math>


Version: master
Severity: enhancement

Details

Reference
bz69424

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:40 AM
bzimport added a project: Math.
bzimport set Reference to bz69424.
bzimport created this task. Aug 12 2014, 4:31 AM

richardbrucebaxter wrote:

mediaWikiMathExtensionMathMLinstallationLog-11August2014.txt

attachment mediaWikiMathExtensionMathMLinstallationLog-11August2014.txt ignored as obsolete

richardbrucebaxter wrote:

mathMLtestQuadraticEquation.html

Attached:

richardbrucebaxter wrote:

mediaWikiMathExtensionMathMLinstallationLog-12August2014.txt

CORRECTION: I can't confirm that the LaTeXML MathML option of the current MediaWiki version 1.23.2/1.24alpha (with its corresponding Math extension) can be made to work by first preinstalling MediaWiki 1.22.9 (and its corresponding Math extension). It only seemed to work because of some kind of MySQL table cache of previously generated formulae. MediaWiki 1.22.9 with its corresponding Math extension is the only version I have been able to get working with the LaTeXML MathML option.

attachment mediaWikiMathExtensionMathMLinstallationLog-12August2014.txt ignored as obsolete

physik wrote:

Hi Richard,

I'm extremely interested in your project. Hyperlinks in formulae are exactly what I want in order to give mathematical formulae well-defined semantics. I strongly encourage you to keep working on this approach, and I'll support you if you need help with it.
I'm sorry that you had difficulties with the mathlatexml table. It has been renamed from math_latexml to mathlatexml, which requires another run of the database update.
The important step during the installation is
"After enabling the LaTeXML rendering mode you have to run the database update script again to create the required table."
Did you do that?
http://www.mediawiki.org/wiki/Manual:Update.php

The mathoid table which you were referring to is not used in the LaTeXML rendering mode.
If you log in to MySQL, you should now see a mathlatexml table after executing "Show tables;"

Best
Physikerwelt

physik wrote:

.. I updated the documentation about that table:
https://www.mediawiki.org/wiki/Extension:Math/mathlatexml_table
It's safe to run
drop table mathlatexml;
and then run the database update script again.
The table is used only as a cache to improve the performance of the Math extension.
In addition, I'd like to point you to a place where the default meanings of the macros can be added.
In LaTeXML, the file [1] takes care of MediaWiki-specific commands. Not all of them [2] are listed there.
For example, there is a command called $\Reals$ (unfortunately used only 8 times in enwiki) that is, IMHO, a prominent candidate for a link to https://en.wikipedia.org/wiki/Real_number or, even better, to https://www.wikidata.org/wiki/Q12916 to be language-independent.

[1] https://github.com/brucemiller/LaTeXML/blob/master/lib/LaTeXML/Package/texvc.sty.ltxml
[2] http://www.formulasearchengine.com/sites/formulasearchengine.com/files/android.txt

richardbrucebaxter wrote:

mathMLelementsWikipediaLinks.txt

attachment mathMLelementsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

mathMLsymbolsWikipediaLinks.txt

attachment mathMLsymbolsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

Thanks for your help Physikerwelt,

Although Math extension 11 August 2014 was giving a blank screen, version 13 August 2014 was giving a "Failed to parse (<math_empty_tex>):" error. I got it working by compiling texvccheck (or setting $wgMathDisableTexFilter = true;). These are basic installation instructions that I had overlooked (provided by https://www.mediawiki.org/wiki/Extension:Math).

Both MW_MATH_MATHML and MW_MATH_LATEXML are now working (with both mediawiki stable 1.23.2 and mediawiki latest 13 August 2014, using a current Math extension build).

$wgMathValidModes = array( MW_MATH_PNG, MW_MATH_SOURCE, MW_MATH_LATEXML, MW_MATH_MATHML);
$wgDefaultUserOptions['math'] = MW_MATH_LATEXML;

As you have mentioned, setting MW_MATH_LATEXML requires re-running "php update.php" to prevent the "A database query error has occurred" error (although setting MW_MATH_MATHML does not).

Richard

richardbrucebaxter wrote:

mediaWikiMathExtensionMathMLinstallationLog-13August2014.txt

attachment mediaWikiMathExtensionMathMLinstallationLog-13August2014.txt ignored as obsolete

richardbrucebaxter wrote:

Thanks also for your interest in this project (and the links).

Note that I had an additional idea for the postprocessor that would reduce the amount of manual Wikipedia editing the proposed enhancement would require:

  1. detect all possible variable names within the generated MathML tags
  2. siphon variable descriptions from the wiki text immediately following the <math> block, based on the variable names detected
  3. create MathML tooltips for all of these variables (displaying their extracted descriptions)
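
A very rough Python sketch of steps 1 and 2, for illustration only. It assumes that descriptions follow the simple wording "x is (the) ..." in the text after the formula, which real articles will often not do. The function names are hypothetical, and step 3 (the tooltip markup) is left out.

```python
import re
import xml.etree.ElementTree as ET

def variable_names(mathml):
    """Step 1: collect the distinct contents of all <mi> elements, in order."""
    names = []
    for node in ET.fromstring(mathml).iter():
        if node.tag.rsplit("}", 1)[-1] == "mi":
            name = (node.text or "").strip()
            if name and name not in names:
                names.append(name)
    return names

def siphon_descriptions(mathml, following_text):
    """Step 2: look for 'NAME is (the) DESCRIPTION' in the text after the formula."""
    found = {}
    for name in variable_names(mathml):
        # Assumed phrasing; the description ends at the next comma, period or semicolon.
        pattern = r"\b" + re.escape(name) + r"\s+is\s+(?:the\s+)?([^,.;]+)"
        match = re.search(pattern, following_text)
        if match:
            found[name] = match.group(1).strip()
    return found
```

For the text "where A is the area, and r is the radius." this would pair A with "area" and r with "radius"; variables with no matching sentence are simply left out of the result.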

Richard

physik wrote:

Hi Richard,

Robert Pagel and I have already started on that (http://arxiv.org/abs/1407.0167). It's all open source.

Physikerwelt

richardbrucebaxter wrote:

CORRECTION: "Although Math extension REL1.23.2 was giving a blank screen.."

richardbrucebaxter wrote:

Cheers Physikerwelt - that is exactly what I was thinking of.

richardbrucebaxter wrote:

mediaWikiMathExtensionMathMLinstallationLog-14August2014.txt

Attached:

richardbrucebaxter wrote:

mathMLsymbolsWikipediaLinks.txt

attachment mathMLsymbolsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

mathMLsymbolsWikipediaLinks.txt

attachment mathMLsymbolsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

mathMLelementsWikipediaLinks.txt

attachment mathMLelementsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

latexSymbolsGreek.txt

Attached:

richardbrucebaxter wrote:

AEHprelim-MathMathML.patch

This patches MathMathML.php (13 August 2014) for use with MathMathMLautomaticallyEmbedHyperlinks.php (preliminary version).

attachment AEHprelim-MathMathML.patch ignored as obsolete

richardbrucebaxter wrote:

MathMathMLautomaticallyEmbedHyperlinks.php

MathMathMLautomaticallyEmbedHyperlinks.php (preliminary version). Here is an example of mediawiki input/output;

Latex;
<math>\gamma[[description]]=\frac{-b\pm\sqrt{b^2-4ac\ }}{2a}. + 355 + \alpha[[Fine structure]] + \beta[[hello]] + c + ab[[b variable]]</math>

MathML;
<math xmlns="http://www.w3.org/1998/Math/MathML" href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mrow class="MJX-TeXAtom-ORD" href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mstyle href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mi href="https://en.wikipedia.org/wiki/description">&#x3b3;<!-- \u03b3 --></mi>
<mo href="https://en.wikipedia.org/wiki/Equals_sign">=</mo>
<mrow class="MJX-TeXAtom-ORD" href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mfrac href="https://en.wikipedia.org/wiki/Fraction">
<mrow href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mo href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">&#x2212;<!-- \u2212 --></mo>
<mi href="https://en.wikipedia.org/wiki/b_variable">b</mi>
<mo href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">&#xb1;<!-- ± --></mo>
<mrow class="MJX-TeXAtom-ORD" href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<msqrt href="https://en.wikipedia.org/wiki/Square_root">
<msup href="https://en.wikipedia.org/wiki/Subscript_and_superscript#supercript">
<mi href="https://en.wikipedia.org/wiki/b_variable">b</mi>
<mrow class="MJX-TeXAtom-ORD" href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mn href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">2</mn>
</mrow>
</msup>
<mo href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">&#x2212;<!-- \u2212 --></mo>
<mn href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">4</mn>
<mi href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">a</mi>
<mi href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">c</mi>
<mtext href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">&#xa0;</mtext>
</msqrt>
</mrow>
</mrow>
<mrow href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mn href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">2</mn>
<mi href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">a</mi>
</mrow>
</mfrac>
</mrow>
<mo href="https://en.wikipedia.org/wiki/Scalar_product">.</mo>
<mo href="https://en.wikipedia.org/wiki/Addition">+</mo>
<mn href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">355</mn>
<mo href="https://en.wikipedia.org/wiki/Addition">+</mo>
<mi href="https://en.wikipedia.org/wiki/Fine_structure">&#x3b1;<!-- \u03b1 --></mi>
<mo href="https://en.wikipedia.org/wiki/Addition">+</mo>
<mi href="https://en.wikipedia.org/wiki/hello">&#x3b2;<!-- \u03b2 --></mi>
<mo href="https://en.wikipedia.org/wiki/Addition">+</mo>
<mi href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">c</mi>
<mo href="https://en.wikipedia.org/wiki/Addition">+</mo>
<mi href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">a</mi>
<mi href="https://en.wikipedia.org/wiki/b_variable">b</mi>
</mstyle>
</mrow>
</math>

attachment MathMathMLautomaticallyEmbedHyperlinks.php ignored as obsolete

richardbrucebaxter wrote:

mathMLtestQuadraticEquation-AEHprelim.html

attachment mathMLtestQuadraticEquation-AEHprelim.html ignored as obsolete

richardbrucebaxter wrote:

AEHprelim-MathMathML.patch (prelimV2)

Attached:

richardbrucebaxter wrote:

AEHprelim-MathLaTeXML.patch (prelimV2)

Attached:

richardbrucebaxter wrote:

MathMathMLautomaticallyEmbedHyperlinks.php (prelimV2)

change log;

  • resolve hex symbols in mathMLsymbolsWikipediaLinks.txt not being interpreted
  • support for MW_MATH_LATEXML
    • change "private $mathmlAEHobject;" to "protected $mathmlAEHobject;"
    • add mathmlAEHpostprocessContent() reference to MathLaTeXML.php

attachment MathMathMLautomaticallyEmbedHyperlinks.php ignored as obsolete

richardbrucebaxter wrote:

mathMLsymbolsWikipediaLinks.txt

Attached:

richardbrucebaxter wrote:

mathMLelementsWikipediaLinks.txt

attachment mathMLelementsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

MathMathMLautomaticallyEmbedHyperlinks.php (prelimV2b)

change log;

  • remove <mn> from mathmlElementsSupportingWikiLinkReferenceVariables to enable generic identification of numbers

Attached:

richardbrucebaxter wrote:

mathMLtestQuadraticEquation-AEHprelim.html (prelimV2b)

Attached:

richardbrucebaxter wrote:

mathMLelementsWikipediaLinks.txt

Attached:

Pkra added a comment. Aug 20 2014, 2:10 PM

Just my 2ct.

  • Have you considered \href{url}{math} instead of wiki links? That would be more TeX-like and more compatible with the Math Extension backend.
  • Have you brought this up with the community and UX specialists? An author might be unhappy if the content is changed so drastically. An equation made entirely of links (like the example posted) looks to me like a big usability challenge.
  • Maybe maction tooltips are an alternative, together with some discoverability UI.

Peter.

physik wrote:

Hi Richard,

feel free to commit changes to https://www.mediawiki.org/wiki/Extension:MathSearch.
That way, other people can use it in the future. Furthermore, I can give you comments and suggestions on your code via Gerrit.
To get this working independently of your server or installation, all you need to do is implement the https://www.mediawiki.org/wiki/Manual:Hooks/MathFormulaRendered hook.

Examples are available in the MathSearch extension.

Best
Physikerwelt

He7d3r added a comment. Sep 3 2014, 4:36 PM

(In reply to Peter Krautzberger from comment #30)

  • Have you considered \href{url}{math} instead of wiki links? That would be more TeX-like and more compatible with the Math Extension backend.

+1 for something like this:
https://en.wikibooks.org/w/index.php?title=LaTeX/Hyperlinks#.5Chyperref

Change 471772 had a related patch set uploaded (by AndreG-P; owner: AndreG-P):
[mediawiki/extensions/Math@master] Link wikidata items to math tags

https://gerrit.wikimedia.org/r/471772

The custom "Wikipedia syntax" suggested here would be a terrible idea. We are having enough trouble with non-standard LaTeX syntax already. At times, especially with mhchem or optional arguments (which use square brackets), it seems random how an equation renders and, if it does not, what kind of error message you get. Even as an experienced MediaWiki user you can spend hours trying to find out where those strange error messages come from and how to fix them.

I also did a quick search through my dumps of all math on WMF projects and found 239 equations containing the pattern [[ (e.g. https://en.wikipedia.org/wiki/Transseries), which would be broken by the suggested syntax.
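
To make the collision concrete, here is a small Python sketch. It is not the search that was actually run on the dumps. The regular expression is only an assumed reading of the proposed syntax (a symbol directly followed by [[target]]), and K[[x]] is used as an illustrative input because it is a common way to write a ring of formal power series.

```python
import re

# Assumed reading of the proposed syntax: a LaTeX command or a single
# letter/digit immediately followed by [[target]].
ANNOTATED = re.compile(r"(\\[A-Za-z]+|[A-Za-z0-9])\[\[([^\[\]]+)\]\]")

def find_collisions(formulas):
    """Return each formula containing [[ with the (symbol, target) pairs it would yield."""
    hits = []
    for formula in formulas:
        if "[[" in formula:
            hits.append((formula, ANNOTATED.findall(formula)))
    return hits
```

Under this reading, an existing formula such as K[[x]] would be treated as the symbol K linked to a page called "x", and the brackets would disappear from the rendered output.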

The proper way of doing this would be to define a LaTeX macro or use \href{url}{math} as suggested by @Pkra. Supporting wikilinks with \href is actually on the to-do list of our commission T195861.
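
For comparison, a hedged Python sketch of how an annotated formula in the syntax proposed earlier in this task could be rewritten mechanically into the \href{url}{math} form. The base URL, the regular expression and the function name are assumptions made for illustration, and targets containing "#" are not treated specially here.

```python
import re

WIKI = "https://en.wikipedia.org/wiki/"   # assumed base URL

# Assumed reading of the proposed syntax: symbol immediately followed by [[target]].
ANNOTATED = re.compile(r"(\\[A-Za-z]+|[A-Za-z0-9])\[\[([^\[\]]+)\]\]")

def wiki_links_to_href(latex):
    """Rewrite symbol[[target]] as \\href{url}{symbol}."""
    def rewrite(match):
        symbol, target = match.group(1), match.group(2)
        url = WIKI + target.replace(" ", "_")
        # A function replacement is inserted literally, so the backslash is safe.
        return "\\href{" + url + "}{" + symbol + "}"

    return ANNOTATED.sub(rewrite, latex)
```

For example, q[[some constant]] \cos(z) would become \href{https://en.wikipedia.org/wiki/some_constant}{q} \cos(z), which stays within ordinary TeX macro syntax.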

Change 471772 merged by jenkins-bot:
[mediawiki/extensions/Math@master] Link wikidata items to math tags

https://gerrit.wikimedia.org/r/471772

Pkra added a subscriber: Andreg-p. Dec 10 2018, 8:16 PM

It's unclear to me what happened in this thread. IIUC @Andreg-p's patch was merged and released, which would indicate that this can be closed as resolved.

I don't understand if the patch is actually working (e.g. I don't see the data attribute on live content). I also don't understand how the patch relates to the original posting to this thread (and if it doesn't, why it references this issue).