
Create a Full Coverage Test for TexVC(PHP)
Closed, Resolved, Public

Description

Generate a full-mathml-coverage test file with LaTeX math input and reference MathML output.

The content of this wiki page could be used: https://www.mediawiki.org/wiki/Extension:Math/CoverageTest

Tentative steps:

  • Use a local MaRDI Portal instance with the MathSearch extension installed (make sure its Mathoid / LaTeXML settings are correct before the next steps).
  • Add the full-coverage wiki page (might need some refactoring).
  • Extract the LaTeX-to-MathML pairs from the DB, or write an exporter function that writes them to a JSON file which a test can read.
  • Write the test file (a dedicated test file probably makes the most sense for this one).

Info from dialogue:
Question (for example, when using this reference for full coverage): How should the 'production' reference MathML be generated?
Answer: By production I mean the MathML that is currently generated. It can be retrieved from RESTBase or Mathoid, or extracted from the HTML page you linked above. I think the Math coverage tests should be updated a bit if this is a one-time effort. It would be better to use a script (which might already exist in the MathSearch extension) to generate the reference MathML programmatically.

The test dataset should be designed so that 100% test coverage is obtained without adding unnecessarily many tests.
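One way to keep the dataset small is a greedy selection over per-formula coverage. The following is only a sketch under assumptions: the `coverage` mapping (formula to the set of code features it exercises) and the feature names are hypothetical, and how such coverage data would be collected for TexVC(PHP) is not part of this task.

```python
from typing import Dict, List, Set


def greedy_cover(coverage: Dict[str, Set[str]]) -> List[str]:
    """Pick formulas greedily until every feature is covered.

    `coverage` maps a LaTeX formula to the (hypothetical) set of
    features it exercises. Greedy set cover is a heuristic: the result
    covers everything but is not guaranteed to be the smallest set.
    """
    remaining: Set[str] = set().union(*coverage.values())
    chosen: List[str] = []
    while remaining:
        # Take the formula that covers the most still-uncovered features.
        best = max(coverage, key=lambda tex: len(coverage[tex] & remaining))
        chosen.append(best)
        remaining -= coverage[best]
    return chosen


# Hypothetical coverage data for three formulas.
example = {
    "a^2": {"msup"},
    "\\sqrt{\\pi}": {"msqrt", "greek"},
    "\\sqrt{a^2}": {"msqrt", "msup"},
}
selected = greedy_cover(example)
```

With the example data, two of the three formulas are enough to cover all three features.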

Event Timeline

Stegmujo updated the task description.

Some notes:

php extensions/MathSearch/maintenance/UpdateMath.php

Portal has a local instance of LaTeXML; is it configured? In LocalSettings.d:

$wgMathLaTeXMLUrl = 'http://latexml:8080/convert/'; # this should be OK

Does Portal point to the correct Mathoid instance for generating the MathML? In LocalSettings.d:

$wgMathFullRestbaseURL = 'https://wikimedia.org/api/rest_';
$wgMathMathMLUrl = 'https://mathoid-beta.wmflabs.org'; # this seems to be up, at least the test page appears; check whether it is in sync with the current repo settings

The rendering mode is defined in maintenance/UpdateMath.php in the MathSearch extension; update it there:
private $renderingMode = 'latexml'; # 'mathml' for mathoid
Changing this does not overwrite the renderer object, which might indicate that the mode is set elsewhere with higher priority.
It can be overwritten in Renderfactory.php ~l 80 : $mode = MathConfig::MODE_MATHML; // tbd remove overwrite here

The RESTBase setting is wrong though (or the service cannot be contacted): $wgMathFullRestbaseURL = 'https://wikimedia.org/api/rest_'; This also seems to be the current default setting.

Maybe this needs some routing, or it only works on the Portal production instance?

If the Mathoid request settings are practically the same, it is probably faster to query the endpoint directly with curl. The info page is at:

https://mathoid-beta.wmflabs.org/info.html

Something like this should do it:

curl https://mathoid-beta.wmflabs.org -d "q=a^2"
curl https://mathoid-beta.wmflabs.org/mml -d "q=a^2"
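For use in a script, the same request could be built in Python. This is a minimal sketch that only constructs the request; the helper name `build_mathoid_request` is made up for illustration, and the `/mml` path and `q` parameter are taken from the curl calls above. Unlike `curl -d`, `urlencode` percent-encodes the TeX string, which form-decoding servers normally treat as equivalent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the notes above; it may not be reachable from everywhere.
MATHOID_URL = "https://mathoid-beta.wmflabs.org"


def build_mathoid_request(tex: str, path: str = "/mml",
                          base_url: str = MATHOID_URL) -> Request:
    """Build (but do not send) a form-encoded POST request for one formula."""
    body = urlencode({"q": tex}).encode("ascii")
    # Passing `data` makes urllib issue a POST, like `curl -d`.
    return Request(base_url + path, data=body)


req = build_mathoid_request("a^2")
```

Sending it would be `urllib.request.urlopen(req).read()`; that step is left out here because it depends on the endpoint being up.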

MySQL query result for the LaTeXML entry of one example formula:

SELECT * FROM mathlatexml;
| [binary input hash] |  a^2          | {\displaystyle  a^2 } | <math xmlns="http://www.w3.org/1998/Math/MathML" id="p1.1.m1.1" class="ltx_Math" alttext="{\displaystyle{\displaystyle a^{2}}}" display="inline">
  <semantics id="p1.1.m1.1a">
    <msup id="p1.1.m1.1.3" xref="p1.1.m1.1.3.cmml">
      <mi id="p1.1.m1.1.1" xref="p1.1.m1.1.1.cmml">a</mi>
      <mn id="p1.1.m1.1.2.1" xref="p1.1.m1.1.2.1.cmml">2</mn>
    </msup>
    <annotation-xml encoding="MathML-Content" id="p1.1.m1.1b">
      <apply id="p1.1.m1.1.3.cmml" xref="p1.1.m1.1.3">
        <csymbol cd="ambiguous" id="p1.1.m1.1.3.1.cmml" xref="p1.1.m1.1.3">superscript</csymbol>
        <ci id="p1.1.m1.1.1.cmml" xref="p1.1.m1.1.1">[output truncated]

→ TeX and MathML can be obtained for LaTeXML. The output seems to contain a lot of unusual semantics and annotation markup, which complicates comparison.

The Mathoid endpoint (https://mathoid-beta.wmflabs.org) delivers similarly shaped MathML:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block" alttext="a^{2}">
  <semantics>
    <msup>
      <mi>a</mi>
      <mrow class="MJX-TeXAtom-ORD">
        <mn>2</mn>
      </mrow>
    </msup>
    <annotation encoding="application/x-tex">a^{2}</annotation>
  </semantics>
</math>

Change 881883 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Add Full-Coverage Test TexVC-MMLGeneration for Mathoid-LateXML

https://gerrit.wikimedia.org/r/881883

In addition to the current findings regarding semantics and annotations, LaTeXML and Mathoid seem to encode some characters differently from the current TexVC(PHP). The characters that are encoded differently are not displayed correctly by browser rendering (checked with Firefox):

TeX:   \sqrt{\pi}

LaTeXML: .... <msqrt id="p1.1.m1.1.1" xref="p1.1.m1.1.1.cmml">
      <mi id="p1.1.m1.1.1.2" xref="p1.1.m1.1.1.2.cmml">π</mi> ... 

Mathoid: <msqrt data-semantic-type="sqrt" data-semantic-role="unknown" data-semantic-id="1" data-semantic-children="0"><mi data-semantic-type="identifier" data-semantic-role="greekletter" data-semantic-font="italic" data-semantic-annotation="clearspeak:simple;clearspeak:simple" data-semantic-id="0" data-semantic-parent="1">π<!-- π --></mi></msqrt>

TexVC: ...     <msqrt>
      <mi>&#x3C0;</mi> ....
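For a comparison in a test, the two notations (literal `π` versus `&#x3C0;`) can be brought to the same form by decoding numeric character references. The following is a sketch, not part of the patch; the function name is made up for illustration. References to XML-special characters are left untouched so the markup stays well-formed.

```python
import re

# Matches hexadecimal (&#x3C0;) and decimal (&#960;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")
_XML_SPECIAL = set("<>&\"'")


def decode_numeric_refs(text: str) -> str:
    """Replace numeric character references with the literal character."""
    def repl(match: "re.Match[str]") -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        char = chr(int(hex_digits, 16) if hex_digits else int(dec_digits))
        # Keep references that would otherwise break the XML.
        return match.group(0) if char in _XML_SPECIAL else char
    return _NUMERIC_REF.sub(repl, text)


texvc = "<msqrt><mi>&#x3C0;</mi></msqrt>"
normalized = decode_numeric_refs(texvc)
```

After this step the TexVC fragment contains the same literal character as the LaTeXML and Mathoid output.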

Also, the annotations and semantics themselves seem to cause rendering issues in browsers. With the annotations and semantics removed from a LaTeXML case, it is rendered correctly:

<math xmlns="http://www.w3.org/1998/Math/MathML" id="p1.1.m1.1" class="ltx_Math" alttext="{\displaystyle{\displaystyle f(x)=x^{2}\,\!}}" display="inline">
     <mrow>
       <mrow>
         <mi>f</mi>
         <mo> </mo>
         <mrow>
           <mo>(</mo>
           <mi>x</mi>
           <mo>)</mo>
         </mrow>
       </mrow>
       <mo>=</mo>
       <msup>
         <mi>x</mi>
         <mn>2</mn>
       </msup>
     </mrow>
 </math>

For Mathoid generation:
$wgMathoidCli = .... /mathoid/cli.js -> see MediaWikiPage for that.

Clarified in meeting:

  • Annotations and semantics are not the problem. The HTML header encoding has to be updated so that the \sqrt{\pi} example is displayed correctly.
  • id and xref attributes can be removed before comparison.
  • Mathoid should be reachable locally with UpdateMath.php -> https://wikimedia.org/api/rest_v1/media/math/check/tex ($wgMathFullRestbaseURL setting or similar, up to rest_).
  • To use Mathoid in UpdateMath.php, pass --mode mathml or -5.
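Removing the id and xref attributes before comparison could look like the following sketch. It uses Python's standard `xml.etree.ElementTree`; the function name `strip_ids` is made up for illustration and is not taken from the patches.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML with a default namespace instead of an ns0: prefix.
ET.register_namespace("", MATHML_NS)


def strip_ids(mathml: str, drop=("id", "xref")) -> str:
    """Return the MathML with the given attributes removed from every element."""
    root = ET.fromstring(mathml)
    for element in root.iter():
        for name in drop:
            element.attrib.pop(name, None)
    return ET.tostring(root, encoding="unicode")


latexml = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML" id="p1.1.m1.1">'
    '<msup id="p1.1.m1.1.3" xref="p1.1.m1.1.3.cmml">'
    '<mi id="p1.1.m1.1.1" xref="p1.1.m1.1.1.cmml">a</mi>'
    '<mn id="p1.1.m1.1.2.1" xref="p1.1.m1.1.2.1.cmml">2</mn>'
    "</msup></math>"
)
cleaned = strip_ids(latexml)
```

Other attributes such as `class` or `alttext` are kept; they can be added to `drop` if the comparison should ignore them as well.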

Change 883631 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/MathSearch@master] Add export MathML feature to UpdateMath maintenance script

https://gerrit.wikimedia.org/r/883631

Change 881883 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Add Full-Coverage Test TexVC-MMLGeneration for Mathoid-LateXML

https://gerrit.wikimedia.org/r/881883

Change 883631 merged by jenkins-bot:

[mediawiki/extensions/MathSearch@master] Add export MathML feature to UpdateMath maintenance script

https://gerrit.wikimedia.org/r/883631

Done for now. The test (like some other tests) will still need a comparison algorithm for an in-depth comparison of the generated MathML against the reference MathML.
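As a starting point for such a comparison, here is a sketch of a structural check. It is an assumption about what "in-depth" could mean, not the algorithm that was later implemented: it ignores id/xref attributes and annotation elements, and compares element names, remaining attributes, text and child order.

```python
import xml.etree.ElementTree as ET

IGNORED_ATTRS = {"id", "xref"}
IGNORED_TAGS = {"annotation", "annotation-xml"}


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def _signature(element: ET.Element) -> tuple:
    """Reduce an element to a nested tuple that can be compared with ==."""
    attrs = tuple(sorted((k, v) for k, v in element.attrib.items()
                         if k not in IGNORED_ATTRS))
    children = tuple(_signature(child) for child in element
                     if _local(child.tag) not in IGNORED_TAGS)
    return (_local(element.tag), attrs, (element.text or "").strip(), children)


def mathml_equal(left: str, right: str) -> bool:
    """True if both MathML strings have the same structure and text."""
    return _signature(ET.fromstring(left)) == _signature(ET.fromstring(right))


NS = 'xmlns="http://www.w3.org/1998/Math/MathML"'
with_ids = (f'<math {NS}><semantics id="s"><msup id="m" xref="m.cmml">'
            '<mi id="a">a</mi><mn id="b">2</mn></msup>'
            '<annotation encoding="application/x-tex">a^{2}</annotation>'
            '</semantics></math>')
plain = f'<math {NS}><semantics><msup><mi>a</mi><mn>2</mn></msup></semantics></math>'
wrapped = (f'<math {NS}><semantics><msup><mi>a</mi>'
           '<mrow class="MJX-TeXAtom-ORD"><mn>2</mn></mrow></msup></semantics></math>')
```

Here `with_ids` and `plain` compare as equal, while `wrapped` (the Mathoid-style extra `mrow`) does not. Whether such wrapper differences should count as a mismatch is a design decision the real comparison would have to make.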