Goal: Extend the class \MediaWiki\Extension\Math\WikiTexVC\MMLnodes\MMLbase to support tree structures.
The subclasses of MMLbase should follow the MathML Core spec, and only the part of the spec that is actually used should be implemented. For example, the attribute dir should not be accepted, since it cannot be generated by WikiTexVC.
In MathML, there are two types of elements: token elements, which have text content but no child elements, and all other elements, which have children but no text. Therefore, I suggest adding a new abstract base class, MMLleaf, to be used for all MathML elements with text content.
MMLbase and its subclasses ensure that, instead of dealing with an arbitrary string, one can rely on the structure being a valid MathML tree according to the MathML Core spec. This serves two purposes: 1) validation that only standards-conforming MathML Core is generated, and 2) subtree matching, which becomes possible because expressions have a normalized representation.
Suggested implementation path:
Step 0 (optional) rename classes ✅
As we will be working a lot with the MMLm* classes, I would prefer omitting the MML prefix in the class names. The classes live in a dedicated namespace and mostly start with m, so dropping the prefix would improve readability. However, this is just a question of taste. Moreover, now is a rare opportunity: almost all code reviews are complete, so this refactoring would not cause merge conflicts.
- Agreement on MML prefix achieved
--> resolution: no we will not rename the classes and keep the MML prefix (for now)
Step 1a: Mark leaves as such ✅
A first commit can be a new class MMLleaf with text as an additional (optional) constructor argument. The class should extend MMLbase, and the following classes should be changed to be subclasses of that class:
- MMLmi
- MMLmo
- MMLmn
- MMLms (if ever used)
- MMLmtext
In the same commit, absolutely minimal unit tests should be added in tests/phpunit/unit/WikiTexVC/MMLNodes for
- MMLleaf
- MMLmi
- MMLmo
- MMLmn
- MMLms (if ever used)
- MMLmtext
The tests should only check that the constructor works and that name() returns the correct value. The overall idea is to phase out encapsulate, getStart, and the related string-building methods, so these should be neither changed nor tested.
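A minimal sketch of the class layout described in Step 1a. BaseSketch, LeafSketch and MiSketch are simplified, hypothetical stand-ins for MMLbase, MMLleaf and MMLmi; the constructor parameter order and the getText() accessor are assumptions, not the real signatures.

```php
<?php
// Simplified stand-ins; not the real Math extension classes.
class BaseSketch {
    private string $name;
    private array $attributes;

    public function __construct( string $name, array $attributes = [] ) {
        $this->name = $name;
        $this->attributes = $attributes;
    }

    public function name(): string {
        return $this->name;
    }
}

// Token elements carry text instead of children.
abstract class LeafSketch extends BaseSketch {
    private string $text;

    public function __construct( string $name, array $attributes = [], string $text = '' ) {
        parent::__construct( $name, $attributes );
        $this->text = $text;
    }

    public function getText(): string {
        return $this->text;
    }
}

// Each concrete leaf only fixes its element name.
class MiSketch extends LeafSketch {
    public function __construct( array $attributes = [], string $text = '' ) {
        parent::__construct( 'mi', $attributes, $text );
    }
}

$mi = new MiSketch( [], 'x' );
```

The "absolutely minimal" unit test for such a leaf would then only construct it and compare name() against the expected element name.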
Step 1b: Deprecate non-core MathML elements ✅
We should deprecate all classes for elements that are not in the official list of MathML Core elements. According to my analysis, this affects:
- menclose (most uses should already be removed per T377167)
- double-check whether further classes are not in Core
Step 1c: Implement an MML converter service skeleton ✅
Ultimately, MMLbase should be stringable and have an interface to add links based on a list of other MMLbase nodes. Possible method signatures read:
public function __toString(): string
public function annotateSubtrees( MMLbase ...$subtrees ) (in a later step)
However, as there are various ways to implement it, I think it would be better to transform it into a service so that dependencies can be injected the usual way.
Thus, I suggest building an MMLvisitor service that fulfills these two functions. In the first step, one can follow the manual and the example of the input check service.
The first commit aims to extend ServiceWiring.php, extension.json, and src/Math.php for the new Math.MathMLTreeVisitor service and new MediaWiki\Extension\Math\WikiTexVC\MMLnodes\VisitorFactory and abstract baseVisitor and respective tests.
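The shape of such a visitor service can be sketched in plain PHP without the MediaWiki service container. All names below are hypothetical, and an interface is used where the text proposes an abstract base visitor; only the pattern is the point: nodes accept a visitor, and a factory (the object a service would hand out) creates concrete visitors.

```php
<?php
// Hypothetical names throughout; this only illustrates the visitor pattern.
interface VisitorSketch {
    public function visit( NodeSketch $node ): void;
}

class NodeSketch {
    private string $name;

    public function __construct( string $name ) {
        $this->name = $name;
    }

    public function getName(): string {
        return $this->name;
    }

    // Double dispatch: the node hands itself to whichever visitor it is given.
    public function accept( VisitorSketch $visitor ): void {
        $visitor->visit( $this );
    }
}

// A trivial concrete visitor that records the names of the visited nodes.
class NameCollectorSketch implements VisitorSketch {
    private array $names = [];

    public function visit( NodeSketch $node ): void {
        $this->names[] = $node->getName();
    }

    public function getNames(): array {
        return $this->names;
    }
}

// The factory is the single place where visitor dependencies would be injected.
class VisitorFactorySketch {
    public function createNameCollector(): NameCollectorSketch {
        return new NameCollectorSketch();
    }
}

$factory = new VisitorFactorySketch();
$visitor = $factory->createNameCollector();
( new NodeSketch( 'mi' ) )->accept( $visitor );
( new NodeSketch( 'mo' ) )->accept( $visitor );
```

Keeping the serialization logic in a visitor rather than in MMLbase itself means the node classes stay free of DOM or service dependencies.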
Step 2 Implement MMLDomVisitor ✅
The second step is to implement the __toString functionality by converting MMLbase elements to DOMElements, attaching them to a DOMDocument, and using saveXML to get the XML representation.
Here we start by supporting only MMLleaf elements and MMLbase elements with no children.
In a nutshell:
$dom = new DOMDocument();
$element = $dom->createElement( $leaf->getName() );
// the text node holds the leaf's text, assuming MMLleaf exposes it via getText()
$element->appendChild( $dom->createTextNode( $leaf->getText() ) );
$dom->appendChild( $element );
$dom->saveHTML();
Step 3a Implement tree structure ✅
- Familiarize yourself with the constructors of TexNode, TexArray, DQ and FQ
- The protected variable args plays a role similar to the concept of children. However, for MMLbase the children should always be of type MMLbase, never strings (strings are wrapped into MMLleaf elements).
- The TexNode constructors ensure that only valid trees can be built. For MMLbase this is currently not the case, and we had bugs in the past where fractions and subscripts ended up with more than two arguments.
- By default, there is no way to add or remove arguments. An exception is TexArray which has push, pop, unshift... This architecture has proven to be successful in the past. The fact that the args variable is protected and not private allows the tree to be modified. However, implementing only specific modifications safeguards against situations where you end up with invalid trees.
After having reviewed this, we might want to discuss the following idea to represent children in the MML tree:
- The MMLbase node
  - gets a protected array of type MMLbase that stores the children
  - the constructor of MMLbase is extended by a variable-length argument of children
  - a method to get the children is added
- The non-leaf MML nodes (this can also be a second commit if the change would get too large)
  - get additional constructor arguments according to the Core spec
- Tests are added to check the above functionality
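The idea above could look roughly as follows. TreeNodeSketch and FracSketch are hypothetical stand-ins for MMLbase and a non-leaf node such as mfrac; the real constructors take further arguments (for example attributes) that are omitted here.

```php
<?php
// Simplified stand-in for MMLbase with typed children; names are hypothetical.
class TreeNodeSketch {
    private string $name;
    /** @var TreeNodeSketch[] */
    protected array $children;

    // The typed variadic parameter rejects strings and other non-node values.
    public function __construct( string $name, TreeNodeSketch ...$children ) {
        $this->name = $name;
        $this->children = $children;
    }

    public function getName(): string {
        return $this->name;
    }

    /** @return TreeNodeSketch[] */
    public function getChildren(): array {
        return $this->children;
    }
}

// A fraction names its two children explicitly. Passing fewer raises an
// ArgumentCountError; surplus arguments never reach the child list.
class FracSketch extends TreeNodeSketch {
    public function __construct( TreeNodeSketch $numerator, TreeNodeSketch $denominator ) {
        parent::__construct( 'mfrac', $numerator, $denominator );
    }
}

$frac = new FracSketch( new TreeNodeSketch( 'mi' ), new TreeNodeSketch( 'mn' ) );
```

Because there are no public methods to add or remove children, a fraction built this way always has exactly two, which mirrors how the TexNode constructors guard against invalid trees.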
Step 3b Implement __toString for leaves ✅
Use the DomVisitor implemented in Step 2 for the __toString method of MMLleaf.
Check that the results match the respective encapsulate (not encapsulateRaw) functions of the leaf nodes.
Compare the performance of __toString and encapsulateRaw and document the results, for example in the doc folder.
Step 4.a Extend the DomVisitor to support MMLbase elements ✅
In Step 2, the DomVisitor only supported MMLleaf elements. Now, the DomVisitor should be extended to handle arbitrary MMLbase elements.
Additional logic is needed to add children, and more testing is needed.
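The additional logic is essentially a recursion over the children. A minimal sketch, assuming a hypothetical DomNodeSketch class in place of MMLbase and a free function toDomElement() in place of the visitor method:

```php
<?php
// Stand-in node type: a name, optional text (for leaves) and typed children.
class DomNodeSketch {
    public string $name;
    public string $text;
    /** @var DomNodeSketch[] */
    public array $children;

    public function __construct( string $name, string $text = '', DomNodeSketch ...$children ) {
        $this->name = $name;
        $this->text = $text;
        $this->children = $children;
    }
}

// Converts a node and, recursively, all of its children into DOM elements.
function toDomElement( DOMDocument $dom, DomNodeSketch $node ): DOMElement {
    $element = $dom->createElement( $node->name );
    if ( $node->text !== '' ) {
        // createTextNode takes plain text; escaping happens on serialization.
        $element->appendChild( $dom->createTextNode( $node->text ) );
    }
    foreach ( $node->children as $child ) {
        $element->appendChild( toDomElement( $dom, $child ) );
    }
    return $element;
}

$tree = new DomNodeSketch(
    'mfrac',
    '',
    new DomNodeSketch( 'mi', 'x' ),
    new DomNodeSketch( 'mn', '2' )
);
$dom = new DOMDocument();
$dom->appendChild( toDomElement( $dom, $tree ) );
$xml = $dom->saveXML( $dom->documentElement );
```

Here $xml holds the serialized fraction with both children nested inside the mfrac element.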
Step 4.b Replace usages of encapsulate with __toString ✅
Review usages of the encapsulate function:
- Extension:Math (4 files)
- src/WikiTexVC/MMLmappings/BaseMethods.php (5 matches)
- src/WikiTexVC/MMLmappings/BaseParsing.php (13 matches)
- src/WikiTexVC/MMLnodes/MMLbase.php (1 match)
- tests/phpunit/unit/WikiTexVC/MMLNodes/BaseTest.php (2 matches)
Replace inappropriate use with other methods (extra commit):
- checkAndParseColor generates an empty mstyle element. I wonder if that method is actively used. --> Replace with encapsulateRaw
- macro mrow should become merror
- Replace remaining usages of encapsulate by adding the argument to the constructor and casting to a string
- Remove the encapsulate method and its tests.
Step 4.c Replace usages of encapsulateRaw on MMLleaf nodes with __toString ✅
The following MMLleaf nodes use the encapsulateRaw method:
Usages in All Places (58 usages found)
Method call (58 usages found)
mw (58 usages found)
BaseMethods.php (10 usages found)
BaseMethods (10 usages found)
checkAndParseColor (1 usage found)
229 $innerRow .= $mi->encapsulateRaw( $char );
checkAndParseDelimiter (1 usage found)
190 return $mo->encapsulateRaw( $resDelimiter[0] );
checkAndParseMathCharacter (1 usage found)
204 return $mi->encapsulateRaw( $enc );
checkAndParseOperator (1 usage found)
72 return $mmlMo->encapsulateRaw( $input );
parseIdentifier (1 usage found)
166 $text = $mi->encapsulateRaw( $uc );
parseOperator (2 usages found)
123 $text = $mo->encapsulateRaw( $uc . "̸" );
125 $text = $mo->encapsulateRaw( $uc );
parseOperatorDict (3 usages found)
101 return $mmlMo->encapsulateRaw( "<" );
104 return $mmlMo->encapsulateRaw( ">" );
112 return $mmlMo->encapsulateRaw( $input );
BaseParsing.php (35 usages found)
BaseParsing (35 usages found)
accent (1 usage found)
82 $mo->encapsulateRaw( $entity )
array (2 usages found)
99 $output .= $moOpen->encapsulateRaw( $resDelimiter[0] );
116 $output .= $moClose->encapsulateRaw( $resDelimiter[0] );
customLetters (2 usages found)
193 return $mrow->encapsulateRaw( $mo->encapsulateRaw( $char ) );
197 return $mrow->encapsulateRaw( $mi->encapsulateRaw( $char ) );
dots (1 usage found)
224 return $mo->encapsulateRaw( "…" );
genFrac (2 usages found)
294 $output .= $mrowOpen->encapsulateRaw( $moL->encapsulateRaw( $left ) );
301 $output .= $mrowClose->encapsulateRaw( $moR->encapsulateRaw( $right ) );
hBox (5 usages found)
1157 return $mmlMrow->encapsulateRaw( $mo->encapsulateRaw( MMLutil::uc2xNotation( $input ) ) );
1161 return $mmlMrow->encapsulateRaw( $mtext->encapsulateRaw( "\mbox" ) );
1168 return $mmlMrow->encapsulateRaw( $mstyle->encapsulateRaw( $mtext->encapsulateRaw( $inner ) ) );
1173 return $mmlMrow->encapsulateRaw( $mtext->encapsulateRaw( $inner ) );
1190 return $mmlMrow->encapsulateRaw( $mtext->encapsulateRaw( $inner ) );
hskip (1 usage found)
358 return $mspace->encapsulateRaw( "" );
macro (10 usages found)
394 return $mtext->encapsulateRaw( ' ' );
429 $mmlMi->encapsulateRaw( "lim" ) . $mo->encapsulateRaw( "―" ) ) );
437 $mi->encapsulateRaw( "lim" ) .
438 $mo->encapsulateRaw( "→" ) )
485 return $mstyle->encapsulateRaw( $mspace->getEmpty() ) . $mo->encapsulateRaw( "⟹" ) .
491 return $mstyle->encapsulateRaw( $mspace->getEmpty() ) . $mo->encapsulateRaw( "⟺" ) .
496 return $mo->encapsulateRaw( "—" );
516 return $mtext->encapsulateRaw( " " ) .
520 $mo->encapsulateRaw( "⟵" ) ) ) .
523 $mo->encapsulateRaw( "⟶" )
makeBig (1 usage found)
917 return $mrowOuter->encapsulateRaw( $mrow->encapsulateRaw( $mo->encapsulateRaw( $argPrep ) ) );
matrix (2 usages found)
592 $mmlMoOpen = $mmlMoOpen->encapsulateRaw( $open ?? '' );
600 $mmlMoClose = $mmlMoClose->encapsulateRaw( $close );
namedFn (1 usage found)
933 return $mi->encapsulateRaw( ltrim( $name, '\\' ) ) . $applyFct;
namedOp (1 usage found)
615 return $mi->encapsulateRaw( $id ?? ltrim( $name, '\\' ) ) . $applyFct;
oint (3 usages found)
672 return $mStyle->encapsulateRaw( $mo->encapsulateRaw( MMLutil::uc2xNotation( $uc ) ) );
675 return $mo->encapsulateRaw( MMLutil::uc2xNotation( $uc ) );
683 $mmlText->encapsulateRaw( MMLutil::uc2xNotation( $uc ) )
underOver (1 usage found)
799 $mo->encapsulateRaw( $inner )
xArrow (2 usages found)
1284 $mstyle->encapsulateRaw( $moArrow->encapsulateRaw( $char ) ) .
1298 $mstyle->encapsulateRaw( $moArrow->encapsulateRaw( $char ) ) .
Box.php (1 usage found)
Box (1 usage found)
renderMML (1 usage found)
64 $mtext->encapsulateRaw( $arg )
ChemWord.php (2 usages found)
ChemWord (2 usages found)
renderMML (2 usages found)
51 $mtextLeft->encapsulateRaw( $this->getLeft()->renderMML( [], $state ) )
52 . $mtextRight->encapsulateRaw( $right ) ) );
Fun1.php (1 usage found)
Fun1 (1 usage found)
createMover (1 usage found)
65 $mo->encapsulateRaw( $inner )
Literal.php (4 usages found)
Literal (4 usages found)
createVlineElement (1 usage found)
209 $mStyle->encapsulateRaw( $mo->encapsulateRaw( "|" ) ) ) );
renderMML (3 usages found)
75 return $mn->encapsulateRaw( $this->changeUnicodeFontInput( $this->arg, $state ) );
91 return $mi->encapsulateRaw( $operatorContent["foundOC"] );
143 return $mi->encapsulateRaw( $this->changeUnicodeFontInput( $input, $state ) ); // $this->arg
Lr.php (2 usages found)
Lr (2 usages found)
renderMML (2 usages found)
72 $left = $moLeft->encapsulateRaw( $this->right );
78 $right = $moRight->encapsulateRaw( $this->right );
MMLParsingUtil.php (2 usages found)
MMLParsingUtil (2 usages found)
createNot (1 usage found)
109 return $mmlMrow->encapsulateRaw( $mpadded->encapsulateRaw( $mtext->encapsulateRaw( "⧸" ) ) );
renderApplyFunction (1 usage found)
21 return $mo->encapsulateRaw( "⁡" );
TexArray.php (1 usage found)
TexArray (1 usage found)
addDerivativesContext (1 usage found)
381 $mml = $msup->encapsulateRaw( $mml . $moDeriv->encapsulateRaw( $derInfo ) );
Replace patterns like
$mmlMo = new MMLmo();
return $mmlMo->encapsulateRaw( "<" );
by
return (string)( new MMLmo( '', [], '<' ) );
If the text is not a literal but a variable $x that may still contain HTML entities, use html_entity_decode to convert it to plain text. The above example would then become
return (string)( new MMLmo( '', [], html_entity_decode( $x ) ) );
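Why the decoding step matters can be seen with plain DOM calls. The sketch below assumes, as in the DOM-based approach of the earlier steps, that the leaf text ends up in a text node; the variable $x and its entity-encoded content are hypothetical.

```php
<?php
$dom = new DOMDocument();
$math = $dom->createElement( 'math' );
$dom->appendChild( $math );

// Hypothetical input that arrives entity-encoded.
$x = '&lt;';

// Decoded first: the text node holds "<", which is escaped once on output.
$mo = $dom->createElement( 'mo' );
$mo->appendChild( $dom->createTextNode( html_entity_decode( $x ) ) );
$math->appendChild( $mo );
$decoded = $dom->saveXML( $mo );

// Not decoded: the ampersand of the entity is itself escaped.
$moRaw = $dom->createElement( 'mo' );
$moRaw->appendChild( $dom->createTextNode( $x ) );
$math->appendChild( $moRaw );
$doubleEscaped = $dom->saveXML( $moRaw );
```

Without html_entity_decode the output contains a doubly escaped entity, which a browser would render as the literal text "&lt;" instead of the operator.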
Step 4d: Replace usages of MMLbase::getEmpty. ✅
We have to replace MMLbase::getEmpty. This is only called for MMLmspace or MMLmrow and returns the empty element <mspace/>. With our current implementation, we get the full tag pair just without an inner text: <mspace></mspace>. To fix this, we change MMLDomVisitor::getHTML from $this->dom->saveHTML( $this->dom->documentElement ) to $this->dom->saveXML( $this->dom->documentElement ). saveXML serializes empty elements in the self-closing form by default while keeping everything else the same. Note that the LIBXML_NOEMPTYTAG option does the opposite: it expands empty elements to <mspace></mspace>, so it should not be passed.
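The difference between the serializers can be checked with a few lines of plain DOM code. This is a standalone sketch and does not use MMLDomVisitor itself:

```php
<?php
$dom = new DOMDocument();
$dom->appendChild( $dom->createElement( 'mspace' ) );

// The HTML serializer writes an explicit end tag for the unknown element.
$html = $dom->saveHTML( $dom->documentElement );

// The XML serializer collapses an element without children by default.
$xml = $dom->saveXML( $dom->documentElement );

// LIBXML_NOEMPTYTAG switches the collapsing off again.
$expanded = $dom->saveXML( $dom->documentElement, LIBXML_NOEMPTYTAG );
```

Only the plain saveXML call yields the self-closing form that getEmpty used to return.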
Step 5 Replace remaining usages of encapsulateRaw ✅
Replace the remaining usages of encapsulateRaw class by class and make one commit for each class (including test classes).
Try to do the string conversion as late as possible in each function.
For private methods try to change the return type from string to MMLbase.
We might want to create a new function in TexNode that will eventually replace renderMML, which we might call toMMLTree, and has return type MMLbase instead of string.
Step 5.a Make MMLDomVisitor temporarily accept strings as children
Currently, TexNode calls BaseMethods::checkAndParse, which calls BaseParsing::..., which in turn calls TexNode again. Changing one of them to use only MMLbase breaks the others.
We can parse XML strings directly into the DOM with DOMDocumentFragment::appendXML, so that we don't have to rewrite everything at once.
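A minimal sketch of this transitional mixing, using only standard DOM calls; the variable names and the legacy string are made up for illustration:

```php
<?php
$dom = new DOMDocument();
$mrow = $dom->createElement( 'mrow' );
$dom->appendChild( $mrow );

// A child that is still a pre-rendered MathML string (legacy code path).
$legacy = '<mi>x</mi><mo>+</mo>';
$fragment = $dom->createDocumentFragment();
// appendXML requires well-formed XML and returns false otherwise.
$ok = $fragment->appendXML( $legacy );
$mrow->appendChild( $fragment );

// A child that is already a proper node.
$mn = $dom->createElement( 'mn' );
$mn->appendChild( $dom->createTextNode( '1' ) );
$mrow->appendChild( $mn );

$xml = $dom->saveXML( $dom->documentElement );
```

Since appendXML only accepts well-formed XML, legacy strings containing HTML-only named entities would have to be decoded before being passed in.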
Step 5.b Make BaseParsing functions return MMLbase ✅
Now that the DomVisitor also accepts strings, we can rewrite all functions of BaseParsing.php to return only MMLbase. The results of renderMML() calls can be passed directly as children of MMLbase.
Note that at this stage the MMLbase tree does not fully represent the structure of the DOM, as some parts of the tree are still packed into strings.
Step 5.c Make BaseMethods functions return MMLbase ✅
Replace remaining usages of encapsulateRaw and the string casting with MMLbase.
Step 5.d Make TexNode support MMLbase
We might want to create a new function in TexNode that will eventually replace TexNode::renderMML, which we might call `toMMLTree` and which has return type MMLbase instead of string.
Step 5.e Replace remaining usages of encapsulateRaw
The class in `MMLParsingUtil.php` should be the last one that uses encapsulateRaw.
Step 5.f Remove encapsulateRaw
Remove encapsulateRaw, getStart, getEnd, getEmpty and the string support of MMLDomVisitor
Step x Implement subtree matching
- to be continued



