Page MenuHomePhabricator

Math: Improve MML tree implementation
Open, Needs TriagePublic

Description

Currently, the MathML tree allows child elements to be null, string, MMLbase. This leads to frequent type checks and inconvenient branching in the code. This task aims to simplify the type model and make the tree handling cleaner and more maintainable.

We should decide on what to do with each problem

  • Strings

Null

Instead of returning null we use an new MMLarray() object instead. A strict null check of $x should be converted to $x intanceof MMLarray && $x->isEmpty(). In situations where empty elements should be treated in the same way as null elements $x->isEmpty()$ is sufficient

Event Timeline

We should first begin by removing the strings. Removing strings from the MML tree is not hard, but it should be split into multiple commits due to the many files we have to change. Temporarily, I would suggest return null instead of an empty string in the functions that need it.

As for the null values, we could exchange them with a virtual node like MMLEmpty extends MMLbase (isVirtual=true, isEmpty=true, renders '').

And as for MMLarray, I currently have not really thought in-depth about it. Just exchanging with MMLmrow does not work. We could maybe add some virtual flags that may help. I think the least we could do is add a fallback in MMLDomVisitor that would render the MMLarray as an mrow instead of throwing an error.

Thank you for bringing this up. I think we might want to discuss two questions.

How can partial trees, strings and null values converted to valid MathML trees?

Which (pseudo) classes extending mmlbase do we need to implement this and how do they differentiate from actual mml classes with corresponding mathml elements.

Change #1248783 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[mediawiki/extensions/Math@master] Add return types for BaseParsing

https://gerrit.wikimedia.org/r/1248783

Change #1248783 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Add return types for BaseParsing

https://gerrit.wikimedia.org/r/1248783

Thank you for bringing this up. I think we might want to discuss two questions.

How can partial trees, strings and null values converted to valid MathML trees?

Which (pseudo) classes extending mmlbase do we need to implement this and how do they differentiate from actual mml classes with corresponding mathml elements.

For strings, we don't even have that many occurrences. Most of the time, we return empty strings, which could just be exchanged with either an empty mrow or an empty MMLarray. The other occurrences should also be easy to deal with. As for null values, just returning an empty MMLarray and then checking for MMLarray->isEmpty should make this null-safe. And for partial trees, we could add to MMLDomVisitor to put partial trees from MMLarray into an mrow if the root is an MMLarray.

So I think we do not need additional elements.

So I think we do not need additional elements.

I agree. I think we can also merge mrow and marray if we add some properties. isEmpty already exist so we might want to only add something like isVirtual

Do you have an idea how we can find the functions that return either strings or mmlbase elements. I think in some situations we still want to use strings, but not the mixed return type.

Do you have an idea how we can find the functions that return either strings or mmlbase elements. I think in some situations we still want to use strings, but not the mixed return type.

I dont know if there is any good way to find it, but I just set the constructor and addChild function to only accept MMLbase and null value,s and then worked myself through the error: public function __construct( string $name, string $texclass = '', array $attributes = [], MMLbase|null...$children ).

I agree. I think we can also merge mrow and marray if we add some properties. isEmpty already exist so we might want to only add something like isVirtual

I think I still prefer the current split: MMLmrow as a real MathML container, and MMLarray as a virtual one. MMLmrow can be empty, but it is still meaningful in some formulas as a real placeholder, or it carries actual structural meaning. MMLarray, on the other hand, is consistently just a virtual container or a lightweight “no result” placeholder.

Change #1249075 had a related patch set uploaded (by FrederikHennecke1; author: FrederikHennecke1):

[mediawiki/extensions/Math@master] Return only MMLbase in FQ

https://gerrit.wikimedia.org/r/1249075

Change #1249075 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Return only MMLbase in FQ

https://gerrit.wikimedia.org/r/1249075

@FrederikHennecke1 ok. Let's focus on strings and null first. I like the idea with the empty MMLarray instead of null.

A follow-up task can to reduce the size of page further to save bandwidth and energy. In the following we can

  1. merge mrow, a, mstyle elements into a (which is shorter than mrow but does the same)
  2. combine and reduce the number of a elements

For the second follow-up step I think it would be better if we would replace mmlarray elements with mrow elements with a special parameter, as we don't know ahead of time if an mrow element will end up in the dom or be merged into something else eventually. That way the future mrow/a element will behave more like a marray element than a mrow element.
T417592 is an example were we removed too much mrows already now. While I think in many other situations there are still to many mrows. The best thing would be to omit superfluous {} already when checking latex code.

This sounds good to me. But we should wait for T415005 to be done.

Change #1249426 had a related patch set uploaded (by FrederikHennecke1; author: FrederikHennecke1):

[mediawiki/extensions/Math@master] Return only MMLbase in BaseMthods chechAndParseDelimiter

https://gerrit.wikimedia.org/r/1249426

Change #1249426 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Return only MMLbase in BaseMethods checkAndParseDelimiter

https://gerrit.wikimedia.org/r/1249426

Change #1250704 had a related patch set uploaded (by FrederikHennecke1; author: FrederikHennecke1):

[mediawiki/extensions/Math@master] Return only MMLbase in BaseMethods checkAndParseOperator

https://gerrit.wikimedia.org/r/1250704

Change #1250704 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Return only MMLbase in BaseMethods checkAndParseOperator

https://gerrit.wikimedia.org/r/1250704

Change #1251058 had a related patch set uploaded (by FrederikHennecke1; author: FrederikHennecke1):

[mediawiki/extensions/Math@master] Return only MMLbase in BaseMethods checkAndParseIdentifier

https://gerrit.wikimedia.org/r/1251058

Change #1251058 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Return only MMLbase in BaseMethods checkAndParseIdentifier

https://gerrit.wikimedia.org/r/1251058

Change #1251102 had a related patch set uploaded (by FrederikHennecke1; author: FrederikHennecke1):

[mediawiki/extensions/Math@master] Return only MMLbase in BaseMethods checkAndParseMathCharacter

https://gerrit.wikimedia.org/r/1251102

Change #1251102 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Return only MMLbase in BaseMethods checkAndParseMathCharacter

https://gerrit.wikimedia.org/r/1251102

Change #1256608 had a related patch set uploaded (by FrederikHennecke1; author: FrederikHennecke1):

[mediawiki/extensions/Math@master] Return only MMLbase in BaseParsing

https://gerrit.wikimedia.org/r/1256608

Change #1256608 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Return only MMLbase in BaseParsing

https://gerrit.wikimedia.org/r/1256608

Change #1261491 had a related patch set uploaded (by FrederikHennecke1; author: FrederikHennecke1):

[mediawiki/extensions/Math@master] Return only MMLbase in BaseParsing and BaseMethods

https://gerrit.wikimedia.org/r/1261491

Change #1261491 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Return only MMLbase in BaseParsing and BaseMethods

https://gerrit.wikimedia.org/r/1261491

@Physikerwelt I am currently updating MMLParsingUtil.php and I noticed the following: forgeIntentToSpecificElement currently does nothing. It passes $elementTag as the first attribute, which is read by simplexml_load_string. Of course this does nothing and we get an empty string as the return value. The question is: should we fix it? Or just discard that function? I don't know of any current formula that uses this.
https://github.com/wikimedia/mediawiki-extensions-Math/blob/91d0a178fb9ecc4b5037abc98fbd7973beeac17a/src/WikiTexVC/MMLmappings/Util/MMLParsingUtil.php#L223

I am still planning to enable intent in the future. Maybe we can keep a record of the intent-related features we remove in a dedicated ticket?