Provide a macro for putting arbitrary Unicode content in <mo> tag
Open, Needs Triage | Public | Feature

Description

Feature summary (what you would like to be able to do and where):

As of today, several LaTeX macros allow putting (almost) arbitrary Unicode content in an <mtext> tag in the produced MathML: \text, \mbox, and more generally every macro with the box_functions tag. I would like a macro that does the same but puts the content in an <mo> tag.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

Many symbols cannot currently be typed in math blocks, which is quite limiting, especially since actual LaTeX implementations allow arbitrary Unicode in formulas (with unicode-math). For example, I couldn't type ⅋, and I had to use spacing tricks to type ⟦ and ⟧ (see https://phabricator.wikimedia.org/T391290). With a macro named, for example, \mobox, one could simply type \mobox{⅋} or \mobox{⟦}.

Benefits (why should this be implemented?):
Of course, one could write these characters inside \text, but that doesn't respect the semantics of MathML. Moreover, delimiters should scale when combined with \big or \left, so that workaround is not fully satisfying.

I don't really know whether this extension is committed to using only a valid subset of LaTeX; if so, adding a macro *which doesn't exist in any LaTeX package* would break that commitment. Maybe the source rendering mode should just drop the macro if we want it to output valid LaTeX (e.g. it would transform \mobox{a} into a). Also, I'm not sure which attributes to pass to the mo tag. Some operators use stretchy="false" by default and others do not (and MathML itself has its own defaults), even parentheses/brackets/braces are not consistent about fence="false", and operators next to \left or \right get fence="true", stretchy="true", symmetric="true". I think a good default would be stretchy="false", fence="false", but maybe there could be several macros, or one macro with optional arguments, to handle all the cases.
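
To make the two ideas above concrete, here is a minimal Python sketch. It is not the code from my prototype or from the Math extension: the function names `mobox_to_mathml` and `strip_mobox` are made up for illustration, and the regular expression only handles a \mobox argument without nested braces.

```python
import re
import xml.etree.ElementTree as ET


def mobox_to_mathml(content, stretchy=False, fence=False):
    """Wrap `content` in an <mo> element with explicit attributes.

    The defaults mirror the proposal above (stretchy="false", fence="false");
    the keyword arguments stand in for possible optional macro arguments.
    """
    mo = ET.Element("mo", {
        "stretchy": "true" if stretchy else "false",
        "fence": "true" if fence else "false",
    })
    mo.text = content
    return ET.tostring(mo, encoding="unicode")


# Hypothetical source-mode cleanup: \mobox{X} becomes X.
# Simplification: the argument must not contain braces itself.
MOBOX_RE = re.compile(r"\\mobox\{([^{}]*)\}")


def strip_mobox(tex):
    """Remove the \\mobox wrapper so the remaining source is plain LaTeX."""
    return MOBOX_RE.sub(lambda match: match.group(1), tex)


if __name__ == "__main__":
    print(mobox_to_mathml("⅋"))
    print(mobox_to_mathml("⟦", stretchy=True, fence=True))
    print(strip_mobox(r"A \mobox{⅋} B"))
```

Whether the attributes should be written explicitly or left to the MathML defaults is exactly the open question above; the sketch simply always writes them.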

I have a working prototype for this, which works with both client-side MathJax rendering and MathML rendering, but I don't know the extension well enough to be sure it didn't break something I didn't see.

Event Timeline

Change #1163862 had a related patch set uploaded (by JeanCASPAR; author: JeanCASPAR):

[mediawiki/extensions/Math@master] Add \mobox primitive in order to put arbitrary symbols in <mo> MathML tag

https://gerrit.wikimedia.org/r/1163862

This is generally a good idea. However, I think this should be coordinated with the LaTeX project, and we should not invent a macro that does not render in normal TeX. Note that the latest LaTeX release also supports MathML output (see https://www.latex-project.org/news/2025/06/02/issue41-of-latex2e-released/), so I wonder whether we can find a macro name that is compatible with regular LaTeX rendering.
If we were able to extend the list of supported input, we might also allow these characters in normal math mode. So instead of writing \mobox{⟦} we could simply write ⟦ (at least for the operators listed in the operator dictionary, because they have defined spacing: https://www.w3.org/TR/mathml-core/#operator-dictionary-human).
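
As a rough illustration of accepting bare operators in math mode, the following Python sketch checks each character against an allow-list. `ALLOWED_OPERATORS` is a hypothetical two-entry stand-in, not the real operator dictionary (which is much larger and also records spacing), and `classify` and `scan` are made-up names rather than functions of the extension.

```python
# Hypothetical stand-in for a table derived from the operator dictionary.
ALLOWED_OPERATORS = {"\u27e6", "\u27e7"}  # ⟦ and ⟧


def classify(ch):
    """Decide how a single input character would be handled."""
    if ch in ALLOWED_OPERATORS:
        return "mo"   # would be emitted as an <mo> element
    if ch.isascii():
        return "tex"  # left to the existing TeX parsing
    raise ValueError(f"unsupported character U+{ord(ch):04X}")


def scan(source):
    """Classify every non-whitespace character of a math-mode string."""
    return [(ch, classify(ch)) for ch in source if not ch.isspace()]


if __name__ == "__main__":
    print(scan("⟦ x ⟧"))
```

Characters outside the allow-list are still rejected here, so the set of accepted input stays explicit and can grow one entry at a time.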

Unfortunately, this cannot be started before we get rid of the old Restbase rendering mode.

So, I did some tests with the MathML output and it seems to work fine, even with Unicode, but I had to use the unicode-math package, as the symbols provided by external packages did not work (I think the ones from the ams* packages should be fine). From what I understood, \mathrel, \mathord, etc. determine the spacing and the output type (mi or mo) of the symbol, and by default non-letter symbols are put in an <mo> tag. It doesn't exactly solve this issue, but I think it is good enough: for example, \mathbin{A} yields <mo lspace="0.222em" rspace="0.222em">A</mo> and \mathbin{AB} yields:

<mspace width="0.222em"></mspace>
<mrow>
<mi>𝐵</mi>
<mi>𝐶</mi>
</mrow>
<mspace width="0.222em"></mspace>

Also, it doesn't exactly follow https://www.w3.org/TR/mathml-core/#operator-dictionary-human: sometimes for TeX-related reasons (*= is treated as two separate characters, as TeX does), sometimes for reasons I don't know ( has 0.222em of spacing instead of 0.166em), but I think it is not a problem.

I don't know how experimental it is, but we could try to directly use the MathML output generated by the TeX compiler (once Restbase is dropped). But that would make things harder if we later want to allow arbitrary wikicode inside \text{}, which would be cool, for example in order to have links in math blocks or nested math expressions (e.g. it would be useful to allow things like 0 \text{if $x \leq 0$} in case expressions). This should not be a problem from a MathML perspective, as arbitrary HTML can be nested inside an <mtext>, but it may be a bit difficult to implement (we need to call the whole wikicode parser back, but only up to the end of the enclosing brace).
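
The "up to the end of the enclosing brace" part can be sketched in Python as follows. This is only a sketch under simplifying assumptions: `find_closing_brace` is a made-up helper, it only knows about TeX-style braces and backslash escapes, and it ignores wikitext constructs where braces have their own meaning (templates, for instance).

```python
def find_closing_brace(source, open_index):
    """Return the index of the '}' that closes the '{' at `open_index`.

    Nested braces are tracked with a depth counter, and a backslash
    skips the following character so that \\{ and \\} are not counted.
    """
    if source[open_index] != "{":
        raise ValueError("open_index must point at '{'")
    depth = 0
    i = open_index
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced braces")


if __name__ == "__main__":
    tex = r"0 \text{if $x \leq 0$} + 1"
    start = tex.index("{")
    end = find_closing_brace(tex, start)
    # The slice between the braces is what would be handed to the wikicode parser.
    print(tex[start + 1:end])
```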

I think that trying to extend the list of accepted inputs may be easier, and will allow us to keep more control on the whole process.

@JeanCASPAR I think we currently still have too many open problems to switch to plain MathML directly. However, we can use MathJax to render MathML, which gives quite nice results.