
RDF export for the math data type should not export input texvc string but its MathML representation
Closed, ResolvedPublic

Event Timeline

Physikerwelt claimed this task.
Physikerwelt raised the priority of this task from to Normal.
Physikerwelt updated the task description.
Physikerwelt added projects: Math, Wikidata.
daniel added a comment. Feb 9 2016, 5:44 PM

The string literal should probably use http://www.w3c.org/datatypes/mathMLLiteral as the type uri. The implementation could subclass LiteralValueRdfBuilder and override the getLiteralValue() method to supply the MathML representation.
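The subclass-and-override idea suggested above can be sketched roughly as follows. This is a Python analogue for illustration only, not the Wikibase PHP code: the class names, the `build()` method, and the `fake_tex_to_mathml` converter are hypothetical, and the datatype URI is simply the one proposed in the comment, not a verified identifier.

```python
class LiteralValueBuilder:
    """Hypothetical stand-in for a builder that emits a typed RDF literal."""

    # Default datatype for plain string literals.
    datatype_uri = "http://www.w3.org/2001/XMLSchema#string"

    def get_literal_value(self, value: str) -> str:
        # By default the stored input string is exported unchanged.
        return value

    def build(self, value: str) -> tuple:
        # Return (lexical form, datatype URI) for the literal.
        return (self.get_literal_value(value), self.datatype_uri)


class MathMLValueBuilder(LiteralValueBuilder):
    """Overrides only the literal value; everything else is inherited."""

    # URI as proposed in the comment above (assumption, not verified).
    datatype_uri = "http://www.w3c.org/datatypes/mathMLLiteral"

    def __init__(self, tex_to_mathml):
        # The converter is injected so the sketch stays independent of
        # any particular TeX-to-MathML service.
        self._tex_to_mathml = tex_to_mathml

    def get_literal_value(self, value: str) -> str:
        return self._tex_to_mathml(value)


def fake_tex_to_mathml(tex: str) -> str:
    # Placeholder converter; a real one would call a rendering service.
    return f"<math><mi>{tex}</mi></math>"
```

With this shape, switching the export from the texvc input to MathML only touches `get_literal_value()` and the datatype, which is the appeal of the suggestion.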

Change 269473 had a related patch set uploaded (by Physikerwelt):
RDF Formatter for Math data type

https://gerrit.wikimedia.org/r/269473

The string literal should probably use http://www.w3c.org/datatypes/mathMLLiteral as the type uri. The implementation could subclass LiteralValueRdfBuilder and override the getLiteralValue() method to supply the MathML representation.

The URL does not lead anywhere; https://www.w3.org/TR/MathML3/ would be a good target, I think. At the moment I do not know how to set this property at all.

Bene set Security to None. Feb 10 2016, 10:55 AM
Bene added a subscriber: mkroetzsch.

Markus: Can you comment on what you believe should be in the RDF export?
I am leaning towards MathML or both.

Repeating my arguments from https://gerrit.wikimedia.org/r/269386 and https://gerrit.wikimedia.org/r/269473:

Personally I think the RDF output should contain what the users expect, which in my opinion is what they have entered and see in the JSON dumps. Why should that be different? What's the benefit? Which entity can use this to its advantage?

If MathML is so much better, why don't we have a MathML property type? What's the benefit of having this conversion here, in the RDF export? What if a user of the RDF wants the original input? What if a user of the RDF output finds a mistake in the value and wants to make an edit on Wikidata? How and where will the MathML be converted back?

The format should be the same as in JSON. If MathML is preferred there, then this is fine with me. If LaTeX is preferred, we can also use this. It seems that MathML would be a more reasonable data exchange format, but Moritz was suggesting in his emails that he does not think it to be usable enough today, so there might be practical reasons to avoid it.

In any case, I am strongly against using a different format in RDF and JSON. Otherwise any tool-chain that uses RDF and JSON (e.g., a bot that uses SPARQL to fetch relevant information) would have to implement this conversion, maybe even back and forth. Tools that do large-scale processing (e.g., Wikidata Toolkit generating custom RDF dumps from JSON) would need to implement this conversion internally even if a web service existed, because of latency. It would be a lot of work without a clear benefit. Indeed, whatever format you pick, for whatever reason you pick it, the same reason should apply to all exchange formats alike.

If you use MathML but keep the TeX-like input syntax, then external users will also need a web service that can convert back and forth between these representations:

  • TeX->MathML is needed, e.g., for a query UI where users enter data to search for in SPARQL
  • MathML -> TeX is needed, e.g., to display the (raw) value of a math property to a user after it was returned from SPARQL

The two conversions do not need to be exact inverses, but they should hopefully stabilise after one round-trip. I think using MathML as the main exchange format would be doable, given such tool support exists. In particular, I am not concerned about showing different things to users than we use in our exchange formats. We are doing similar things with other types (dates are also written in a user syntax and then converted into an internal data model). The representation of dates is not the same in RDF and in JSON either, but the data structure is the same (same components that make up a date), which is very different from the situation of TeX vs. MathML.
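The "stabilise after one round-trip" requirement can be written down as a small check. This is a minimal sketch under stated assumptions: the function name and the two toy converters are hypothetical and only stand in for real TeX/MathML conversion services.

```python
def stabilises_after_one_round_trip(tex, to_mathml, to_tex) -> bool:
    """Check that a second round-trip no longer changes the TeX string.

    The first round-trip may normalise the input, so it is allowed to
    differ from the original; only the second must be a no-op.
    """
    once = to_tex(to_mathml(tex))
    twice = to_tex(to_mathml(once))
    return once == twice


# Toy converters for illustration only (no XML escaping, no real MathML).
_PREFIX = "<math><annotation>"
_SUFFIX = "</annotation></math>"


def toy_to_mathml(tex: str) -> str:
    # Normalises whitespace, so the conversion is not an exact inverse.
    normalised = " ".join(tex.split())
    return f"{_PREFIX}{normalised}{_SUFFIX}"


def toy_to_tex(mathml: str) -> str:
    # Strips the fixed wrapper added by toy_to_mathml.
    return mathml[len(_PREFIX):-len(_SUFFIX)]
```

Here `"a  +   b"` comes back as `"a + b"` after the first round-trip, so the conversions are not inverses, but a second round-trip leaves `"a + b"` unchanged.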

Hi Thiemo,

different formats have different strengths and weaknesses.
The texvc LaTeX dialect is 100% compatible with the well-established math
tags in MediaWiki. It can be used to move data from wikis to Wikidata
and to display formula data from Wikidata on wikis.
However, the web standard for math is MathML. MathML is supported by all
major tools that deal with mathematical expressions.
Therefore I think texvc should be used internally and the standards-conformant
MathML should be used externally.
The MathML expression includes the TeX representation, which can be used in
LaTeX documents and also to create new statements.

Best
Moritz

The MathML expression includes the TeX representation, which can be used in
LaTeX documents and also to create new statements.

That would address the conversion back from MathML to TeX. With this in place, we could indeed use MathML in JSON and RDF, if we wanted (assuming that this is doable for you, that is, that there is a suitable TeX->MathML conversion available in Wikibase).
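For a consumer of the export, recovering the TeX from such a MathML value could look roughly like this. It is a sketch that assumes the TeX source sits in an `<annotation encoding="application/x-tex">` element, which is a common MathML convention; whether the exported values use exactly this layout is an assumption, and the `extract_tex` helper and `SAMPLE` string are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_tex(mathml: str) -> Optional[str]:
    """Return the embedded TeX annotation of a MathML string, if any."""
    root = ET.fromstring(mathml)
    for element in root.iter():
        # Drop the "{namespace}" prefix ElementTree puts on tag names.
        local_name = element.tag.rsplit("}", 1)[-1]
        if (local_name == "annotation"
                and element.get("encoding") == "application/x-tex"):
            return (element.text or "").strip()
    return None


# Hand-written sample value, not taken from an actual export.
SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<semantics>"
    "<mrow><mi>E</mi><mo>=</mo><mi>m</mi>"
    "<msup><mi>c</mi><mn>2</mn></msup></mrow>"
    '<annotation encoding="application/x-tex">E = mc^2</annotation>'
    "</semantics></math>"
)
```

Calling `extract_tex(SAMPLE)` yields the string `E = mc^2`, which a tool could show to the user or submit as a new statement; a value without such an annotation yields `None`.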

Change 269473 merged by Mobrovac:
RDF Formatter for Math data type

https://gerrit.wikimedia.org/r/269473

Physikerwelt closed this task as Resolved. Mar 13 2017, 3:31 PM