
Implement Math Accessibility Features necessary for Intent attribute usage
Open, Needs TriagePublic

Description

This task consists of four bigger sections. The tasks are ordered by their order of implementation.
For readability, some of the bigger tasks contain a written overall description.
This task is for discussion, clarification and estimation of efforts.

(I) For MathML in Math extension MediaWiki/Wikipedia:

Mini:

  • Replace exception throwing in LocalChecker with setting $this->valid = false plus a warning or error (done)
  • some class-based unittests for the base mml classes

(II) For Wikidata / Wikipedia Semantics Import:
AnnoMathTeX is used for formula annotation.
Formula concepts are integrated into the LaTeX of a formula:
<math display="block" qid=Q35875>E=m\,c^2</math> in Wikitext source code.
Each of the symbols (E, m, c) can point to another Wikidata concept (by the ‘defining formula’ property (P2534)).
The in-depth info for an annotated formula (as in the example above) can be fetched by retrieving a special page in the wiki which
holds all the annotations. (Edit: This is an example of what the special page looks like in a wiki ??) The Wikidata item the qid points to holds MathML, which contains the intent annotations.

In a nutshell: the qid added to a formula on a wiki page determines the intents; these are stored as annotations of the MathML on the Wikidata item the qid points to.
The feature for annotating Wikidata items of formulas with intents has yet to be developed.
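
As a rough illustration of the first step in this flow, the following Python sketch collects the qid and the TeX source from `<math>` tags in Wikitext. It is only a sketch under assumptions: the function name `find_annotated_formulas` and the regular expressions are hypothetical and not part of the Math extension, and a real implementation would rely on the MediaWiki parser instead of regular expressions.

```python
import re

# Matches a <math ...>...</math> element and captures its attributes and body.
MATH_TAG = re.compile(r"<math\b([^>]*)>(.*?)</math>", re.DOTALL)
# Matches qid=Q123, qid="Q123" or qid='Q123' inside the attribute string.
QID_ATTR = re.compile(r"""\bqid\s*=\s*["']?(Q\d+)["']?""")

def find_annotated_formulas(wikitext):
    """Return (qid, tex) pairs for every <math> tag that carries a qid."""
    results = []
    for attributes, tex in MATH_TAG.findall(wikitext):
        match = QID_ATTR.search(attributes)
        if match:
            results.append((match.group(1), tex.strip()))
    return results

sample = 'Text <math display="block" qid=Q35875>E=m\\,c^2</math> and <math>x</math>.'
print(find_annotated_formulas(sample))
```

Formulas without a qid are skipped, which matches the idea that only annotated formulas are treated as non-default.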

  • Create a list of the already available annotated formulas with Wikidata qids, which can be used for our tests. (optional)
  • Develop the feature for annotating Wikidata items with intents (is validation necessary here? or some type of GUI-based form?) (maybe a follow-up project for the far future)
  • Annotate the Wikidata items the list points to with intent annotations within Wikidata / Wikibase. (This can be done by AnnoMathTeX; however, I don't think it relates to this project)

I think in section (II) there is no work to be done.

(III) Math extension composing the final MathML for Screenreaders :

Intents for formulas are not generated by the Math extension itself. It gets the annotations from Wikidata, see (II).
If formulas have qid semantic annotations with annotated intents in the corresponding Wikidata items, the formulas are considered a non-default configuration.
For these formulas the intent attributes are read from a wiki page (Special:Pages) which can be created by the Math extension for each Wikidata item.

  • Find a way to annotate Wikidata items with intent in a Wikipedia page
  • Implement the intent grammar (similar to the MathML nodes currently), so MathML by TexVC(PHP) can contain intent elements
  • For formulas with qid semantic annotations, fetch the special page and check for intent annotations
  • The MathML is generated for each formula with TexVC. If there are intents in the special page, they are parsed (does this include validation of the annotated attributes?) and added to the final MathML which is delivered to screen readers (also here: what happens if there are multiple nested qid annotations for a formula?). This requires aligning the special page content with the MathML.
  • Build a minimal test which checks the generation of all necessary elements in MathML for intent
  • Build a minimal test which checks the intent generation for the currently existing non-default intent items from Wikidata (about 5-20 formulas; these are the list items mentioned in II)
  • The intent grammar should have 100% test coverage.
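
The composition step in the list above could look roughly like the following Python sketch, which adds an `intent` attribute to the root of already generated MathML and `arg` attributes to selected children. This is an illustration under assumptions only: the function `apply_intents` and the path-to-argument mapping are hypothetical, the sample MathML has no namespace, and the real implementation would live in TexVC(PHP).

```python
import xml.etree.ElementTree as ET

def apply_intents(mathml, root_intent, arg_by_path):
    """Set the intent on the root element and arg attributes on matched children."""
    root = ET.fromstring(mathml)
    root.set("intent", root_intent)
    for path, arg in arg_by_path.items():
        target = root.find(path)
        if target is None:
            # The annotation does not align with the generated MathML.
            raise ValueError(f"no element matches {path!r}")
        target.set("arg", arg)
    return ET.tostring(root, encoding="unicode")

# MathML as a renderer might produce it, without any intent information.
plain = "<mrow><mo>(</mo><mn>0</mn><mo>,</mo><mn>5</mn><mo>)</mo></mrow>"
annotated = apply_intents(plain, "point($1,$2)", {"mn[1]": "1", "mn[2]": "2"})
print(annotated)
```

Raising an error when a path does not match makes alignment problems between the annotation source and the generated MathML visible early, which may help with the open question about nested annotations.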

An alternative plan is tracked in a subtask since 21.06.23: instead of composing MathML with intents in the Math extension, a browser extension written in JavaScript shall do this.

(IV) Checking the generated speech in Wikipages:

  • Speech to Text activation OR use Screenreader capabilities OR find another recent and suitable program which can create intents
  • Define a set of cases and check the speech output, before and after our implementations
  • https://phabricator.wikimedia.org/T327394: Evaluate Speech Output for MathML with Intent attributes (to be clarified, but I think this would be enough for the current scope)

I think this has to be spelled out elsewhere. Maybe it would be a good project to do in collaboration with a developer of screen reader software.


Event Timeline

Stegmujo updated the task description.

@Physikerwelt could you have a look at this, check whether everything is correct or something is missing, and edit where necessary?

Done. I added some comments. Overall, I think it might be reasonable to break it down into some subtasks.

Thanks for adding the hints, here are some questions and remarks (to @Physikerwelt) to refine the essence of the tasks:

  • section I:
    • For generating MathML from Mathoid/LaTeXML for the tests with the MathSearch extension, I suppose I would write a maintenance script (or possibly a test) which renders MathML similar to here in testAlttext. Or is there any implementation I have overlooked (as it might be on an earlier branch) which does this? In case you know, any hints are welcome.
    • Does the comparison algorithm contain some type of similarity score etc.? How was performance measured earlier?
    • "i don't understand that": The parse tree in TexVC(PHP) is currently in many cases not sufficient to create valid MathML; example cases can be found in this test file in texvctreebugs. The ultimate solution is to refactor the grammar file so that the parse tree produced by TexVC(PHP) is correct for generating MathML.
  • section II:
    • How can we use intents from the Wikidata items in this project (with the scope of the publication in mind) ? How can the intents be notated in these Wikidata items?
    • How can a list with Wikipages which have annotated qids be found ?
    • How can a list with Wikidata items which have intents notated be found ?
  • section III:
    • If we read intent attributes which come from some kind of user-based annotation, is it required to validate the correctness of these attributes by methods in the math extension, before they get forwarded to the browser-users in the MathML?
  • section IV:
    • how can we have working examples for speech synthesis from the output of our system, which is adding intent to MathML, for the upcoming publication ?

Thanks for adding the hints, here are some questions and remarks (to @Physikerwelt) to refine the essence of the tasks:

Thank you.

  • section I:
    • For generating MathML from Mathoid/LaTeXML for the tests with the MathSearch extension, I suppose I would write a maintenance script (or possibly a test) which renders MathML similar to here in testAlttext. Or is there any implementation I have overlooked (as it might be on an earlier branch) which does this? In case you know, any hints are welcome.

I was guessing one can start with the UpdateMath maintenance script that you updated recently. You just need to put the formulae you want to test in your wiki and they will be found by the script. Thereafter you can get the MathML from the DB. So you probably don't need to implement a single line of code.

  • does the comparison algorithm contain some type of similarity score etc., how was performance measured earlier?

It was done based on images. Maybe this is too complicated. We can start with tree-edit distance as a similarity measure. But maybe, for the moment, we just want to figure out whether the outputs are the same or different?
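
A minimal Python sketch of the "same or different" check could look as follows. Note that this is plain structural equality, not tree-edit distance, and the helper names `canonical` and `same_mathml` are hypothetical. It ignores whitespace-only differences and attribute order, and it deliberately ignores any text that follows a closing tag, which is normally only whitespace in presentation MathML.

```python
import xml.etree.ElementTree as ET

def canonical(element):
    """Reduce an element to nested tuples, ignoring whitespace and attribute order."""
    text = (element.text or "").strip()
    attributes = tuple(sorted(element.attrib.items()))
    children = tuple(canonical(child) for child in element)
    return (element.tag, attributes, text, children)

def same_mathml(first, second):
    """True if both MathML strings describe the same element tree."""
    return canonical(ET.fromstring(first)) == canonical(ET.fromstring(second))

compact = "<mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>"
indented = """<mrow>
  <mi>x</mi>
  <mo>+</mo>
  <mn>1</mn>
</mrow>"""
different = "<mrow><mi>x</mi><mo>+</mo><mn>2</mn></mrow>"

print(same_mathml(compact, indented))   # formatting differs, tree is the same
print(same_mathml(compact, different))  # one leaf differs
```

If tool-specific artifacts turn out to dominate the differences, this check would report almost everything as different, which is the situation where a graded similarity measure becomes more useful.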

  • "i don't understand that": The parse tree in TexVC(PHP) is currently in many cases not sufficient to create valid MathML; example cases can be found in this test file in texvctreebugs. The ultimate solution is to refactor the grammar file so that the parse tree produced by TexVC(PHP) is correct for generating MathML.

I still don't understand, let's discuss that in a f2f meeting.

  • section II:
    • How can we use intents from the Wikidata items in this project (with the scope of the publication in mind) ? How can the intents be notated in these Wikidata items?

We need to figure this out.

  • How can a list with Wikipages which have annotated qids be found ?

We could search for them, but why do we need this list?

  • How can a list with Wikidata items which have intents notated be found ?

The answer is the same as above.

  • section III:
    • If we read intent attributes which come from some kind of user-based annotation, is it required to validate the correctness of these attributes by methods in the math extension, before they get forwarded to the browser-users in the MathML?

Somehow. I don't see the practical implication.

  • section IV:
    • how can we have working examples for speech synthesis from the output of our system, which is adding intent to MathML, for the upcoming publication ?

We produce examples of valid MathML with intents and present them at the W3C MathWG meeting.

I was guessing one can start with the UpdateMath ...

Ok, this is a start. Once this is set up and the MathML is generated with the UpdateMath script, the only difficulty I see is filtering the correct wiki pages from the output (depending on the content of the database, lots of items can be processed by this script). But this should be possible on a local MediaWiki instance which only has the most necessary wiki pages.

It was done based on images ....

Ok, i guess either Text-Based (i.e. tree-edit distance) or Image comparison can be done, let's see what the compared outputs look like. If there are many differing artifacts within the tool specific MathML notations, image comparison might make more sense.

want to figure out if same or different for at the moment?

To create an accurate estimation of the effort in the task. This will enable us to keep up with deadlines realistically. Also, to make effective planning ahead.

I still don't understand, let's discuss that in a f2f meeting.

Ok, agreed. I uploaded an HTML file here which has the MathML of the erroneous cases. Before the f2f meeting, looking at the MathML for the sideset case should create some understanding.

We could search for them, but why do we need this list?

For both lists: to have practical example data demonstrating the feasibility of the complete system within the scope of the publication.

Somehow. I don't see the practical implication.

Validating intent attributes would basically have security implications (similar to the validation of LaTeX): no harmful script etc. could be annotated as an intent attribute and then forwarded to browsers and screen readers. Maybe there is already a way in Wikidata to mitigate such cases generically. Another reason would be to ensure that only 'valid' intent attributes, which are definitely machine-readable, are processed by the Math extension / screen readers.
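
To make the validation idea concrete, here is a small Python sketch of an allow-list style check. It accepts only a simplified subset of intent expressions (names, numbers, `$` references and nested argument lists) and rejects everything else. The grammar here is an assumption made for illustration and is not the W3C intent grammar; the names `tokenize`, `parse_term` and `is_valid_intent` are hypothetical.

```python
import re

# One token: a reference ($x, $1), a name, a number, or one of ( ) ,
TOKEN = re.compile(r"\s*(\$[A-Za-z0-9_]+|[A-Za-z_][A-Za-z0-9_\-]*|[0-9]+|[(),])")
PUNCTUATION = {"(", ")", ","}

def tokenize(intent):
    """Split an intent string into tokens, raising ValueError on anything unknown."""
    tokens, position = [], 0
    intent = intent.rstrip()
    while position < len(intent):
        match = TOKEN.match(intent, position)
        if match is None:
            raise ValueError(f"unexpected character at offset {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens

def parse_term(tokens, position):
    """Return the position after one term, or -1 if no valid term starts here."""
    if position >= len(tokens) or tokens[position] in PUNCTUATION:
        return -1
    position += 1  # consume the head (name, number or reference)
    if position < len(tokens) and tokens[position] == "(":
        position += 1
        if position < len(tokens) and tokens[position] == ")":
            return position + 1  # empty argument list
        while True:
            position = parse_term(tokens, position)
            if position == -1 or position >= len(tokens):
                return -1
            if tokens[position] == ")":
                return position + 1
            if tokens[position] != ",":
                return -1
            position += 1
    return position

def is_valid_intent(intent):
    """True only if the whole string is one well-formed term of the subset."""
    try:
        tokens = tokenize(intent)
    except ValueError:
        return False
    return parse_term(tokens, 0) == len(tokens)

print(is_valid_intent("point($1,$2)"))
print(is_valid_intent("<script>alert(1)</script>"))
```

Because the check only accepts known token classes, markup characters such as `<` or quotes never pass, regardless of what a user annotates.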

How can we use intents from the Wikidata items in this project (with the scope of the publication in mind) ? How can the intents be notated in these Wikidata items?

I think this is a major thing to figure out in order to have a complete overview of future efforts. Any starting pointers? For a start on clarifying this, see the next comment.

We produce examples of valid MathML with intents and present them at the W3C MathWG meeting.

There might be a simple solution (e.g. a CLI tool producing spoken language as text) which already synthesizes speech with intents; this would be enough for a proof of concept.
Maybe they know a tool which does that. I think for a publication, it would be somewhat necessary to have a simple 'proof' that there is such a tool which already reads the generated output and improves speech synthesis with it.

MathCAT seems to be a suitable library for testing the speech generation from MathML with intent attributes. (Edit: Moved the initial evaluation of MathCAT here)

Stegmujo updated the task description.

Clarification of the open question :

How can we use intents from the Wikidata items in this project (with the scope of the publication in mind) ? How can the intents be notated in these Wikidata items?

To have a foundation for discussions, here is an example (for clarification of the process) with ambiguities from W3C Accessibility gap analysis:
TeX:

(0 , 5 )

"The Point could be an open interval, gcd, cycle, or an ordered tuple, vector etc."

MathML (already with intent, which resolves the ambiguity to a coordinate point):
See also intent reference.

<mrow intent="point($1,$2)">
  <mo>(</mo>
  <mn arg="1">0</mn>
  <mo>,</mo>
  <mn arg="2">5</mn>
  <mo>)</mo>
</mrow>
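
To show how the `intent` on the `<mrow>` and the `arg` attributes on its children fit together, the following Python sketch substitutes each `$` reference with the text of the element carrying the matching `arg`. This is only an illustration with a hypothetical function name `expand_intent`; an actual screen reader or a library such as MathCAT does considerably more than this substitution.

```python
import re
import xml.etree.ElementTree as ET

def expand_intent(mathml):
    """Replace $references in the root intent with the text of the matching arg elements."""
    root = ET.fromstring(mathml)
    arguments = {
        element.get("arg"): "".join(element.itertext()).strip()
        for element in root.iter()
        if element.get("arg")
    }
    # Unknown references are left untouched so that problems stay visible.
    return re.sub(
        r"\$(\w+)",
        lambda match: arguments.get(match.group(1), match.group(0)),
        root.get("intent", ""),
    )

example = """<mrow intent="point($1,$2)">
  <mo>(</mo>
  <mn arg="1">0</mn>
  <mo>,</mo>
  <mn arg="2">5</mn>
  <mo>)</mo>
</mrow>"""
print(expand_intent(example))
```

The lookup depends only on the `arg` attribute, not on the element name, so it works the same for `<mi>` and `<mn>` leaves.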

Some selected Wikidata QIDs for ambiguity resolution:
https://www.wikidata.org/wiki/Q44946
https://www.wikidata.org/wiki/Q3250736

As I understand, the flow of data (from formula notation to formulas being sent to the screen reader) is this:

  1. The TeX (for the example point formula) is written to the source of a wiki page.
  2. Since this TeX is ambiguous, a user adds an annotation with the clarifying qid (Q3250736) using AnnoMathTeX, probably with the 'Formula Annotation' dialogue; it is not completely clear how to get the coordinate QID here. For example, in the formula $ (x,y) $ on the Cartesian coordinate system wiki page. Just for testing, I annotated the formula with another suggested QID here and then saved the page.
  3. To be clarified: Where does the annotated Wikitext from the WMFLabs AnnoMathTeX appear (which Wikipedia URL)?
  4. Edit after clarifications on AnnoMathTeX: Currently AnnoMathTeX does not generate the annotated Wikitext with the QID; this would have to be implemented, e.g. with a Python script which merges the annotations from one file into the Wikitext source.
  5. The pop-ups (which are running on the Wikipedia Beta Cluster) can be used to annotate math items with qids, so the Wikidata items are connected. The open questions here are how to proceed with nested formulas and how to add annotations.
  6. To go on here, the ideal case (from the AnnoMathTeX paper) is assumed: the Wikitext contains a formula from AnnoMathTeX with this source: <math display="block" qid=Q3250736> ( 0 , 5 ) </math>
  7. To get from the associated Wikidata item to the intent format, I see these possibilities:
    1. AnnoMathTeX directly annotates with intent as a source (since it has multiple sources). My difficulty with this: AnnoMathTeX annotates TeX formulas, whereas intents live within MathML.
    2. The associated Wikidata item holds the intent information in some property. Another item, https://www.wikidata.org/wiki/Q204819, has a property for 'defining formula'; the formula is in TeX, while the intent attribute is a MathML feature, so holding intent information would require having the MathML of the formula in Wikidata as well. There is an 'in defining formula' property which can hold annotations for each element in the TeX(!) formula. The 'in defining formula' property makes it possible to annotate atomic elements (like E->Energy, M->Mass), but, as I see it, not the complete formula.
    3. In the Math extension, TexVC (PHP), we implement a mapping from known QIDs to intents (e.g. as a JSON file). This somewhat removes the user-based annotation aspect for intent content, but might be sufficient for a working prototype.
    4. Wikidata items hold MathML; the MathML can be edited and intent added. I guess this would be the simplest solution, but users could introduce mistakes while editing. To my knowledge, this is not implemented yet.
    5. Other possibilities for this step that are not mentioned here are welcome.
  8. In an ideal case, MathML with intent is delivered for a formula from here. Then, since the mapped qid formula from Wikidata and the formulas in Wikitext can differ (for example, the point ( 0 , 1 ) maps to the Wikidata notation ( x , y )), the MathML from Wikidata has to be processed in the Math extension to obtain the final MathML for the users.
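
The JSON mapping possibility from the list above could be prototyped along these lines. The sketch is in Python for brevity, although the actual implementation would be in TexVC (PHP). The file layout, the function name `intent_for_qid`, and the pairing of Q3250736 with `point($1,$2)` are assumptions taken from the example in this comment, not existing data.

```python
import json

# Hypothetical content of a JSON mapping file shipped with the extension.
QID_INTENTS_JSON = """
{
  "Q3250736": {"intent": "point($1,$2)"}
}
"""

def intent_for_qid(qid, mapping):
    """Return the intent string registered for a qid, or None if there is none."""
    entry = mapping.get(qid)
    return entry["intent"] if entry else None

mapping = json.loads(QID_INTENTS_JSON)
print(intent_for_qid("Q3250736", mapping))  # known qid
print(intent_for_qid("Q35875", mapping))    # unknown qid: fall back to default rendering
```

Returning None for unknown qids keeps the default configuration untouched, so only formulas with a registered qid would receive intent attributes.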

Based on the example here, @Physikerwelt, how would you proceed in making intent available to browsers/screen readers? Do you consider the flow of data correct? Which possible resolution do you see for Point 5 in 'flow of data'?

Stegmujo updated the task description.

→ It shows how to annotate terms within a formula with other Wikidata-Concepts. Not directly clear how to add intents.

Stegmujo updated the task description.

Change 892461 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix LocalChecker exception throwing

https://gerrit.wikimedia.org/r/892461

Change 892461 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix exceptions thrown by LocalChecker

https://gerrit.wikimedia.org/r/892461

https://www.gipp.com/wp-content/papercite-data/pdf/scharpf2018.pdf

Mapping a QID to a math element requires resolving the individual items within the TeX; see this paper.

Reaching this would require annotations such as the following in Wikitext:

<math qid=12345 > \sqrt{a1} </math> <!-- math-annotation w{Q11423}{m} w{Q2111}{c}^2 -->

<math qid=12345 > \sqrt{a2} </math> <!-- math-annotation "\w{Q11423}{m} \w{Q2111}{c}^2" -->

<math qid=12345 > \sqrt{a3} </math> <!-- math-annotation \\w{Q11423}{m} \\w{Q2111}{c}^2 -->

<math qid=12345 > \sqrt{a4} </math> <!-- math-annotation \w{Q11423}{m} \w{Q2111}{c}^2 -->

<math qid=12345 > \sqrt{a5} </math> <!- "<annotation>\w{Q11423}{m} \w{Q2111}{c}^2</annotation>" -->
<math> end </math> <!-- math-annotation ok -->
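
If the unescaped comment variant (the a4 line above) were chosen, the annotation could be read back with a small parser like the following Python sketch. The `math-annotation` comment and the `\w{QID}{symbol}` macro are the syntax proposed in this comment, not an existing MediaWiki feature, and the function name `parse_annotation` is hypothetical.

```python
import re

# Captures the body of a <!-- math-annotation ... --> comment.
ANNOTATION = re.compile(r"<!--\s*math-annotation\s+(.*?)\s*-->", re.DOTALL)
# Captures the QID and the annotated symbol from \w{Q123}{symbol}.
W_MACRO = re.compile(r"\\w\{(Q\d+)\}\{([^{}]*)\}")

def parse_annotation(wikitext):
    """Return (qid, symbol) pairs from all math-annotation comments in the text."""
    pairs = []
    for body in ANNOTATION.findall(wikitext):
        pairs.extend(W_MACRO.findall(body))
    return pairs

line = r"<math qid=12345 > \sqrt{a4} </math> <!-- math-annotation \w{Q11423}{m} \w{Q2111}{c}^2 -->"
print(parse_annotation(line))
```

A comment without any `\w{...}{...}` macro, such as the final "ok" line, simply yields no pairs.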

Stegmujo updated the task description.