Author: physik
Description:
Hi,
inspired by the security workshop at the Amsterdam Hackathon (2013), I see some minor issues with the security of the Math extension. I think the rendering options (Source, MathJax and LaTeXML) are potentially dangerous, since they do not check the input and only validate/escape the output.
(The same holds for the alt attribute of the generated texvc image.)
I had a deeper look into the texvc code. The parser seems to do a very good job, and I think we should continue to use the parsed output before processing it with MathJax, LaTeXML or some other technology. My concern about the texvc parser is the OCaml language. I think more people are used to ANTLR nowadays. It’s not too much work to convert the OCaml script to grammar files (.g), which are quite common in the university landscape where each researcher develops her/his own language ;-)
The bad news is that the PHP output tool
https://code.google.com/p/antlrphpruntime/
is currently not maintained by a lot of people, and I’m not sure if it can be used for the TeX checking task.
However, as a result I propose to change the math rendering process in the following way:
- Check Input Syntax + Semantics based on grammar.
- Pass checked TeX to the renderer.
A further benefit of a global grammar is that it defines a standard for which commands are allowed in the math tags and which are not.
Apart from that, checking the output can add further security. (cf. https://gerrit.wikimedia.org/r/#/c/66365/)
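To make the two steps more concrete, here is a minimal sketch in Python. It is not the texvc implementation and not an ANTLR grammar: the names ALLOWED_COMMANDS, check_tex and render are hypothetical, the allowlist is just an example, and a regular expression plus a brace counter stands in for the real grammar-based check.

```python
import re

# Hypothetical allowlist; in the proposal this would be defined by the shared grammar.
ALLOWED_COMMANDS = {"frac", "sqrt", "alpha", "beta", "sum", "int"}

# A TeX control word: a backslash followed by one or more letters.
COMMAND_RE = re.compile(r"\\([A-Za-z]+)")


def check_tex(tex):
    """Return the input unchanged if it passes the checks, else raise ValueError.

    Simplification: escaped braces such as \\{ and other control symbols
    are not handled; a real grammar would cover them.
    """
    # Semantic check: every control word must be on the allowlist.
    for name in COMMAND_RE.findall(tex):
        if name not in ALLOWED_COMMANDS:
            raise ValueError("command not allowed: \\" + name)
    # Minimal syntax check: braces must balance and never close early.
    depth = 0
    for ch in tex:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced braces")
    if depth != 0:
        raise ValueError("unbalanced braces")
    return tex


def render(tex, backend):
    """Check the input first, then pass only checked TeX to the renderer."""
    return backend(check_tex(tex))
```

With this shape every renderer (Source, MathJax, LaTeXML) would receive the same checked input, and anything that fails the check never reaches a backend.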
Best
Physikerwelt
Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=54624