
Localized TeX environment
Open, Low, Public

Description

Author: max

Description:
Many aspects of TeX rendering depend on local traditions. The most essential are:

  • National characters in formulae and indices cannot be entered.
  • The decimal separator is a comma in many countries (ru, fr, de and others). By default TeX treats a comma in math mode as punctuation and adds a thin space after it, which is clearly visible inside a number: compare <math>3.14\,\!</math> and <math>3,14\,\!</math>. We can write <math>3{,}14</math> instead, but that is a dirty hack.
  • Some functions have national names, for example tg instead of tan. To avoid the clumsy <code>\mathop</code> construction, we need a way to enter such names directly.
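For the decimal-comma item, one stopgap that needs no per-language macros is to apply the <math>3{,}14</math> hack automatically before the formula reaches TeX. The sketch below is hypothetical: the function name protect_decimal_commas is invented and nothing like it is claimed to exist in texvc. It braces only a comma that has a digit immediately on both sides, similar in spirit to the LaTeX icomma package, so a list written as "1, 2" keeps its punctuation spacing. The trade-off is that an index pair typed without a space, such as x_{1,2}, is also treated as a decimal number.

```ocaml
(* Hypothetical preprocessing step, not part of texvc: wrap each comma
   that sits directly between two digits in braces, so that "3,14"
   becomes "3{,}14" and TeX typesets the comma as an ordinary symbol
   rather than as punctuation followed by a thin space. *)
let is_digit c = c >= '0' && c <= '9'

let protect_decimal_commas (src : string) : string =
  let n = String.length src in
  let buf = Buffer.create (n + 16) in
  String.iteri
    (fun i c ->
      if c = ',' && i > 0 && i < n - 1
         && is_digit src.[i - 1] && is_digit src.[i + 1]
      then Buffer.add_string buf "{,}"
      else Buffer.add_char buf c)
    src;
  Buffer.contents buf

let () =
  (* A comma followed by a space is left alone, so lists keep their spacing.
     Prints: 3{,}14 + f(1, 2) *)
  print_endline (protect_decimal_commas "3,14 + f(1, 2)")
```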

A reasonable way to address all these inconveniences is to let each language version have its own TeX macros in the preamble. LaTeX has "babel"; why not texvc?
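A minimal sketch of the proposal, assuming texvc could prepend a per-wiki preamble before handing the formula to LaTeX. The base preamble shown is a placeholder rather than texvc's real one, and the function names and the language table are illustrative only; "spanish", "french" and "ngerman" are standard babel language options.

```ocaml
(* Hypothetical: choose the babel language option from the wiki's
   content-language code. The table is illustrative, not exhaustive. *)
let babel_option (lang : string) : string option =
  match lang with
  | "es" -> Some "spanish"
  | "fr" -> Some "french"
  | "de" -> Some "ngerman"
  | _ -> None

(* Build the preamble that would precede the formula. The base part is a
   placeholder; wikis without a table entry get it unchanged. *)
let preamble_for (lang : string) : string =
  let base = "\\documentclass{article}\n\\usepackage{amsmath}\n" in
  match babel_option lang with
  | Some opt -> base ^ Printf.sprintf "\\usepackage[%s]{babel}\n" opt
  | None -> base

let () = print_string (preamble_for "es")
```

Keeping the language table in one place means a wiki that is not listed renders exactly as before, which limits the change to the wikis that opt in.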

Details

Reference
bz2458

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 8:32 PM
bzimport added a project: Math.
bzimport set Reference to bz2458.

avarab wrote:

(In reply to comment #0)

  • National characters in formulae and indices cannot be entered.

Please confirm that this is indeed the case. The math package supports that if
you configure it correctly, but it hasn't been set up like that on Wikimedia sites.

And please submit bugs that describe one and only one issue. Full UTF-8 support
in the math package and some kind of template support are totally different
issues; marking this as INVALID.

max wrote:

It is a bug in the wikipedia.org installation and possibly other installations
that claim to support more than one language.

Converting to UTF-8 does not resolve all localization problems, even if the TeX
engine is moved to a fully Unicode version (Omega?). The point is not to
try to fit all possible localization problems in one basket but to put
language-dependent things in language-dependent places.

max wrote:

It depends on bug 798, but it is not the same issue.

fibonacci.prower wrote:

Example: the Spanish word for "limit" is "límite", which is abbreviated "lím".
There is no way to write this with the current Texvc implementation - \acute{i}
is not rendered as í, but as an acute accent over an 'i' with a dot. Besides,
<math>\lim_{x\to0}\frac{\sin(x)}{x}</math> is correctly rendered with the
<math>x\to0</math> part under the "lim", while
<math>\mbox{l}\mbox{\acute{i}}\mbox{m}_{x\to0}\frac{\sin(x)}{x}</math> is not,
and it looks awful.

I know from experience that loading the "babel" package with the appropriate
option solves almost all of the problems, at least in Spanish, and probably for
the other languages too.

alon wrote:

(In reply to comment #1)

And please submit bugs that describe one and only one issue. Full UTF-8 support
in the math package and some kind of template support are totally different
issues; marking this as INVALID.

It's likely that this does describe one and only one issue. The bug may not be in
Unicode parsing, but in texvc's misinterpretation of non-ASCII characters,
which is in all likelihood caused by lack of locale support. The latter is
usually provided by the babel LaTeX package, which also localizes abbreviations
for mathematical functions.

taw wrote:

I would strongly advise against using different setups on different Wikipedias.
This is simply asking for all kinds of weird problems.

If you need \tg etc., just add it to texvc as an alias for \mathop{tg},
with something like this in the big case statement in texutil.ml:

"\\tg" -> LITERAL (HTMLABLEC(FONT_UFH,"\\mathop{tg} ","tg"))

The tests seemed to indicate that one cannot write accented characters directly
in math mode, so I'm not sure how you want to get \mathop{lím} there.
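A self-contained sketch of that kind of table entry. LITERAL, HTMLABLEC and FONT_UFH are taken from the quoted line; the stand-in type definitions, the UNKNOWN fallback, the function name lookup_command and the \ctg entry are hypothetical and much smaller than texvc's real types. One detail differs from the quoted line: the argument of \mathop is still in math mode, so \mathop{tg} would set "tg" in italic. Plain TeX defines \sin as \mathop{\rm sin}\nolimits, and the sketch follows that pattern.

```ocaml
(* Minimal stand-ins for the texvc constructors quoted above; the real
   types have many more cases, and only the quoted names are reused. *)
type font_class = FONT_UFH
type render = HTMLABLEC of font_class * string * string
type token = LITERAL of render | UNKNOWN of string

(* Each entry wraps the name in \rm and adds \nolimits, as plain TeX does
   for \sin, so the letters are upright and subscripts go to the side. *)
let lookup_command (cmd : string) : token =
  match cmd with
  | "\\tg" -> LITERAL (HTMLABLEC (FONT_UFH, "\\mathop{\\rm tg}\\nolimits ", "tg"))
  | "\\ctg" -> LITERAL (HTMLABLEC (FONT_UFH, "\\mathop{\\rm ctg}\\nolimits ", "ctg"))
  | other -> UNKNOWN other

let () =
  match lookup_command "\\tg" with
  | LITERAL (HTMLABLEC (_, tex, html)) -> Printf.printf "%s| %s\n" tex html
  | UNKNOWN s -> Printf.printf "unknown: %s\n" s
```

This handles names written in ASCII letters only; it does not address accented names such as lím.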

alon wrote:

Why should localization of texvc through the use of a well-tested and
widely-used LaTeX package cause "weird problems"?

fibonacci.prower wrote:

(In reply to comment #6)

I would strongly advise against using different setups on different Wikipedias.
This is simply asking for all kinds of weird problems.

Echo Alon Lischinsky.

If you need \tg etc., just add it to texvc as an alias for \mathop{tg},
with something like this in the big case statement in texutil.ml:

"\\tg" -> LITERAL (HTMLABLEC(FONT_UFH,"\\mathop{tg} ","tg"))

And how, exactly, do I "just add it"?

The tests seemed to indicate that one cannot write accented characters directly
in math mode, so I'm not sure how you want to get \mathop{lím} there.

However, it can be done. I don't really care how, as long as it works.

robchur wrote:

Alright, let's keep the priorities sensible. And drop the attitudes, please. Thanks.

chlewey wrote:

(In reply to comment #6)

I would strongly advise against using different setups on different Wikipedias.
This is simply asking for all kinds of weird problems.

Well, the Wikimedia Foundation's charter states that Wikimedia is a multilingual
project, so localization of resources is something MediaWiki should attempt to
do, IMO.

The babel package is supposed to work with both LaTeX and plain TeX, so we have a
tested package as a tool.

I would like to hear a technical, rather than a political, opinion on why babel
(or any other type of regionalization) is not practical or feasible for
Wikimedia's texvc implementation.

If the same source renders differently on each wiki, it'll complicate the shared
storage of rendered images. More generally, a multilingual solution rather than
monolingual is preferred, so all languages can be used on all sites.

fibonacci.prower wrote:

(In reply to comment #11)

More generally, a multilingual solution rather than
monolingual is preferred, so all languages can be used on all sites.

But until such a solution is found, it would be better to use babel than to keep
waiting.

Back in the days when article titles in the English WP could only use letters
from Latin-1, WPs in languages whose alphabets weren't included in Latin-1 (such
as Polish or Esperanto) were allowed to name their articles in their own
character encoding (or was it Unicode? I don't remember), instead of waiting for
a multilingual solution to be implemented on all the Wikimedia projects. Well,
why not do the same here?

Please stop manipulating the priority tags; it doesn't do you any good and just
annoys people, making it less likely anyone will want to do whatever you are
asking for.

chlewey wrote:

(In reply to comment #11)

If the same source renders differently on each wiki, it'll complicate the shared
storage of rendered images. More generally, a multilingual solution rather than
monolingual is preferred, so all languages can be used on all sites.

But, anyhow, mathematical conventions differ between languages, so
it is likely that either the formulas use different sources (some with
workarounds), which makes the shared-storage argument void, or they share the
source, making the article/book look like a shiftcoded work.

This is, IMO, still a political rather than a technical argument: you are not
saying it is impossible, too complicated, or prone to unforeseen failures, but rather that
it is inconvenient from the point of view of shared resources, when shared
resources are, in this context, either something that is not happening or an
imposition of language conventions.

No, it's a technical argument. If simply turned on, this would cause different
renderings to stomp on each other and make rendered display inconsistent.

alon wrote:

(In reply to comment #15)

No, it's a technical argument. If simply turned on, this would cause different
renderings to stomp on each other and make rendered display inconsistent.

Not inconsistent, but rather consistent with the language setting for the local
wiki (or, if it could be implemented, for the user viewing the page).

You're placing technical ease of maintenance (or worse, technical resource
optimization) above usability, a flawed line of reasoning which, if pursued,
could lead to an extremely efficient product that doesn't fulfill the
users' needs. I don't see why non-English speakers should be forced to use
English, a language they may not even understand, for their formulae.

No, *inconsistent* because it would change back and forth depending on who
rendered it last.

alon wrote:

(In reply to comment #17)

No, *inconsistent* because it would change back and forth depending on who
rendered it last.

Only if PNG caching is done independently of the user's language setting, which need
not be the case: just as thumbnails are rendered according to the user's
thumbnail size preference, LaTeX rendering can be linked to the interface
language. And it should be, unless your criterion is, as I said before,
that technical optimization should not take end-user satisfaction into account.

fibonacci.prower wrote:

Why was this tagged as fixed? I've just checked, and \lim renders without an
accent on es:.

jutiphan wrote:

This is quite an important bug, since it affects all languages that do not use English (Latin) characters. These problems have been standing for so long, and
it seems no one has bothered to fix them. It would be great if more attention were paid to globalization.

This issue particularly affects the Thai Wikipedia, which has problems writing equations that contain words.

physik wrote:

Did you try using \text to insert text or other symbols that have no specific math rendering?
I think it makes little sense to support characters that LaTeX does not support, since it's not defined how they should look. So I'd recommend using \text{special words, characters or symbols} to get the browser's rendering for those parts of the equation.
There are some cases where the \text method does not work. If you discover such a situation that is not covered by
https://bugzilla.wikimedia.org/show_bug.cgi?id=48032
please open a new bug with a specific example.

*** This bug has been marked as a duplicate of bug 48032 ***

Moritz, I don't think this is a duplicate. In TeX/LaTeX terms, this is about the babel package. E.g., babel set to Spanish will render \lim, \max, \min etc. as lím, máx, mín etc.

At least for MathJax, this can be solved by simply writing sets of macros. (Not sure how the math extension / database side would work.)
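A minimal sketch of the "sets of macros" idea, written so it could run before any renderer: control words found in a per-language table are replaced and everything else is copied unchanged. The table, the names macros and expand, and the choice of \operatorname* (which, like \lim, places limits under the name in display style) are all assumptions. The Spanish replacements contain UTF-8 letters, so they only make sense for a renderer such as MathJax that accepts them inside \operatorname; a classic TeX pipeline would need a different spelling of the accents.

```ocaml
(* Hypothetical per-language macro table: each entry maps a control word
   (without the backslash) to the TeX that should replace it. *)
let macros (lang : string) : (string * string) list =
  match lang with
  | "es" ->
      [ ("lim", "\\operatorname*{lím}");
        ("max", "\\operatorname*{máx}");
        ("min", "\\operatorname*{mín}") ]
  | _ -> []

let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* Copy [src], replacing every control word found in the table. A control
   word is a backslash plus the longest run of letters, so \limsup is never
   mistaken for \lim; control symbols such as \, and \\ are copied as-is. *)
let expand ~(lang : string) (src : string) : string =
  let table = macros lang in
  let n = String.length src in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if src.[i] <> '\\' then begin
      Buffer.add_char buf src.[i];
      go (i + 1)
    end else begin
      let j = ref (i + 1) in
      while !j < n && is_letter src.[!j] do incr j done;
      if !j = i + 1 then begin
        (* Control symbol: keep the backslash and the following character. *)
        let len = if i + 1 < n then 2 else 1 in
        Buffer.add_string buf (String.sub src i len);
        go (i + len)
      end else begin
        let name = String.sub src (i + 1) (!j - i - 1) in
        (match List.assoc_opt name table with
         | Some repl -> Buffer.add_string buf repl
         | None -> Buffer.add_string buf (String.sub src i (!j - i)));
        go !j
      end
    end
  in
  go 0;
  Buffer.contents buf

let () = print_endline (expand ~lang:"es" "\\lim_{x\\to0}\\frac{\\sin x}{x}")
```

Because the source text stays the same on every wiki and only the expansion step differs, this pairs naturally with a language-aware cache key.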

physik wrote:

Peter, thanks for the hint. Obviously, I misunderstood that.
I see that as a new feature for a future version. Hopefully (Repopend) is the right category for that. I also added it to the Roadmap.
https://www.mediawiki.org/wiki/Extension:Math/Roadmap
It's also a nice example to demonstrate the usefulness of Content MathML.

jutiphan wrote:

Physikerwelt, thank you for looking into this. It has been a long 7 years, so long that I had actually forgotten about it.

@Tacsipacsi At the moment we are discussing changes to the syntax in T195861; it would be nice if you would join or comment. From my perspective, the most important point is that national characters such as ü in \text{für alle}, which can already be entered, also render acceptably in all browsers.