
In Math formula, Chinese, Japanese, and Korean letters have too short a space between them.
Open, Stalled, High, Public

Assigned To
None
Authored By
Libattery
Nov 27 2019, 1:13 PM
Referenced Files
F31700657: exMath.svg
Mar 25 2020, 1:10 AM
F31700666: app.js
Mar 25 2020, 1:10 AM
F31474502: image.png
Mar 25 2020, 1:10 AM
F31469433: render.js
Dec 11 2019, 8:37 AM
F31469440: image.png
Dec 11 2019, 8:37 AM
F31359639: mathoid.js.zip
Nov 28 2019, 8:24 AM
F31359635: image.png
Nov 28 2019, 8:24 AM
F31314816: image.png
Nov 27 2019, 1:13 PM

Description

In the Math extension, Chinese, Japanese, and Korean characters are rendered with too little space between them.
This makes the text hard to read and very ugly.

AS-IS / from wikipedia sandbox

image.png (268×218 px, 22 KB)

TO-BE / from mathjax.org site
image.png (326×372 px, 40 KB)

same source:
<math>\begin{array}{|l|}
\text{abcdefghijk} \\
1234567890 \\
\text{中华人民共和国}\\
\text{にっぽんこく}\\
\text{대한민국서울} \\
\end{array}</math>

I'd appreciate it if you could fix it.

Event Timeline

I made a quick solution. It is somewhat ad hoc, but it works.

image.png (288×268 px, 21 KB)

I changed some code in routes/mathoid.js.

I was getting errors like this:
"SVG - Unknown character: U+B0A0 in MathJax_Main,MathJax_Size1,MathJax_AMS"

My guess is that MathJax doesn't know the width of CJK characters.
So I added a small space, "FOUR-PER-EM SPACE" (U+2005), after every CJK character.

Here is my code.

@Libattery so you are doing the following regexp replacements

//begin CJK spacing - by libattery
var cjk = new RegExp(
"(["
+"\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF" //Korean
+"\u2E80-\u2EFF\u31C0-\u31EF\u3200-\u32FF" //Chinese
+"\u3400-\u4DBF\u4E00-\u9FBF\uF900-\uFAFF" //Chinese
+"\u3000-\u303f\u3040-\u30FF\u31F0-\u31FF\uff00-\uff9f" //Japanese
+"])(\u2005?)"
,"ug");
req.body.q = req.body.q.replace(cjk, "$1\u2005");

I think it would be better to do that replacement in texvcjs (only for text content). Can you make a pull request for that and maybe add a test case? Thank you.

Libattery added a subscriber: Physikerwelt.

@Physikerwelt Please let me know how to return the modified string. I couldn't figure out what to do.

 BOX: function(target, box, s) {
     // \box{s} where box is \text, \mbox, \hbox, or \vbox
     //         and s is a string not containing special characters

  var cjk = new RegExp(
  "(["
 +"\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF" //Korean
 +"\u2E80-\u2EFF\u31C0-\u31EF\u3200-\u32FF" //Chinese
 +"\u3400-\u4DBF\u4E00-\u9FBF\uF900-\uFAFF" //Chinese 
 +"\u3000-\u303f\u3040-\u30FF\u31F0-\u31FF\uff00-\uff9f" //Japanese
 +"])(\u2005?)"
 ,"ug");
 s= s.replace(cjk, "$1\u2005");

//return match(target, box);   //? how to return s?

Ah, sorry, the link pointed to the wrong file (the util file, which determines the required packages). You want to change the rendering file:
https://github.com/wikimedia/texvcjs/blob/fc8ff74ba85ec2b646b1c33d2c560c40f5a3139c/lib/render.js#L126

@Physikerwelt

I found a better way to solve this problem.

https://github.com/mathjax/MathJax-node/releases/tag/1.3.0

...This release introduces a new configuration option cjkCharWidth to control the width of CJK characters. ...

I tested that version (MathJax-node 1.3.0) in place of the current mathoid-mathjax-node 0.7.0.

I only had to change 3 lines to switch from mathoid-mathjax-node 0.7.0 to MathJax-node 1.3.0:
https://github.com/wikimedia/mathoid/blob/master/package.json#L50
https://github.com/wikimedia/mathoid/blob/master/app.js#L15
https://github.com/wikimedia/mathoid/blob/master/lib/render.js#L7

I got fairly good results. The spacing seems a little bit wide to me, but it's not bad anyway.

image.png (270×258 px, 21 KB)

So I'm not sure whether we should modify texvc. What do you think?
If upgrading the dependency works, I guess we don't need to add code to texvc.


Anyway, here is the texvc render.js code:

BOX: function(bt, s) {
    //CJK spacing - by libattery
    var cjk = new RegExp(
    "(["
    +"\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF" //Korean
    +"\u2E80-\u2EFF\u31C0-\u31EF\u3200-\u32FF" //Chinese
    +"\u3400-\u4DBF\u4E00-\u9FBF\uF900-\uFAFF" //Chinese
    +"\u3000-\u303f\u3040-\u30FF\u31F0-\u31FF\uff00-\uff9f" //Japanese
    +"])(\u2005?)"
    ,"ug");
    s = s.replace(cjk, "$1\u2005");

    return curlies(bt + curlies(s));
},

The test case:

bin/texvcjs '\text{abcd가나다라1234中华人民ABCDにっぽん}'

will get

+{\text{abcd가 나 다 라 1234中 华 人 民 ABCDに っ ぽ ん }}
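As a quick check of the test case above, the following sketch applies the same character ranges outside texvcjs. It needs only Node.js. `addCjkSpacing` is a hypothetical helper name, and the pattern is written as a regex literal instead of the concatenated strings used in the patch (the behavior should be the same). It also shows that the optional U+2005 at the end of the pattern keeps the replacement from doubling the space when it is applied twice.

```javascript
// Same ranges as in the patch, written as one regex literal.
// The trailing \u2005? swallows an already-present FOUR-PER-EM SPACE,
// so running the replacement again does not add a second one.
const CJK_WITH_OPTIONAL_SPACE = new RegExp(
  '([' +
    '\\u1100-\\u11FF\\u3130-\\u318F\\uAC00-\\uD7AF' +               // Korean
    '\\u2E80-\\u2EFF\\u31C0-\\u31EF\\u3200-\\u32FF' +               // Chinese
    '\\u3400-\\u4DBF\\u4E00-\\u9FBF\\uF900-\\uFAFF' +               // Chinese
    '\\u3000-\\u303F\\u3040-\\u30FF\\u31F0-\\u31FF\\uFF00-\\uFF9F' + // Japanese
  '])\\u2005?',
  'gu'
);

// Hypothetical helper: insert U+2005 after every CJK character in s.
function addCjkSpacing(s) {
  return s.replace(CJK_WITH_OPTIONAL_SPACE, '$1\u2005');
}

const input = 'abcd가나다라1234中华人民ABCDにっぽん';
const once = addCjkSpacing(input);
const twice = addCjkSpacing(once);

// JSON.stringify makes the inserted U+2005 characters easier to spot.
console.log(JSON.stringify(once));
console.log(once === twice); // prints true
```

Latin letters and digits are left untouched, so mixed strings such as the test case only gain spaces after the CJK characters.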

Upgrading mathjax-node is a major change and needs rigorous testing of all features. Maybe you can cherry-pick that feature into mathoid-mathjax-node. Can you also render the example at hand with LaTeX, so that we have a reference and can compare against how regular LaTeX renders it?

Physikerwelt changed the task status from Open to Stalled. Mar 21 2020, 6:46 PM
Physikerwelt triaged this task as High priority.

I tried to install LaTeX on my computer, but I gave up.
It's too complicated to make it support CJK.
I also tried online LaTeX services, but I couldn't find one that supports CJK fonts.

So I tried to make an SVG with mathoid-mathjax-node directly.

But I can't find the mathoid-mathjax-node repository.
The mathoid-mathjax-node page now links to the mathjax-node page.
https://www.npmjs.com/package/mathoid-mathjax-node

image.png (1×2 px, 269 KB)

Oops, what can I do?

Anyway, I made an SVG example with mathjax-node

for:

\begin{array}{|l|}
\text{abcdefghijk} \\
1234567890 \\
\text{中华人民共和国}\\
\text{にっぽんこく}\\
\text{대한민국서울} \\
\end{array}

How to make it:

I wrote a script (app.js)

and you can use it like this:

mkdir mjtest
cd mjtest
cp [Somewhere]/app.js .
 
npm install mathjax-node
node app.js > exMath.svg

my node -v is v6.17.1

svg: