
Change default font for EasyTimeline on zh projects to something that actually has glyphs for Chinese characters
Closed, Resolved, Public

Description

All Chinese characters in the timeline are shown as "□". It seems "FreeSans.ttf" doesn't cover the full Unicode character set. Changing the font to "ArialMSUnicode.ttf" or something similar might solve this.

Regards.


Version: unspecified
Severity: major

Details

Reference
bz20825

Event Timeline

C933103 added a subscriber: C933103. Edited Jun 5 2015, 7:32 AM

While the problem has not been resolved yet and EasyTimelines are still displayed without text, I'd like to mention that according to http://wenq.org/wqy2/index.cgi?HanziStyles the wqy font uses the China (mainland) version of the glyphs. Once the issue is fixed, should another font with the Taiwan version of the glyphs also be installed for users browsing Wikipedia in zh-tw/hk/mo, and is it technically viable to display a different font for people requesting a different variant of the page? (Actually, should I file another bug for this?)

Roytam1 added a subscriber: Roytam1. Edited Jun 24 2015, 2:45 AM

Since fonts-arphic-uming is installed in servers, can't we just use uming.ttc for EasyTimeline (zh-TW/zh-HK)?

Restricted Application added a subscriber: Dereckson. Jun 24 2015, 2:45 AM
Restricted Application added subscribers: Matanya, Aklapper. Aug 22 2015, 5:44 PM
hashar removed a subscriber: hashar. Aug 23 2015, 7:33 AM
Restricted Application added a subscriber: JEumerus. Jan 28 2016, 2:10 AM
Meno25 removed a subscriber: Meno25. Feb 19 2016, 5:51 PM

Unrelated to the EasyTimeline codebase (it's configuration only, see T22825#835162), hence removing that project again from this task.

Cwek added a comment. Jul 6 2016, 12:51 AM

Since fonts-arphic-uming is installed in servers, can't we just use uming.ttc for EasyTimeline (zh-TW/zh-HK)?

Because the extension is configured for the whole site, it probably doesn't pass the variant parameter that would be needed to change the font.

Cwek added a comment. Jul 6 2016, 12:58 AM

T123223: It seems that some new free fonts have been installed on the servers. Can they be used for this issue? @Aklapper

Since fonts-arphic-uming is installed in servers, can't we just use uming.ttc for EasyTimeline (zh-TW/zh-HK)?

Because the extension is configured for the whole site, it probably doesn't pass the variant parameter that would be needed to change the font.

fonts-noto-cjk is not a unified font but a region-based one, with separate variants for China/Hong Kong/Taiwan/Korea. Will that cause problems with a cross-language EasyTimeline setting?

Aklapper added a comment. Edited Jul 7 2016, 10:05 AM

T123223: It seems that some new free fonts have been installed on the servers. Can they be used for this issue? @Aklapper

I do not know enough about EasyTimeline's configuration to answer this, sorry. :(
Let's hope to receive an answer in T84777 where you also asked this.

@Roytam1 There is no such thing as a unified font in the world, at least as of now, and fonts are not supposed to be unified. The alternative to a region-based font would be a font designed according to one particular regional standard, or according to the font developer's habits. See Unicode's FAQ about CJK for further detail. I am not familiar with the server environment or the software's settings, but in a home environment, when an application uses a multi-region font without specifying a region, the result often defaults to China's standard.

Dzahn added a subscriber: Dzahn. Aug 23 2016, 6:32 PM

The subtask has been resolved. Font packages are now being installed on all app servers; this should be done in a couple of minutes. Please check whether this has fixed the problem.

fatalmonitor yields:

125 not find/open font (unifont-5.1.20080907)

I guess it is transient, pending Puppet runs on all app servers.

Cwek added subscribers: gerritbot, hashar. Edited Nov 11 2016, 2:04 AM

Related URL: https://gerrit.wikimedia.org/r/64205 (Gerrit Change I3c03bf9b2352a4e577f94ad92d2d38021cf12968)

@hashar

Is this patch working? Is the problem a wrong font filename?

Or is it a problem with the .ttf vs .ttc file type? A .ttc file is a collection of TrueType fonts, and some programs cannot use it normally. Can the program use the .ttc file?

Related URL: https://gerrit.wikimedia.org/r/64205 (Gerrit Change I3c03bf9b2352a4e577f94ad92d2d38021cf12968)

@hashar
Is this patch working? Is the problem a wrong font filename?

It does not seem to work in production:
https://zh.wikipedia.org/wiki/2016%E5%B9%B4%E5%A4%AA%E5%B9%B3%E6%B4%8B%E9%A2%B1%E9%A2%A8%E5%AD%A3#.E9.A2.A8.E6.9A.B4.E6.99.82.E9.96.93.E8.A1.A8

Cwek added a comment. Nov 11 2016, 2:20 AM

Related URL: https://gerrit.wikimedia.org/r/64205 (Gerrit Change I3c03bf9b2352a4e577f94ad92d2d38021cf12968)

@hashar
Is this patch working? Is the problem a wrong font filename?

It does not seem to work in production:
https://zh.wikipedia.org/wiki/2016%E5%B9%B4%E5%A4%AA%E5%B9%B3%E6%B4%8B%E9%A2%B1%E9%A2%A8%E5%AD%A3#.E9.A2.A8.E6.9A.B4.E6.99.82.E9.96.93.E8.A1.A8

I have seen that.

timeline/font system

T84777 made the web servers include the Debian packages that provide fonts (in Puppet: include ::mediawiki::packages::fonts). The fonts-wqy-zenhei package is indeed installed and provides the font:

$ dpkg -L fonts-wqy-zenhei
...
/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc
...

However that font is NOT going to be used by EasyTimeline!

EasyTimeline generates the image using a program called ploticus. The extension passes a font name to the command with the -f option. That name is set in the MediaWiki config:

wmf-config/CommonSettings.php
if ( $wmgUseTimeline ) {
  wfLoadExtension( 'timeline' );
  // Comparison (==), not assignment: only zh wikis override the font.
  if ( $lang == 'zh' ) {
    $wgTimelineFontFile = 'unifont-5.1.20080907.ttf';
  }
}

For the zh language we end up with the following call chain (summarized):

EasyTimeline.pl -f $wgTimelineFontFile
ploticus --font $wgTimelineFontFile

Ploticus looks up the given file name in the directory set via GDFONTPATH, which our configuration defines as:

wmf-config/CommonSettings.php
putenv( "GDFONTPATH=/srv/mediawiki/fonts" );

The fonts are thus copied into our /fonts directory, and fonts installed via Debian packages are never used!
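To make the lookup concrete, here is a simplified Python model of it. It is an assumption-laden sketch, not ploticus or gd source: it treats GDFONTPATH as a colon-separated list of directories and checks only for a file with exactly the given name, ignoring any suffix guessing or fontconfig fallback gd may perform. `resolve_font` is a hypothetical helper.

```python
import os
from typing import Optional


def resolve_font(name: str, gdfontpath: str) -> Optional[str]:
    """Return the first readable file called `name` in a GDFONTPATH-style list.

    Simplified model (assumption): directories are colon-separated and only
    the exact file name is tried; no suffix guessing, no fontconfig.
    """
    for directory in gdfontpath.split(":"):
        if not directory:
            continue  # tolerate empty entries such as a trailing ':'
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.R_OK):
            return candidate
    return None
```

With GDFONTPATH set to /srv/mediawiki/fonts alone, `resolve_font("wqy-zenhei.ttc", "/srv/mediawiki/fonts")` finds nothing even though the Debian package put that file under /usr/share/fonts/truetype/wqy/, which is the point made above.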

Font selection

As part of this task, the font for the zh language was changed from wqy-zenhei.ttc to unifont-5.1.20080907.ttf in June 2014:

(and see [[Kochi font]] for those Japanese fonts)
I'm submitting a patch to use unifont-5.1.20080907.ttf. This seems like a language-neutral font.

        } elseif ( $lang == 'zh' ) {
-               $wgTimelineSettings->fontFile = 'wqy-zenhei.ttc';
+               $wgTimelineSettings->fontFile = 'unifont-5.1.20080907.ttf';
        }

Potential fix

Debian has a unifont package https://packages.debian.org/jessie/unifont with the description:

font with a glyph for each visible Unicode Plane 0 character
GNU Unifont was designed to render something besides an empty box for each visible Unicode character in the Basic Multilingual Plane (Plane 0). Plane 0 contains most of the world's modern writing scripts. This font looks best at 12pt.

I guess ploticus/gd needs a .ttf version of it, and Debian has such a package: https://packages.debian.org/jessie/ttf-unifont. The .ttf files it provides are:

/usr/share/fonts/truetype/unifont/unifont.ttf
/usr/share/fonts/truetype/unifont/unifont_csur.ttf
/usr/share/fonts/truetype/unifont/unifont_sample.ttf
/usr/share/fonts/truetype/unifont/unifont_upper.ttf
/usr/share/fonts/truetype/unifont/unifont_upper_csur.ttf

Which are described in the README as:

unifont: Basic Multilingual Plane (BMP, Plane 0)

unifont_csur: Fonts containing Plane 0 Unifont glyphs plus glyphs for Michael Everson's ConScript Unicode Registry (CSUR) for the Plane 0 Private Use Area

unifont_upper: Fonts containing glyphs from Unicode Plane 1 through Plane 14, inclusive.

I am not sure whether the glyphs are in Plane 0 or in another plane such as Plane 2, the Supplementary Ideographic Plane (CJK characters).

Maybe we will have to create our own TrueType font that combines both Plane 0 and Plane 2 if that is at all possible.

Once we have a font that is known to work for the zh use case, we can put it on the Wikimedia servers under /srv/mediawiki-staging/fonts/, then patch the config to point $wgTimelineFontFile to it. Finally, we can nuke previously generated files by bumping $wgTimelineEpochTimestamp = '20130601000000';.
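Bumping the epoch invalidates old renderings because, as far as I can tell from EasyTimeline (treat this as an assumption, not a quote of the extension), a cached image is regenerated when its timestamp is older than $wgTimelineEpochTimestamp, which uses MediaWiki's 14-digit YYYYMMDDHHMMSS format. A minimal Python model of that comparison; `needs_rerender` is a hypothetical name and "20161205000000" is a hypothetical bumped value.

```python
from datetime import datetime, timezone


def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a 14-digit MediaWiki timestamp such as '20130601000000' as UTC."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def needs_rerender(cached_at: datetime, epoch: str) -> bool:
    """Hypothetical model: a cached rendering older than the epoch is stale."""
    return cached_at < parse_mw_timestamp(epoch)


# A rendering made on Nov 14 2016 (assumed UTC), checked against the current
# epoch and against a hypothetical bumped value.
rendered = datetime(2016, 11, 14, 15, 39, 38, tzinfo=timezone.utc)
print(needs_rerender(rendered, "20130601000000"))  # False
print(needs_rerender(rendered, "20161205000000"))  # True
```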

I have picked the first Chinese ideogram from https://zh.wikipedia.org/wiki/2016%E5%B9%B4%E5%A4%AA%E5%B9%B3%E6%B4%8B%E9%A2%B1%E9%A2%A8%E5%AD%A3#.E9.A2.A8.E6.9A.B4.E6.99.82.E9.96.93.E8.A1.A8:

熱 (U+71B1, https://codepoints.net/U+71B1) is in the BMP (Plane 0), in the CJK Unified Ideographs block.
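The plane question can be answered mechanically: a code point's plane is its value divided by 0x10000. The Python sketch below also maps a character to the unifont file whose README description (quoted above) covers that plane. `unifont_file_for` is a hypothetical helper, and since the README describes plane ranges, it does not guarantee that every ideograph actually has a glyph in that file.

```python
def unicode_plane(ch: str) -> int:
    """Return the Unicode plane (0-16) of a single character."""
    # Each plane holds 0x10000 code points, so the plane is the high bits.
    return ord(ch) >> 16


def in_cjk_unified_ideographs(ch: str) -> bool:
    """True if ch is in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def unifont_file_for(ch: str) -> str:
    """Hypothetical helper: the listed unifont file the README assigns to ch's plane."""
    plane = unicode_plane(ch)
    if plane == 0:
        return "unifont.ttf"  # Basic Multilingual Plane
    if 1 <= plane <= 14:
        return "unifont_upper.ttf"  # Planes 1 through 14, per the README
    raise ValueError(f"plane {plane} is not covered by the files listed above")


ch = "熱"
print(f"U+{ord(ch):04X}", unicode_plane(ch), in_cjk_unified_ideographs(ch), unifont_file_for(ch))
# U+71B1 0 True unifont.ttf
```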

I created a more basic use case that does not contain any non-ASCII characters at all: https://zh.wikipedia.org/w/index.php?title=User:Hashar/T22285

<timeline>
ImageSize  = width:900 height:300
PlotArea   = top:10 bottom:60 right:20 left:20
AlignBars  = early
DateFormat = dd/mm/yyyy
Period     = from:01/01/2016 till:31/12/2016
TimeAxis   = orientation:horizontal
ScaleMinor = grid:black unit:month increment:1 start:01/01/2016

BarData =
  barset:Hurricane
PlotData=
  barset:Hurricane width:10 align:left
  from:26/05/2016 till:27/05/2016 text:"ae"
  from:03/07/2016 till:11/07/2016 text:"eeeb"
  from:17/07/2016 till:17/07/2016 text:"c"
</timeline>

That yields on the server side:

Nov 14 15:39:38 mw1208:  Could not find/open font (unifont-5.1.20080907)
Nov 14 15:39:38 mw1208:  Could not find/open font (unifont-5.1.20080907)
Nov 14 15:39:38 mw1208:  Could not find/open font (unifont-5.1.20080907)
Nov 14 15:39:38 mw1208:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1208:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1208:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1208:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1208:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1208:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1275:  Could not find/open font (unifont-5.1.20080907)
Nov 14 15:39:38 mw1275:  Could not find/open font (unifont-5.1.20080907)
Nov 14 15:39:38 mw1275:  Could not find/open font (unifont-5.1.20080907)
Nov 14 15:39:38 mw1275:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1275:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1275:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1275:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1275:  Could not find/open font (unifont-5.1.20080907) (width calc)
Nov 14 15:39:38 mw1275:  Could not find/open font (unifont-5.1.20080907) (width calc)

Not sure why it is rendered on two servers. Pretty sure the font name comes from our zhwiki setting $wgTimelineFontFile = 'unifont-5.1.20080907.ttf';.

The file is definitely on those servers as /srv/mediawiki/fonts/unifont-5.1.20080907.ttf.

I reproduced it on my machine with ploticus version 2.42. The code does strip the '.ttf' suffix:

fontpath = getenv( "GDFONTPATH" );
if( fontpath == NULL ) Eerr( 12358, "warning: environment var GDFONTPATH not found. See ploticus fonts docs.", "" );
if( strcmp( &s[ strlen(s) - 4 ], ".ttf" )==0 ) s[ strlen( s)-4 ] = '\0'; /* strip off .ttf ending - scg 1/26/05 */
strcpy( GFTfont, s );
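For readers less used to C, here is the same suffix handling rewritten in Python. It mirrors only the quoted lines (the GDFONTPATH warning is omitted), and `ploticus_font_name` is a hypothetical name. Unlike the C excerpt, which computes `strlen(s) - 4` without checking the length first, it guards against names shorter than four characters. Like `strcmp`, the comparison is case-sensitive.

```python
def ploticus_font_name(s: str) -> str:
    """Python rendition of the quoted ploticus 2.42 suffix handling."""
    # C: if (strcmp(&s[strlen(s) - 4], ".ttf") == 0) s[strlen(s) - 4] = '\0';
    # The length check is added here; the C excerpt does not have it.
    if len(s) >= 4 and s[-4:] == ".ttf":
        return s[:-4]  # truncate where the C code writes its '\0'
    return s
```

So 'unifont-5.1.20080907.ttf' reaches gd as 'unifont-5.1.20080907', which is consistent with a suffix-less symlink making the test page render.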

So I guess we can rename or symlink our font file and adjust the MediaWiki config for consistency:

- $wgTimelineFontFile = 'unifont-5.1.20080907.ttf';
+ $wgTimelineFontFile = 'unifont-5.1.20080907';

On our production test server I created a symlink:

$ cd /srv/mediawiki/fonts
$ ln -s unifont-5.1.20080907.ttf unifont-5.1.20080907

My test case https://zh.wikipedia.org/w/index.php?title=User:Hashar/T22285 suddenly rendered some text.
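Change 321560 ("Symlink fonts for ploticus") is the tracked fix, and its contents are not reproduced here. As an illustration only, this Python sketch generalizes the manual `ln -s` above to every .ttf file in a fonts directory. `symlink_fonts` is hypothetical; it skips names that already exist and uses relative link targets, exactly like the command above.

```python
import os
from typing import List


def symlink_fonts(fonts_dir: str) -> List[str]:
    """Give every *.ttf in fonts_dir a suffix-less symlink next to it.

    Hypothetical helper. Returns the link names created; names that already
    exist (as files or links) are skipped rather than overwritten.
    """
    created = []
    for entry in sorted(os.listdir(fonts_dir)):
        if not entry.endswith(".ttf"):
            continue
        link_name = entry[: -len(".ttf")]
        link_path = os.path.join(fonts_dir, link_name)
        if os.path.lexists(link_path):
            continue
        # Relative target, same as: ln -s unifont-5.1.20080907.ttf unifont-5.1.20080907
        os.symlink(entry, link_path)
        created.append(link_name)
    return created
```

Running it twice is harmless: the second run finds the links already present and creates nothing.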

Dzahn awarded a token. Nov 14 2016, 4:53 PM

Made it a blocker to T147481 so we can get the symlink/font rename properly tracked for later.

Change 321493 had a related patch set uploaded (by Hashar):
Move EasyTimeline config to its own file

https://gerrit.wikimedia.org/r/321493

hashar claimed this task. Nov 14 2016, 9:34 PM

Let's say I am de facto working on it. I will get the font file renamed and see what happens, then probably delete the whole cache of EasyTimeline files (by changing the epoch).

Change 321558 had a related patch set uploaded (by Hashar):
Test for $wgTimelineFontFile values

https://gerrit.wikimedia.org/r/321558

Change 321560 had a related patch set uploaded (by Hashar):
Symlink fonts for ploticus

https://gerrit.wikimedia.org/r/321560

Change 321561 had a related patch set uploaded (by Hashar):
Drop '.ttf' from $wgTimelineFontFile settings

https://gerrit.wikimedia.org/r/321561

I think I have all the patchsets more or less figured out. I even wrote a small test to ensure that $wgTimelineFontFile is not set to a file with a .ttf suffix and that the file exists in /fonts/.
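The actual test lives in change 321558 and is written in PHP; it is not reproduced here. As a sketch of the same two checks in Python, assuming the per-wiki settings are available as a plain mapping (a hypothetical stand-in for the real configuration array), with `check_timeline_fonts` as a hypothetical name:

```python
import os
from typing import Dict, List


def check_timeline_fonts(settings: Dict[str, str], fonts_dir: str) -> List[str]:
    """Return problems found in per-wiki $wgTimelineFontFile values.

    `settings` maps a wiki or language code to its font value; this mapping is
    a hypothetical stand-in for the real PHP configuration.
    """
    problems = []
    for wiki, font in sorted(settings.items()):
        if font.endswith(".ttf"):
            # ploticus strips '.ttf', so the setting should not carry it.
            problems.append(f"{wiki}: {font!r} must not end in .ttf")
        elif not os.path.exists(os.path.join(fonts_dir, font)):
            problems.append(f"{wiki}: {font!r} not found in {fonts_dir}")
    return problems
```

An empty list means every setting passes; each string otherwise names the offending wiki and value.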

Will complete tomorrow, then we can schedule it for testing/deployment.

Added EasyTimeline: it is quite helpful to have this task on the project board.

hashar moved this task from Backlog to font related on the EasyTimeline board. Nov 15 2016, 10:17 AM
Cwek added a comment. Nov 17 2016, 6:08 AM
This comment was removed by Cwek.
Cwek awarded a token. Nov 17 2016, 7:46 AM

It is not deployed yet. The few patches I have written need to be polished/reviewed and can then be deployed. You can find all of them via https://gerrit.wikimedia.org/r/#/q/bug:22825

Cwek added a comment. Nov 18 2016, 3:08 AM

Have you tested the CJK case? That is the important point that needs to be fixed.

Change 321560 merged by Hashar:
Symlink fonts for ploticus

https://gerrit.wikimedia.org/r/321560

Change 321493 merged by jenkins-bot:
Move EasyTimeline config to its own file

https://gerrit.wikimedia.org/r/321493

Mentioned in SAL (#wikimedia-operations) [2016-12-05T14:21:54Z] <hashar@tin> Synchronized wmf-config/timeline.php: Move EasyTimeline config to its own file - T22825 (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2016-12-05T14:23:24Z] <hashar@tin> Synchronized wmf-config/CommonSettings.php: Move EasyTimeline config to its own file - T22825 (duration: 00m 44s)

Change 321561 merged by jenkins-bot:
Drop '.ttf' from $wgTimelineFontFile bump epoch

https://gerrit.wikimedia.org/r/321561

Mentioned in SAL (#wikimedia-operations) [2016-12-05T14:26:15Z] <hashar@tin> Synchronized fonts: For T22825 (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2016-12-05T14:30:06Z] <hashar@tin> Synchronized wmf-config/timeline.php: Drop ttf from $wgTimelineFontFile and bump epoch - T22825 (duration: 00m 47s)

Change 321558 merged by jenkins-bot:
Test for $wgTimelineFontFile values

https://gerrit.wikimedia.org/r/321558

hashar added a comment. Dec 5 2016, 2:33 PM

Should be good now, based on a dummy test page at https://zh.wikipedia.org/wiki/User:Hashar/T22285. Note that the pages will need to be purged, though (via action=purge).

Note that I strongly discourage using EasyTimeline; we should aim at migrating to the Graph extension https://www.mediawiki.org/wiki/Extension:Graph

Should be good now, based on a dummy test page at https://zh.wikipedia.org/wiki/User:Hashar/T22285. Note that the pages will need to be purged, though (via action=purge).
Note that I strongly discourage using EasyTimeline; we should aim at migrating to the Graph extension https://www.mediawiki.org/wiki/Extension:Graph

The axis text and the legend overlap:
https://zh.wikipedia.org/wiki/2016%E5%B9%B4%E5%A4%AA%E5%B9%B3%E6%B4%8B%E9%A2%B1%E9%A2%A8%E5%AD%A3#.E9.A2.A8.E6.9A.B4.E6.99.82.E9.96.93.E8.A1.A8

hashar added a comment. Dec 5 2016, 4:12 PM

@Roytam1 looking at the code, the months are in a bar that is shifted down by 20 pixels:
bar:Month width:5 align:center fontsize:S shift:(0,-20) anchor:middle color:canvas

If you remove shift:(0,-20), the months are listed above the line and no longer overlap the legend, but that is not ideal.

The legend position is hard fixed with:

Legend     = columns:2 left:30 top:59 columnwidth:270

Changing top:59 to top:39 moves the legend down by 20 pixels, so you can keep the month bar shifted down by 20 pixels. I tried it and it works fine.

One sure thing, it is unrelated to this task :}

TTO closed this task as Resolved. Dec 16 2016, 9:15 AM
TTO added a subscriber: TTO.

One sure thing, it is unrelated to this task :}

Not necessarily; the new font seems to have slightly different metrics. It doesn't seem like a big problem though, so I'm marking this "resolved".

Shizhao moved this task from Backlog to Closed on the Chinese-Sites board. Dec 27 2016, 3:12 AM
Liuxinyu970226 moved this task from Backlog to Closed on the Chinese-Sites board. Feb 18 2017, 3:26 AM
Yurik added a subscriber: Yurik. Mar 30 2018, 3:23 AM

Crosslinking with T156191 and T127683 - might be related, or addressed in the same way?

Dzahn removed a subscriber: Dzahn. Mar 30 2018, 3:37 PM