
PDF export extension fails to render Arabic characters in monospace font
Closed, Resolved, Public

Assigned To
None
Authored By
Yamaha5
Nov 9 2011, 3:53 PM
Referenced Files
F8360: Sans.zip
Nov 21 2014, 11:58 PM
F8359: Mono.zip
Nov 21 2014, 11:58 PM
F8361: Serif.zip
Nov 21 2014, 11:58 PM

Description

The PDF export extension has a problem with <code></code> when the content contains Unicode characters:
http://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf3#problem_with_unicode

The same problem occurs when a line of Unicode text starts with a space.


Version: unspecified
Severity: normal
See Also:
T30206: PDF generation does not support Complex Script Wikis (e.g. Indic languages) and needs to be re-written

Details

Reference
bz32317

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:58 PM
bzimport added projects: Collection, I18n.
bzimport set Reference to bz32317.
bzimport added a subscriber: Unknown Object (MLST).

What specific "Unicode" issue are you talking about? All the text is Unicode, and all the characters in the title and the "direction" section appear offhand to render.

Are the characters in the "problem with unicode" section different? Are they Farsi-specific letters perhaps? (bug 30326)

The section with space at start will be showing as a preformatted text section, which it appears to correctly set off but is laying out left-justified; on the web however these appear to start as right-aligned.

(I'm not sure if right-aligned ever really makes sense for preformatted text sections though; is that something that gets used in RTL scripts? They're not generally fixed-width...)

1. I mean non-Latin (Arabic) characters: when they are inside <code>, or on a line that starts with a space, they render as rectangles. In the normal case they render correctly.

I added Latin characters for a better comparison.

2. As you said, the section that starts with a space has to be right-justified.

Ah that makes sense -- <code> would typically trigger a fixed-width (monospaced) font, as would the preformatted sections (starting with space).

Probably the PDFs are rendering with a monospaced font that doesn't include Arabic characters and doesn't fall back to other fonts.
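The missing fallback described above can be illustrated with a minimal Python sketch. It is not the renderer's actual code: the font names and the `COVERAGE` table are hypothetical, and a real implementation would read coverage from each font's character map. The idea is only to split text into runs so that each run uses the first font in a chain that contains its characters.

```python
from itertools import groupby

# Hypothetical coverage tables: font name -> inclusive code point ranges.
# These names and ranges are assumptions for illustration only.
COVERAGE = {
    "MonoLatin": [(0x0020, 0x007E)],       # printable ASCII
    "ArabicFallback": [(0x0600, 0x06FF)],  # Unicode Arabic block
}


def covers(font, ch):
    """Return True if the (hypothetical) font has a glyph for ch."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COVERAGE[font])


def pick_font(ch, chain):
    """Return the first font in the chain that covers ch, or None."""
    for font in chain:
        if covers(font, ch):
            return font
    return None


def split_runs(text, chain):
    """Group consecutive characters that resolve to the same font."""
    return [
        (font, "".join(chars))
        for font, chars in groupby(text, key=lambda ch: pick_font(ch, chain))
    ]


# Latin text followed by four Arabic letters (written as escapes).
runs = split_runs("ab \u0633\u0644\u0627\u0645", ["MonoLatin", "ArabicFallback"])
```

With only `"MonoLatin"` in the chain, the Arabic run resolves to `None`, which corresponds to the rectangles seen in the PDF.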

Thanks Brion, now I understand why the first page also has the problem (the book cover, which uses a taller font).

As I understand it, each language needs at least 6 fonts: 4 fonts for Normal, Bold, Italic, and Bold-Italic, plus 2 more (monospaced and the tall cover font).

To get a better result in general for languages that have no correct font defined, in my opinion it would be better to use the Normal font as the default wherever no font is defined.

volker.haas wrote:

(In reply to comment #4)

Thanks Brion, now I understand why the first page also has the problem (the book cover, which uses a taller font).

That problem was unrelated. I fixed it with https://github.com/pediapress/mwlib.rl/commit/4adfadd716af1e04f0631883a7dc8569a2294c09

As I understand it, each language needs at least 6 fonts: 4 fonts for Normal, Bold, Italic, and Bold-Italic, plus 2 more (monospaced and the tall cover font).

Currently, font switching isn't done for monospaced fonts. Since we use GNU FreeFont for monospace, Arabic glyphs can't be displayed: http://www.gnu.org/s/freefont/coverage.html

The easiest way to fix the monospace problem for Arabic would probably be to add Arabic glyphs to GNU FreeFont.

To get a better result in general for languages that have no correct font defined, in my opinion it would be better to use the Normal font as the default wherever no font is defined.

Created attachment 9457
GNU Fonts with arabic characters

I uploaded GNU FreeFont with Arabic characters. I hope it will be useful.
I imported
Nazli to mono (normal,Bold,Italic,BoldItalic)
Roya to Sans (normal,Bold,Italic,BoldItalic)
Roya to Serif (normal,Bold,Italic,BoldItalic)
Nazli and Roya are both under a GNU license:
http://www.farsiweb.ir/wiki/Persian_fonts

Created attachment 9458
Sans

Created attachment 9459
serif

If those fonts are not useful, is it possible to add the following to https://github.com/pediapress/mwlib.rl/commit/4adfadd716af1e04f0631883a7dc8569a2294c09?

pdfstyles.mono_font = arabic_font

volker.haas wrote:

I tried the above hack, but it doesn't help: the glyphs will be visible but the directionality remains broken.
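To make the glyph versus directionality distinction concrete, here is a small Python sketch that only detects the base direction of a line from its first strongly directional character, using the standard `unicodedata` module. It is an illustration, not mwlib's code, and it does not implement the full Unicode bidirectional algorithm or Arabic shaping; the function name `base_direction` is hypothetical.

```python
import unicodedata


def base_direction(text):
    """Return "rtl" or "ltr" based on the first strong character, else None."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return "rtl"
        if bidi == "L":          # Latin and other left-to-right letters
            return "ltr"
    return None                  # only digits, spaces, punctuation, etc.


# A preformatted line that starts with a space followed by Arabic letters.
direction = base_direction(" \u0633\u0644\u0627\u0645 abc")
```

A renderer that has the right glyphs but always lays monospaced text out left-to-right would still show such a line in the wrong order, which matches the result reported above.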

For Farsi and other languages that have this bug, is it possible to switch them to the fonts that are used in <source>?
Example:
http://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf4

Currently font switching isn't done for monospaced fonts. Since we use GNU
Freefont arabic glyphs can't be displayed:
http://www.gnu.org/s/freefont/coverage.html

The easiest way to fix the monospace problem for arabic would probably be to
add it to GNU fonts

Please update the font; it now supports Arabic for monospace.

(In reply to comment #12)

Please update font now it supports arabic for monospace

reza1615: Comment 10 implied that there are problems with directionality. Could you elaborate how this is related (if it is)?

(In reply to comment #13)

(In reply to comment #12)

Please update font now it supports arabic for monospace

reza1615: Comment 10 implied that there are problems with directionality. Could you elaborate how this is related (if it is)?

That was about comment 9, which was a hack. According to comment 5 it should work, because FreeFont now supports Arabic and Farsi.

Please update the monospace FreeFont. It now supports Arabic, and that will solve this bug! I have asked for this many times on IRC and in personal emails, but no one cares :(

(In reply to comment #15)

Please update monospace Freefont.

Please provide information about the "update", at least a link to download and ChangeLog.

Here is the main page:
http://savannah.gnu.org/projects/freefont/

Here is the coverage:
http://www.gnu.org/software/freefont/coverage.html

Here are the links for the latest release (2012-05-03):
http://ftp.gnu.org/gnu/freefont/

It should be updated on the server that runs the Collection extension.

ralf_wikimedia wrote:

The PDF servers are going to be decommissioned in a few days and the PDF rendering software will be replaced.

It's not going to happen.

We can still get the new PDF servers to include the updated font. The GNU Freefont is most probably installed using Ubuntu package ttf-freefont which is from 2010:

$ apt-cache policy ttf-freefont
ttf-freefont:
  Installed: (none)
  Candidate: 20100919-1
  Version table:
     20100919-1 0
        500 http://ubuntu.wikimedia.org/ubuntu/ precise/main amd64 Packages

Ubuntu updated the package to 20120503 in their Quantal release. We could backport it to Precise and thus update the fonts. Not sure whether FreeFont version 20120503 supports Arabic though :(
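As a rough illustration of the check being discussed, the following Python sketch extracts the candidate version from `apt-cache policy` output and compares its date against the 20120503 release. The function names are hypothetical, and it assumes the package version begins with a YYYYMMDD date, as it does in the output quoted in this comment.

```python
import re

# Sample text copied from the apt-cache output quoted in this comment.
POLICY_OUTPUT = """\
ttf-freefont:
  Installed: (none)
  Candidate: 20100919-1
"""


def candidate_date(policy_text):
    """Return the candidate's leading YYYYMMDD date as an int, or None."""
    match = re.search(r"^\s*Candidate:\s*(\d{8})", policy_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def needs_backport(policy_text, wanted=20120503):
    """True when no dated candidate exists or it is older than `wanted`."""
    date = candidate_date(policy_text)
    return date is None or date < wanted


result = needs_backport(POLICY_OUTPUT)
```

For the Precise output above, the candidate date is older than 20120503, so a backport would be needed to get the newer fonts.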

Not sure whether the free
font version 20120503 supports Arabic though :(

http://www.gnu.org/software/freefont/coverage.html

says that FreeFont Mono supports 212 Arabic characters (character ranges).

Comment 19 is about Ubuntu packages from Ubuntu, hence gnu.org does not matter in this case.

The page at http://www.gnu.org/software/freefont/coverage.html lists Arabic as being supported and was generated at 02:14:54 PM 04/29/2012 CEST.

So I guess we can get the package backported.

reza1615: Please do state explicitly that you have retested a bug report, and whether the new OCG setup works for you and no longer exposes the problem.
"Thanks for new version" is a bit too interpretable and reading this comment in one year, people won't understand what "new version" you meant. Thanks :)