The PDF output for Persian is using Urdu fonts
Closed, ResolvedPublic

Assigned To
Authored By
Ladsgroup
Sep 15 2020, 7:21 AM
Referenced Files
F32362382: image.png
Sep 24 2020, 5:33 PM
F32362004: image.png
Sep 24 2020, 11:37 AM
F32361982: image.png
Sep 24 2020, 11:37 AM
F32361976: image.png
Sep 24 2020, 11:37 AM
F32353524: ابوالحسن_ورزی.pdf
Sep 17 2020, 10:55 AM
F32353522: گرنت_کاردونه.pdf
Sep 17 2020, 10:47 AM
F32353115: ابوالحسن ورزی - ویکی_گفتاورد.pdf
Sep 17 2020, 1:41 AM
F32353109: ابوالحسن_ورزی.pdf
Sep 17 2020, 1:25 AM

Description

This is output of PDF:

image.png (882×1 px, 592 KB)

This is the actual article:

image.png (361×1 px, 86 KB)

I know it's hard to explain what's wrong, but this font could not be more wrong for Persian output. It is a calligraphic, stylized font that is barely readable in Persian (not to mention that we are being mocked because of it).

Event Timeline

Yes, but this doesn't require installing a new font. Just use something else, anything else, such as DejaVu Sans (which works perfectly fine for Persian) or any other font, but not this one.

This could be related to a known issue in puppeteer/chromium -> see https://github.com/puppeteer/puppeteer/issues/2410

As a native speaker, I highly doubt it. It's not just the spacing in the text that's off: the text is pretty thick (bold-like) and the typeface in general looks different. Some characters have a totally different look; for example, a digit like 4 can be shown as ۴ or ٤.

@Ladsgroup I have tried different browsers (Chrome, Firefox, Safari) on different OSes, all in logged-out mode, and cannot replicate this. Would you mind updating the task description to specify the OS and browser, and also trying other OS/browser combinations?

Also, @Ladsgroup, if you can upload the PDF you got as well, it would be easier to compare. This is my output from the PDF renderer service.

Yes, it's because of the changes in MediaWiki:Print.css of Persian Wikipedia (done by @Ebraminio to mitigate the issue).

If you go and try it on a Persian wiki with empty Print.css (like Persian Wikiquote), you can see it. Try it on https://fa.wikiquote.org/wiki/%D8%A7%D8%A8%D9%88%D8%A7%D9%84%D8%AD%D8%B3%D9%86_%D9%88%D8%B1%D8%B2%DB%8C

Not true. I just tried the Persian Wikiquote example you provided with Chrome 85.0 on Windows, and the PDF file generated by Chrome's Save to PDF option (in the Print menu) looks different than yours and uses the correct font.

Once more, I encourage you to update the task description with OS and Browser info.

I have the issue on Persian Wikiquote with Android 9 and Chrome 85.


It is interesting that your PDF has a smaller file size and that its name contains ویکی‌گفتاورد.
Why does the extension act differently on different OSes?

Oh wait! You are talking about the PDF generator extension! I was talking about the browser's own print-to-PDF functionality.
I can confirm the extension generates the PDF with the wrong font, regardless of OS or browser.

After a bit of triaging here are my findings about the font rendering:

The page rendered in the browser is using DejaVu Sans, with the following CSS font-family in effect:

".Arabic UI Text", Tahoma, "Iranian Sans", "Noto Sans Arabic", "DejaVu Sans", sans-serif

The page rendered to PDF (using puppeteer) is using the following fonts:

 $ strings rendered.pdf | grep FontName
/FontName /NotoNaskhArabic-Bold
/FontName /NotoSansArabic-Bold
/FontName /LiberationSerif
/FontName /NotoNaskhArabic
/FontName /NotoSansCJKjp-Regular
/FontName /LiberationSerif-Bold
/FontName /NotoSansCJKjp-Regular
/FontName /NotoSansArabic-Regular

NotoNaskhArabic and NotoSansArabic are okay to use but maybe it's using "bold" everywhere?
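To put a number on the "bold everywhere" suspicion, the `strings ... | grep FontName` check can be approximated in a few lines of Python. This is only a rough sketch: the function names `font_names` and `bold_share` are made up for illustration, and it assumes the `/FontName` entries are stored uncompressed in the PDF (as they evidently are in the output above). It also strips the six-letter subset prefix (for example `ABCDEF+`) that embedded subset fonts may carry.

```python
import re
from collections import Counter

# Matches entries such as "/FontName /NotoNaskhArabic-Bold" in raw PDF bytes.
FONT_NAME_RE = re.compile(rb"/FontName\s*/([^\s/<>\[\]()]+)")
# Embedded subset fonts may be named like "ABCDEF+DejaVuSans".
SUBSET_PREFIX_RE = re.compile(r"^[A-Z]{6}\+")


def font_names(pdf_bytes):
    """Count /FontName entries found in the (uncompressed) parts of a PDF."""
    names = Counter()
    for match in FONT_NAME_RE.finditer(pdf_bytes):
        name = match.group(1).decode("latin-1")
        names[SUBSET_PREFIX_RE.sub("", name)] += 1
    return names


def bold_share(names):
    """Fraction of font entries whose name contains 'Bold'."""
    total = sum(names.values())
    bold = sum(count for name, count in names.items() if "Bold" in name)
    return bold / total if total else 0.0


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        found = font_names(handle.read())
    for name, count in sorted(found.items()):
        print(count, name)
    print("bold share:", round(bold_share(found), 2))
```

Note that this counts font descriptors, not the amount of text set in each font, so a high bold share is a hint rather than proof.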

It looks like the PDF rendering service works fine on mobile. Here is the list of fonts used:

/FontName /DejaVuSans
/FontName /LiberationSerif-Italic
/FontName /DejaVuSans
/FontName /LiberationSans
/FontName /LiberationSans-Bold
/FontName /NotoSansCJKjp-Regular
/FontName /DejaVuSans
/FontName /DejaVuSans
/FontName /NotoSansCJKjp-Regular
/FontName /DejaVuSans
/FontName /DejaVuSans
/FontName /LiberationSans-Italic
/FontName /DejaVuSans
/FontName /DejaVuSans
/FontName /DejaVuSans
/FontName /LiberationSerif
/FontName /DejaVuSans
/FontName /DejaVuSans

DejaVu Sans is good. It's basically the default font when you read fawiki on Linux.
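A simplified way to see why Linux readers end up on DejaVu Sans with the font-family stack quoted earlier: the first family in the stack that is actually installed wins. The sketch below assumes exactly that rule and nothing more. Real browsers also fall back per character and resolve generic names like `sans-serif` through the system font configuration, so treat this as an illustration only. The helper names and the set of installed families are hypothetical.

```python
def parse_font_stack(stack):
    """Split a CSS font-family value into family names, dropping quotes.

    Naive on purpose: it does not handle commas inside quoted names.
    """
    return [part.strip().strip("\"'") for part in stack.split(",") if part.strip()]


def first_available(stack, installed):
    """Return the first family from the stack that is installed, else None."""
    installed_lower = {name.lower() for name in installed}
    for family in parse_font_stack(stack):
        if family.lower() in installed_lower:
            return family
    return None


STACK = '".Arabic UI Text", Tahoma, "Iranian Sans", "Noto Sans Arabic", "DejaVu Sans", sans-serif'

if __name__ == "__main__":
    # Hypothetical set of families on a typical Linux desktop.
    print(first_available(STACK, {"DejaVu Sans", "Liberation Sans"}))
```

If none of the named families is installed, the choice is left to whatever the system maps `sans-serif` (or the script's default) to, which is where an unexpected font can slip in.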

Essentially, chromium-render uses puppeteer to spawn a Chromium instance and orchestrate a print-to-PDF export of the page. It relies on print.css for the rendering.
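For anyone who wants to reproduce the export locally without the service, a rough stand-in is Chromium's own headless `--print-to-pdf` mode. To be clear, this is not how chromium-render works (it drives Chromium through puppeteer), so it is a swapped-in technique that only approximates the same print path. The binary name `chromium`, the helper names, and the output path are assumptions and will differ per system.

```python
import subprocess


def print_to_pdf_command(url, output_path, chromium_binary="chromium"):
    """Build the argv for a headless Chromium print-to-PDF run."""
    return [
        chromium_binary,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output_path}",
        url,
    ]


def print_to_pdf(url, output_path, chromium_binary="chromium"):
    """Run headless Chromium; raises CalledProcessError if it exits non-zero."""
    command = print_to_pdf_command(url, output_path, chromium_binary)
    subprocess.run(command, check=True, timeout=120)


if __name__ == "__main__":
    print_to_pdf(
        "https://fa.wikiquote.org/wiki/%D8%A7%D8%A8%D9%88%D8%A7%D9%84%D8%AD%D8%B3%D9%86_%D9%88%D8%B1%D8%B2%DB%8C",
        "rendered.pdf",
    )
```

The resulting file depends on the fonts installed on the machine running it, so it only mirrors production if the same fonts are present.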

NotoNaskhArabic and NotoSansArabic are okay to use but maybe it's using "bold" everywhere?

I am a bit confused. If NotoNaskhArabic and NotoSansArabic are OK to use, what's the issue with the rendering?
Here are some screenshots with the bold font usage (highlighted).

image.png (1×3 px, 717 KB)

image.png (1×3 px, 727 KB)

And here is the PDF export (using the extension):

image.png (1×2 px, 467 KB)

Change 629664 had a related patch set uploaded (by Jgiannelos; owner: Jgiannelos):
[mediawiki/services/chromium-render@master] Add DejaVu fonts for PDF rendering

https://gerrit.wikimedia.org/r/629664

It looks like the container image we are using in production doesn't include DejaVu fonts. So it falls back to:

/FontName /NotoNastaliqUrdu-Bold
/FontName /NotoSerif-Bold
/FontName /NotoNastaliqUrdu-Bold
/FontName /NotoSans-Regular
/FontName /NotoSans-Bold
/FontName /NotoNastaliqUrdu-Bold
/FontName /LiberationSans
/FontName /LiberationSans-Italic
/FontName /LiberationSans-Bold
/FontName /NotoNastaliqUrdu-Bold

which I guess is wrong.
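A quick way to check a container image for this kind of gap is to ask fontconfig which families are installed and compare against the ones the page expects. The sketch below assumes `fc-list` (from fontconfig) is available inside the image. The helper names and the list of wanted families are illustrative assumptions, not something taken from the chromium-render repository.

```python
import subprocess


def parse_families(fc_list_output):
    """Parse `fc-list : family` output into a set of family names.

    Each line may hold several comma-separated names for the same font.
    """
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def missing_families(wanted, installed):
    """Return the wanted families that are not installed (case-insensitive)."""
    installed_lower = {name.lower() for name in installed}
    return [name for name in wanted if name.lower() not in installed_lower]


def installed_families():
    """Query fontconfig for installed families; requires `fc-list`."""
    result = subprocess.run(
        ["fc-list", ":", "family"], check=True, capture_output=True, text=True
    )
    return parse_families(result.stdout)


if __name__ == "__main__":
    # Assumed list of families worth having for Persian output.
    wanted = ["DejaVu Sans", "Noto Sans Arabic", "Noto Naskh Arabic"]
    print("missing:", missing_families(wanted, installed_families()))
```

Running this in the production image before and after a font package is added would show whether DejaVu Sans is actually visible to Chromium.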

Change 629664 merged by jenkins-bot:
[mediawiki/services/chromium-render@master] Add DejaVu fonts for PDF rendering

https://gerrit.wikimedia.org/r/629664

Change 629748 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[operations/deployment-charts@master] Update chromium-render to 2020-09-24-145544-production

https://gerrit.wikimedia.org/r/629748

Change 629748 merged by jenkins-bot:
[operations/deployment-charts@master] Update chromium-render to 2020-09-24-145544-production

https://gerrit.wikimedia.org/r/629748

If you go and try it on a Persian wiki with empty Print.css (like Persian Wikiquote), you can see it. Try it on https://fa.wikiquote.org/wiki/%D8%A7%D8%A8%D9%88%D8%A7%D9%84%D8%AD%D8%B3%D9%86_%D9%88%D8%B1%D8%B2%DB%8C

The change is now in prod. With an empty print CSS, it looks like it defaults to DejaVu Sans.

image.png (1×2 px, 314 KB)