The PDF output for Persian is using Urdu fonts
Closed, Resolved · Public

Description

This is the PDF output:

This is the actual article:

I know it's hard to explain what's wrong, but this font couldn't be more wrong for Persian output. It is a calligraphic, stylized font for Persian and is barely readable (not to mention that we are being mocked because of this).

Event Timeline

Restricted Application added subscribers: Huji, Aklapper.

Yes, but this doesn't require installing a new font. Just use something else, anything else, like DejaVuSans (which works perfectly fine for Persian) or any other font, but not this one.

LGoto triaged this task as Medium priority. Sep 16 2020, 3:35 PM
LGoto moved this task from Needs triage to Upcoming on the Product-Infrastructure-Team-Backlog board.

This could be related to a known issue in puppeteer/chromium; see https://github.com/puppeteer/puppeteer/issues/2410

As a native speaker, I highly doubt it. It's not just the spacing in the text that's off: the text is pretty thick (bold-like) and the typeface in general looks different. Some characters have a totally different look; for example, there are multiple ways to show a digit (4 can be ۴ or ٤).

Huji added a comment. Sep 16 2020, 4:37 PM

@Ladsgroup I have tried different browsers (Chrome, Firefox, Safari) on different OSs, all in logged-out mode, and cannot replicate this. Would you mind updating the Task description to specify OS and Browser, and also trying other OS/Browser combos?

Also, @Ladsgroup, if you could upload the PDF you got as well, it would be easier to compare. This is my output from the PDF renderer service:

Yes, it's because of the changes in MediaWiki:Print.css of Persian Wikipedia (done by @Ebraminio to mitigate the issue).

If you go and try it on a Persian wiki with empty Print.css (like Persian Wikiquote), you can see it. Try it on https://fa.wikiquote.org/wiki/%D8%A7%D8%A8%D9%88%D8%A7%D9%84%D8%AD%D8%B3%D9%86_%D9%88%D8%B1%D8%B2%DB%8C

Huji added a comment. Sep 17 2020, 1:41 AM

Not true. I just tried the Persian Wikiquote example you provided with Chrome 85.0 on Windows, and the PDF file generated by Chrome's Save to PDF (in the Print menu) looks different than yours and uses the correct font.

Once more, I encourage you to update the task description with OS and Browser info.

Yamaha5 added a comment. Edited · Sep 17 2020, 10:47 AM

I have the issue on Persian Wikiquote with Android 9 and Chrome 85.


It is interesting that your PDF is smaller in size and that its name contains ویکی‌گفتاورد.
Why does the extension act differently on different OSs?

Huji added a comment. Sep 17 2020, 1:43 PM

Oh wait! You are talking about the PDF generator extension! I was talking about the browser's own print-to-PDF functionality.
I can confirm the extension generates the PDF with the wrong font, regardless of OS or browser.

Jgiannelos added a comment. Edited · Sep 18 2020, 1:27 PM

After a bit of triaging here are my findings about the font rendering:

The page rendered in the browser uses DejaVu Sans, with the following accepted CSS font-family:

".Arabic UI Text", Tahoma, "Iranian Sans", "Noto Sans Arabic", "DejaVu Sans", sans-serif

The page rendered to PDF (using puppeteer) uses the following fonts:

 $ strings rendered.pdf | grep FontName
/FontName /NotoNaskhArabic-Bold
/FontName /NotoSansArabic-Bold
/FontName /LiberationSerif
/FontName /NotoNaskhArabic
/FontName /NotoSansCJKjp-Regular
/FontName /LiberationSerif-Bold
/FontName /NotoSansCJKjp-Regular
/FontName /NotoSansArabic-Regular
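
For anyone who wants to repeat this check without the `strings` utility, here is a minimal Python sketch of the same idea. The function name `font_names` and the sample bytes are hypothetical and only for illustration. Like `strings | grep`, it only sees `/FontName` entries stored uncompressed in the file, so it may miss fonts declared inside compressed object streams.

```python
import re
from collections import Counter

# A PDF name ends at whitespace or at one of the PDF delimiter characters.
FONT_NAME_RE = re.compile(rb"/FontName\s*/([^\s/<>\[\](){}%]+)")


def font_names(pdf_bytes: bytes) -> Counter:
    """Count the /FontName entries visible in the raw bytes of a PDF."""
    return Counter(
        name.decode("latin-1") for name in FONT_NAME_RE.findall(pdf_bytes)
    )


# Hypothetical sample standing in for the contents of rendered.pdf.
sample = (
    b"<< /Type /FontDescriptor /FontName /NotoNaskhArabic-Bold /Flags 4 >>\n"
    b"<< /Type /FontDescriptor /FontName /LiberationSerif >>\n"
    b"<< /Type /FontDescriptor /FontName /NotoNaskhArabic-Bold>>\n"
)
counts = font_names(sample)
```

For a real file, the bytes could be read with `open("rendered.pdf", "rb").read()` and passed to the same function. The counts make it easier to see whether a bold face dominates the document.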

NotoNaskhArabic and NotoSansArabic are okay to use but maybe it's using "bold" everywhere?

It looks like the PDF rendering service works fine on mobile. Here is the list of fonts used:

/FontName /DejaVuSans
/FontName /LiberationSerif-Italic
/FontName /DejaVuSans
/FontName /LiberationSans
/FontName /LiberationSans-Bold
/FontName /NotoSansCJKjp-Regular
/FontName /DejaVuSans
/FontName /DejaVuSans
/FontName /NotoSansCJKjp-Regular
/FontName /DejaVuSans
/FontName /DejaVuSans
/FontName /LiberationSans-Italic
/FontName /DejaVuSans
/FontName /DejaVuSans
/FontName /DejaVuSans
/FontName /LiberationSerif
/FontName /DejaVuSans
/FontName /DejaVuSans

DejaVuSans is good. It's basically the default font when you read fawiki on Linux.

Jgiannelos added a comment. Edited · Sep 24 2020, 11:37 AM

Essentially, what chromium-render does is spawn a Chromium instance via puppeteer and orchestrate a print-to-PDF export of the page. It relies on print.css for the rendering.

I am a bit confused. If NotoNaskhArabic and NotoSansArabic are OK to use, what's the issue with the rendering?
Here are some screenshots with the bold font usage (highlighted).


And here is the PDF export (using the extension):

Change 629664 had a related patch set uploaded (by Jgiannelos; owner: Jgiannelos):
[mediawiki/services/chromium-render@master] Add DejaVu fonts for PDF rendering

https://gerrit.wikimedia.org/r/629664

It looks like the container image we are using in production doesn't include DejaVu fonts, so it falls back to:

/FontName /NotoNastaliqUrdu-Bold
/FontName /NotoSerif-Bold
/FontName /NotoNastaliqUrdu-Bold
/FontName /NotoSans-Regular
/FontName /NotoSans-Bold
/FontName /NotoNastaliqUrdu-Bold
/FontName /LiberationSans
/FontName /LiberationSans-Italic
/FontName /LiberationSans-Bold
/FontName /NotoNastaliqUrdu-Bold

which I guess is wrong.
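
A check like the following could flag this kind of fallback automatically. It is only a sketch: the function name `unexpected_fonts` and the choice of "Nastaliq" as the marker are assumptions drawn from the complaint in this task, not part of chromium-render.

```python
def unexpected_fonts(font_names, markers=("Nastaliq",)):
    """Return the sorted, de-duplicated font names containing any marker.

    The default marker is an assumption based on this task: Nastaliq
    faces are calligraphic and were reported as unsuitable for Persian
    body text.
    """
    return sorted(
        {
            name
            for name in font_names
            if any(marker.lower() in name.lower() for marker in markers)
        }
    )


# Font names taken from the production fallback list above.
production = [
    "NotoNastaliqUrdu-Bold",
    "NotoSerif-Bold",
    "NotoSans-Regular",
    "NotoSans-Bold",
    "LiberationSans",
    "LiberationSans-Italic",
    "LiberationSans-Bold",
]
flagged = unexpected_fonts(production)
```

Here `flagged` contains only `NotoNastaliqUrdu-Bold`, while a list made up of DejaVuSans and Liberation fonts yields an empty result.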

Change 629664 merged by jenkins-bot:
[mediawiki/services/chromium-render@master] Add DejaVu fonts for PDF rendering

https://gerrit.wikimedia.org/r/629664

Change 629748 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[operations/deployment-charts@master] Update chromium-render to 2020-09-24-145544-production

https://gerrit.wikimedia.org/r/629748

Change 629748 merged by jenkins-bot:
[operations/deployment-charts@master] Update chromium-render to 2020-09-24-145544-production

https://gerrit.wikimedia.org/r/629748

Jgiannelos added a comment. Edited · Sep 24 2020, 5:33 PM

The change is now in prod. With an empty print CSS, it looks like it defaults to DejaVu Sans.

Thanks! It looks good now.

MSantos closed this task as Resolved. Sep 30 2020, 3:23 PM