[Spike, 8hrs] Grave kerning issues and spacing issues in PDFs generated by Chromium (and previous Electron) via "Download as PDF"
Closed, Resolved | Public | 0 Estimated Story Points

Authored By
TheDJ
Oct 20 2017, 9:27 AM
Referenced Files
F23397162: Book_Proton_Native.pdf
Jul 6 2018, 9:41 PM
F23397474: Book_Browser_toPDF_A4.pdf
Jul 6 2018, 9:41 PM
F23397097: Book_Proton_None_Medium.pdf
Jul 6 2018, 9:41 PM
F23397130: Book_Proton_AutoHinter.pdf
Jul 6 2018, 9:41 PM
F23396973: Book_Local_Native.pdf
Jul 6 2018, 9:41 PM
F22225934: Screen Shot 2018-06-14 at 13.53.00.png
Jun 14 2018, 11:53 AM
F22142827: Barack Obama.pdf
Jun 12 2018, 11:59 AM
F22142349: image.png
Jun 12 2018, 11:16 AM
Tokens
"Heartbreak" token, awarded by Jdlrobson."Heartbreak" token, awarded by Nemo_bis.

Description

Background

When visiting https://en.wikipedia.org/wiki/Songs_About_Jane and clicking "Download as PDF" the resulting document shows kerning issues. These do not show when you print to PDF using the browser.

https://www.mediawiki.org/wiki/Topic:U08at90ido2loj5q

There is extremely weird spacing and kerning for me; maybe this is because the documents aren't actually supposed to be rendered with the Liberation fonts. References being floated on top of text, characters being given zero(?) width, and so on. I am using macOS 10.13 but downloaded the Liberation fonts separately.

Screen Shot 2017-10-19 at 12.48.18.png (608×1 px, 253 KB)

I can reproduce these problems in most PDF downloads (at least when viewing on my Mac). Possibly a font style, font, or glyph fallback related problem? I do not remember seeing such problems with the PDF before the new print styles were released (but I'm not 100% sure, so tagging under both).

Question: Is this an Electron-only bug? Is this also a bug in Chromium?

When investigating this we were only able to replicate this on Electron. The conclusion is that this should be remedied when we launch the new Chromium based service. When launching we will take some time to verify this is fixed along with other bugs in the Proton project. We should be wary of different font stacks on the production machine.

Chromium

We have identified that this is also an issue with the Chromium service and would like to know the following:

Event Timeline

ovasileva renamed this task from Grave kerning issues and spacing issues in PDFs generated by Electron via "Download as PDF" to [Spike 3hrs] Grave kerning issues and spacing issues in PDFs generated by Electron via "Download as PDF". Nov 7 2017, 5:27 PM

Estimating as a 3-hour spike and bringing into the sprint so we can investigate before settling on a solution for next sprint.

Have we actually checked this is an Electron problem? Because I don't remember seeing this in any of the Electron PDFs before we did the print style changes, so my suspicion is that moving to Chromium with the same font stack etc. will likely just show the same problem.

I can replicate this via the "Download as PDF" link in the left menu, but I cannot replicate it when printing directly to PDF from my browser (Google Chrome on OS X). I'm not sure how I can investigate this any further, as we cannot easily debug Electron PDF.

OS: Linux (Ubuntu 17.10), Chromium 61
I can replicate that via "download to pdf", I cannot replicate when printing to PDF.

I would say it's an Electron problem, as I opened the resulting PDF in 4 readers (Chromium/Gimp/some online web editor/xournal) and all readers show the content with bad spacing.
When I select the [4] element, the whole block is clearly off.
When I try to edit "... band signed with Octone Records, a New ...", after a couple of seconds it fixes the position of the ",".

Fonts used in the PDF generated by the Electron service:

raynor@DellE6540:~/Downloads » pdffonts Songs_About_Jane.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
LiberationSans-BoldItalic            CID TrueType      Identity-H       yes no  yes     27  0
LiberationSans-Bold                  CID TrueType      Identity-H       yes no  yes     32  0
LiberationSans                       CID TrueType      Identity-H       yes no  yes     37  0
LiberationSans-Italic                CID TrueType      Identity-H       yes no  yes     42  0
LiberationSerif-BoldItalic           CID TrueType      Identity-H       yes no  yes     47  0
LiberationSerif                      CID TrueType      Identity-H       yes no  yes     52  0
LiberationSerif-Italic               CID TrueType      Identity-H       yes no  yes     57  0
LiberationSerif-Bold                 CID TrueType      Identity-H       yes no  yes     62  0
LiberationMono                       CID TrueType      Identity-H       yes no  yes     77  0
WenQuanYiZenHei                      CID TrueType      Identity-H       yes no  yes     83  0

The text "... in every song about her" is using LiberationSerif, and [4] is in LiberationSans.
The wrong position of the "," is still an open question to me.
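
For comparing the fonts embedded in two PDFs (for example the Electron output against a browser print), the `pdffonts` table above can be parsed mechanically. The following is only a minimal sketch: the function names are hypothetical, and it assumes font names and encodings contain no spaces, which holds for the listings in this task but is not guaranteed in general.

```python
def parse_pdffonts(output: str) -> list:
    """Parse the table printed by `pdffonts` into one dict per font row."""
    fonts = []
    in_table = False
    for line in output.splitlines():
        if line.startswith("---"):
            in_table = True  # data rows start after the dashed separator
            continue
        tokens = line.split()
        if not in_table or len(tokens) < 8:
            continue  # skip the shell prompt, the header and blank lines
        fonts.append({
            "name": tokens[0],
            # The type may span several words ("CID TrueType", "Type 3"),
            # so take everything between the name and the last six columns.
            "type": " ".join(tokens[1:-6]),
            "encoding": tokens[-6],
            "emb": tokens[-5] == "yes",
            "sub": tokens[-4] == "yes",
            "uni": tokens[-3] == "yes",
            "object_id": (int(tokens[-2]), int(tokens[-1])),
        })
    return fonts


def font_names(output: str) -> set:
    """Return just the set of font names, convenient for diffing two PDFs."""
    return {font["name"] for font in parse_pdffonts(output)}
```

With the listings captured as strings (for instance from `subprocess.run(["pdffonts", path], capture_output=True, text=True).stdout`), `font_names(a) - font_names(b)` shows which fonts appear in only one of the two files.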

Given the new information in https://phabricator.wikimedia.org/T178028#3742464 that this is only via "Download as PDF" I'm a bit puzzled why we are working on this bug given Electron is not going to be rendering the final output. Shouldn't this be blocked until the new service is up and running, given @Nirzar has already made a request for installing a specific font stack?

... given @Nirzar has already made a request for installing a specific font stack?

Sorry if this is a stupid question, but is this a separate task?

PDF generated by T176627 doesn't have those issues.

@pmiazga: For clarity, were you running the service on your local machine or on a server?

@phuedx local machine

raynor@DellE6540:~/Downloads » pdffonts Songs_About_Jane_electron.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
LiberationSans-BoldItalic            CID TrueType      Identity-H       yes no  yes     32  0
LiberationSans-Bold                  CID TrueType      Identity-H       yes no  yes     37  0
LiberationSans                       CID TrueType      Identity-H       yes no  yes     42  0
LiberationSans-Italic                CID TrueType      Identity-H       yes no  yes     47  0
LiberationSerif-BoldItalic           CID TrueType      Identity-H       yes no  yes     52  0
LiberationSerif                      CID TrueType      Identity-H       yes no  yes     57  0
LiberationSerif-Italic               CID TrueType      Identity-H       yes no  yes     62  0
LiberationSerif-Bold                 CID TrueType      Identity-H       yes no  yes     67  0
LiberationMono                       CID TrueType      Identity-H       yes no  yes     83  0
[none]                               Type 3            Custom           yes no  yes     89  0
[none]                               Type 3            Custom           yes no  yes     94  0

@pmiazga is the CSS font stack just "serif" or do we specify any typefaces there?

A solution would be to install Charis[1] on the server and specify that in the CSS instead of just serif.

[1] @phuedx here's that separate task we had created https://phabricator.wikimedia.org/T169828

Per T181200#3795834, Charter is now the default typeface used by the Electron-based PDF renderer.

Unclear whether we resolved this given conclusion "When investigating this we were only able to replicate this on Electron. The conclusion is that this should be remedied when we launch the new Chromium based service. When launching we will take some time to verify this is fixed along with other bugs in the Proton project. We should be wary of different font stacks on the production machine."

Should this be stalled and the description updated?

So what's the status on this? Since the whole Chromium thing seems to have been shut down, does that mean this will not get fixed?

ovasileva changed the task status from Open to Stalled. May 11 2018, 1:56 PM

@TheDJ - we're still replacing electron with chromium - it took us a while to get a server for it T187821: Choose a server for the chromium-render service, but we're hoping to deploy it within the next month. We're keeping this open as a reminder to double-check that kerning issues are not present in chromium as well.

I did some testing on this on Beta and here are the results:

Browser Print: Screen Shot 2018-06-01 at 8.38.34 AM.png (1×2 px, 953 KB)
Download PDF: Screen Shot 2018-06-01 at 8.38.38 AM.png (1×2 px, 1 MB)

The kerning issue is present on the PDF

@ABorbaWMF - just to double-check. For the Chromium PDF - did you generate that one using the instructions @pmiazga provided in T195991?

@ovasileva - I did not initially, sorry about that. Here is another comparison using the correct method. Kerning has improved, but still looks a little off on bolded text and some of the link text as well.

Browser Print: Screen Shot 2018-06-01 at 9.30.07 AM.png (1×2 px, 1 MB)
PDF: Screen Shot 2018-06-01 at 9.30.28 AM.png (1×2 px, 1 MB)

Kerning issues look similar across desktop style PDFs. The kerning issues look less severe on mobile styles.

Desktop Letter: Screen Shot 2018-06-05 at 11.05.53 AM.png (1×2 px, 1 MB)
Desktop A4: Screen Shot 2018-06-05 at 11.05.59 AM.png (1×2 px, 1 MB)
Desktop Legal: Screen Shot 2018-06-05 at 11.06.46 AM.png (1×2 px, 1 MB)
Mobile Letter: Screen Shot 2018-06-05 at 11.07.23 AM.png (1×2 px, 866 KB)
Mobile A4: Screen Shot 2018-06-05 at 11.07.20 AM.png (1×2 px, 855 KB)
Mobile Legal: Screen Shot 2018-06-05 at 11.07.17 AM.png (1×2 px, 855 KB)

https://proton-beta.wmflabs.org/en.wikipedia.org/v1/pdf/Barack Obama/letter/desktop

https://proton-beta.wmflabs.org/en.wikipedia.org/v1/pdf/Barack Obama/a4/desktop

https://proton-beta.wmflabs.org/en.wikipedia.org/v1/pdf/Barack Obama/legal/desktop

https://proton-beta.wmflabs.org/en.wikipedia.org/v1/pdf/Barack Obama/letter/mobile

https://proton-beta.wmflabs.org/en.wikipedia.org/v1/pdf/Barack Obama/a4/mobile

https://proton-beta.wmflabs.org/en.wikipedia.org/v1/pdf/Barack Obama/legal/mobile
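
The six links above follow one pattern (domain, article title, paper size, style), so the full test matrix can be generated rather than typed by hand. This is a rough sketch under assumptions: the helper name is made up, the pattern is inferred only from the URLs listed here, and percent-encoding the space in the title is my choice, not something the service is confirmed to require.

```python
from itertools import product
from urllib.parse import quote

# Base URL and path layout are taken from the links in this comment.
BASE = "https://proton-beta.wmflabs.org"
PAPER_SIZES = ("letter", "a4", "legal")
STYLES = ("desktop", "mobile")


def proton_pdf_url(title: str, paper: str, style: str,
                   domain: str = "en.wikipedia.org") -> str:
    """Build one PDF URL; the title is percent-encoded (an assumption)."""
    return f"{BASE}/{domain}/v1/pdf/{quote(title, safe='')}/{paper}/{style}"


def all_pdf_urls(title: str) -> list:
    """Every style/paper-size combination for one article, desktop first."""
    return [proton_pdf_url(title, paper, style)
            for style, paper in product(STYLES, PAPER_SIZES)]


if __name__ == "__main__":
    for url in all_pdf_urls("Barack Obama"):
        print(url)
```

Generating the list this way makes it easy to rerun the same comparison on another article after a configuration change.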

ovasileva changed the task status from Stalled to Open. Jun 12 2018, 11:05 AM

Issues are still appearing in the Chromium renderer. Bringing into the sprint for continuation of the spike.

@alexhollender - Might be an issue with Chromium fonts

ovasileva renamed this task from [Spike 3hrs] Grave kerning issues and spacing issues in PDFs generated by Electron via "Download as PDF" to Grave kerning issues and spacing issues in PDFs generated by Electron and Chromium via "Download as PDF". Jun 12 2018, 11:06 AM
ovasileva moved this task from Incoming to 2017-18 Q4 on the Web-Team-Backlog board.

apologies for lurking on the kerning issue (my old friend)... I stumbled across another text related bug, maybe tracked somewhere else.

if an anchor tag is wrapped, the rest of the text in that row also gets a hyperlink and is clickable. it links out to the bluelink that preceded the text.

example -

image.png (514×1 px, 114 KB)

here, clicking on well-received goes to results of primaries wiki

@Nirzar - what article were you testing on? We should test to see if we can reproduce in Chromium as well.

@Nirzar - my bad, that is Chromium. @pmiazga - are we underlining links for all scripts?

Jdlrobson renamed this task from Grave kerning issues and spacing issues in PDFs generated by Electron and Chromium via "Download as PDF" to [Spike, 8hrs] Grave kerning issues and spacing issues in PDFs generated by Chromium (and previous Electron) via "Download as PDF". Jun 13 2018, 5:26 PM
Jdlrobson updated the task description. (Show Details)
Jdlrobson set the point value for this task to 0.

@pmiazga will update the description to reflect the questions we now want to answer, and to answer the existing question.

From @ABorbaWMF's screenshots, it looks like mobile and desktop use different fonts. Is that intentional?

Screen Shot 2018-06-14 at 13.53.00.png (680×1 px, 336 KB)

Change 442735 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/services/chromium-render@master] Try to fix fonts rendering issues (bad kerning)

https://gerrit.wikimedia.org/r/442735

Change 442735 merged by jenkins-bot:
[mediawiki/services/chromium-render@master] Try to fix fonts rendering issues (bad kerning)

https://gerrit.wikimedia.org/r/442735

@ovasileva suggested that switching to Charter (see T181200) might get rid of this issue during today's standup ritual.

TL;DR

In general, font-related issues are pretty common when running a browser in headless mode. What makes the situation even more difficult is that the errors are usually specific to a given environment (they cannot be reproduced on a different setup). Quick googling turns up many reports of broken font alignment on many headless systems (not only Chrome, but also Firefox and tools like wkhtmltopdf). I tried different setups and font sets on different systems, with different versions of the fonts, Chromium, and the fontconfig package (it looks like versions pre-2.12 handle font hinting in a different way). Puppeteer also has an open ticket regarding inconsistent font rendering in headless mode. Version 1.5.0 has some changes regarding font hinting (it just looks better), but we can expect more changes in that area in the near future.

PDF printed locally using print dialog:


PDF printed on Proton after all config changes:

More information

There are 3 possible ways of solving the badly rendered fonts issue:

  1. try to change the server configuration, but it can still be broken
  2. try to postprocess the PDFs (but again we can hit the issues related to the headless mode)
  3. try to embed fonts in the browser and ask browser to render those (instead of using system fonts)

I decided to pursue point 1, as point 2 doesn't seem like a reliable approach (it is still headless mode, plus it requires extra processing power). PDF generation is already a resource-heavy task; there is no need to add more post-processing and make PDF generation even slower.
After playing with it for a while, Proton on Beta currently runs on puppeteer v1.5.0, with fontconfig tuning set to none, hinting set to medium, bitmaps enabled, and the noto-fonts set installed. I think this provides the best results (please see the attached PDFs).

Also, I noticed that we do not use Charter (I remember there was a conversation about using different fonts: Use "Charter" as preferred typeface on Electron). Desktop and Mobile PDFs are rendered using the LiberationSans and LiberationSerif fonts:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
LiberationSans-Bold                  CID TrueType      Identity-H       yes no  yes     61  0
LiberationSans                       CID TrueType      Identity-H       yes no  yes     66  0
LiberationSerif-Bold                 CID TrueType      Identity-H       yes no  yes     71  0
LiberationSerif-Italic               CID TrueType      Identity-H       yes no  yes     76  0
LiberationSerif                      CID TrueType      Identity-H       yes no  yes     81  0
LiberationSans-Italic                CID TrueType      Identity-H       yes no  yes     88  0

@Nirzar, @Alex - do we want to change that and try to use the Charter font?

Results:

The PDF printed using the browser's built-in print-to-PDF option - this is the result we want to achieve:

The PDF printed locally, using the Proton development version, running NPM/Chromium as a user that can access the Linux X system. The resulting PDF is almost the same as the one produced by the browser print dialog.

The PDF printed on the beta cluster Proton instance, tuning: none, hinting: medium, antialiasing on (mostly matches the local browser print mode)

The PDF printed on the beta cluster Proton instance, tuning: autohinting, hinting: full, antialiasing on (looks okay-ish, but still has some issues)

The PDF printed on the beta cluster Proton instance, tuning: native (default), hinting: medium, antialiasing off (still really bad)

More information, mostly tech stuff:
When the browser generates the PDF, it embeds the fonts and hinting information in the PDF. This can probably be changed afterwards (as an example, I can open the badly rendered PDF in LibreOffice Draw, change the fonts, and save it to get a nicely rendered PDF).
I didn't play too much with web fonts, as changing the fonts configuration in Linux gave me pretty good results.

It looks like font handling on X systems is a bit more complicated than I expected it to be. Each TTF font can have its own kerning table. The system can use the font's kerning definitions, or ignore them and use its own, or just ignore everything and not adjust letter positions. Debian provides a fontconfig package, which can be used for some tuning (which for me worked in headless mode but did not work when using applications inside the Xorg ecosystem - probably my Debian instance has another tool that manages fonts). When it comes to rendering, we can either provide the path to the TTF/web font and expect the browser to handle hinting/kerning, or we can use the system font and expect the system to do its job. What makes the whole thing even more interesting is that it is possible to edit the kerning tables for a single font (using fontutils).

fontconfig provides 3 rendering modes:

  • Native (the Debian default; works best with DejaVu/Microsoft fonts)
  • Autohinter - for any other TrueType font
  • None (no extra render support?)

and 4 additional hinting modes:

  • none
  • slight
  • medium
  • full

Full means a crisp font that aligns well to the pixel grid but loses a greater amount of the font shape.
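
To make the settings above easier to reproduce on another machine, they can be written out as a fontconfig fragment. The sketch below is an illustration, not the configuration actually deployed on Beta: the function name and defaults are hypothetical, and treating the tuning mode as the `autohint` property is my assumption about how the tuning choice maps to fontconfig. The property names (`antialias`, `autohint`, `embeddedbitmap`, `hintstyle`) and the `hintnone`/`hintslight`/`hintmedium`/`hintfull` constants are standard fontconfig ones.

```python
# Maps the four hinting modes listed above to fontconfig hintstyle constants.
HINT_STYLES = {
    "none": "hintnone",
    "slight": "hintslight",
    "medium": "hintmedium",
    "full": "hintfull",
}


def fontconfig_fragment(hinting: str = "medium", autohint: bool = False,
                        antialias: bool = True,
                        embedded_bitmaps: bool = True) -> str:
    """Render a fonts.conf-style XML fragment for the given settings."""
    if hinting not in HINT_STYLES:
        raise ValueError(f"unknown hinting mode: {hinting!r}")

    def bool_edit(name: str, value: bool) -> str:
        flag = "true" if value else "false"
        return f'    <edit name="{name}" mode="assign"><bool>{flag}</bool></edit>'

    lines = [
        '<?xml version="1.0"?>',
        '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">',
        "<fontconfig>",
        '  <match target="font">',
        bool_edit("antialias", antialias),
        bool_edit("autohint", autohint),  # assumed stand-in for the tuning mode
        bool_edit("embeddedbitmap", embedded_bitmaps),
        f'    <edit name="hintstyle" mode="assign">'
        f"<const>{HINT_STYLES[hinting]}</const></edit>",
        "  </match>",
        "</fontconfig>",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Roughly the Beta combination described in this comment:
    # no autohinter, medium hinting, bitmaps enabled.
    print(fontconfig_fragment(hinting="medium", autohint=False))
```

Keeping the settings in one generated file would let the same combination be tried on a local machine and on the server, which helps separate configuration effects from environment effects.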

PDF printed on Proton after all config changes:

This actually looks pretty good to me. @alexhollender - handing this over to you for review.