Non-Latin/Cyrillic/Arabic Character Rendering is broken for PDF rendering
Closed, Resolved | Public | Bug Report

Description

Currently, when using the Download PDF feature, characters outside the Latin, Cyrillic, and Arabic scripts fail to render: they appear as tofu boxes instead of the actual characters.
Expected Result: Characters outside the Latin, Cyrillic, and Arabic scripts are rendered correctly.
Actual Result: See the screenshots of the PDFs; characters outside the Latin, Cyrillic, and Arabic scripts are not rendered, while Latin, Cyrillic, and Arabic characters render correctly.

截屏2020-07-12 08.46.13.jpg (1×1 px, 351 KB)

From Chinese Wikipedia
截屏2020-07-12 08.46.42.jpg (1×1 px, 444 KB)

From Korean Wikipedia
截屏2020-07-12 08.46.32.jpg (1×1 px, 473 KB)

From Japanese Wikipedia
截屏2020-07-12 08.46.48.jpg (858×1 px, 197 KB)

From Tibetan Wikipedia
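
To check whether a given title or text sample falls into the affected group, a minimal sketch follows. It is not part of the PDF service: the function name `chars_outside_supported_scripts` and the prefix list are assumptions made for illustration, and matching on Unicode character names is only a rough approximation of real script detection.

```python
import unicodedata

# Assumption: the three scripts the report says still render correctly.
SUPPORTED_PREFIXES = ("LATIN", "CYRILLIC", "ARABIC")

def chars_outside_supported_scripts(text):
    """Return the letters in text whose Unicode name does not start
    with one of the supported script prefixes."""
    outside = []
    for ch in text:
        if not ch.isalpha():
            # Skip digits, punctuation, whitespace, and combining marks.
            continue
        name = unicodedata.name(ch, "")
        if not name.startswith(SUPPORTED_PREFIXES):
            outside.append(ch)
    return outside

# Latin and Cyrillic letters pass; CJK letters are reported.
print(chars_outside_supported_scripts("PDF тест"))
print(chars_outside_supported_scripts("1980年 eruption"))
```

A non-empty result suggests the text contains letters that would have shown up as tofu boxes before the fix.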

Event Timeline

VulpesVulpes825 changed the subtype of this task from "Task" to "Bug Report". Jul 12 2020, 8:37 AM
VulpesVulpes825 added a project: Regression.

Hi @VulpesVulpes825, thanks for taking the time to report this! Please always include a full link to a web address where a problem can be seen. Please always read and follow https://www.mediawiki.org/wiki/How_to_report_a_bug when creating any tickets here - thanks!

@Aklapper This is not limited to one web address; it affects all Wikimedia projects that do not use the Latin, Cyrillic, or Arabic script. Currently, CJK scripts are completely broken, as are some other scripts such as Tibetan. More scripts may be affected than the ones I listed, since I do not have time to test every writing system.

@VulpesVulpes825 And I asked you for one single random example link where others could see this problem. Nobody asked to test every single script. :)

Change 612353 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/chromium-render@master] Install required font packages in the production image

https://gerrit.wikimedia.org/r/612353

Change 612353 merged by jenkins-bot:
[mediawiki/services/chromium-render@master] Blubber: Install required font packages

https://gerrit.wikimedia.org/r/612353

Currently, PDF rendering on the beta cluster renders CJK characters correctly (even though the rendering itself can be improved, see T226633). I will close the task once the patch is deployed to production wikis.

This should now be resolved (but note that some recently rendered and cached PDFs may still display the issue).

The production Proton service is now using fonts-noto-cjk (edit: and fonts-noto-cjk-extra), so I believe that T226633 can also be resolved.

Newly rendered PDFs seem fine, but there is still at least one case (https://zh.wikipedia.org/w/index.php?title=Special:ElectronPdf&page=1980%E5%B9%B4%E5%9C%A3%E6%B5%B7%E4%BC%A6%E7%81%AB%E5%B1%B1%E7%88%86%E5%8F%91&action=show-download-screen) of a page with a bad PDF rendering from before the fix that seems to be cached somewhere I haven't figured out yet. I'm trying to figure out what's left to be cleaned up and how before closing the ticket.
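
The page behind the percent-encoded link above can be identified by decoding its query string. This is a minimal sketch using only the Python standard library; the helper name `decode_query` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

URL = (
    "https://zh.wikipedia.org/w/index.php?title=Special:ElectronPdf"
    "&page=1980%E5%B9%B4%E5%9C%A3%E6%B5%B7%E4%BC%A6%E7%81%AB%E5%B1%B1%E7%88%86%E5%8F%91"
    "&action=show-download-screen"
)

def decode_query(url):
    """Return the query parameters of url as a dict, with
    percent-encoded UTF-8 sequences decoded to text."""
    query = urlsplit(url).query
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: values[0] for key, values in parse_qs(query).items()}

params = decode_query(URL)
print(params["title"])  # Special:ElectronPdf
print(params["page"])   # the Chinese article title, starting with "1980"
```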

OK, the problem was with my local cache. This is done.