
(Indic) Hindi and sanskrit letters rendered as squares in PDFs
Closed, Declined (Public)

Description

Author: sushant_savla

Description:
This is regarding the error that occurs while converting Gujarati Wikisource books to PDF format.

Books written in Gujarati often contain quotes written in Hindi or Sanskrit.

During conversion, text written in Gujarati is converted properly, but words written in Hindi or Sanskrit appear as square blocks.

Example: convert this book [[સભ્ય:Sushant savla/પુસ્તકો/બુદ્ધ અને મહાવીર]] to PDF and refer to page 12 of the converted PDF; there will be squares in place of the Hindi / Sanskrit text.


Version: unspecified
Severity: normal

Details

Reference
bz72005

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:51 AM
bzimport added a project: OCG-PDF-renderer.
bzimport set Reference to bz72005.
bzimport added a subscriber: Unknown Object (MLST).

sushant_savla wrote:

I am not aware as to

sushant_savla wrote:

Sample of PDF converted file

Sample file, where Hindi and Sanskrit fonts are not rendered.

Page 4 (last word in bracket)
Page 12

Attached:

Thanks! Confirming.
Also, for me, the string before the chapter number in every chapter header, plus the footer page number on the table of contents page, are rendered as squares.

sushant_savla wrote:

Yes, the same happens for me as well. I did not know what text was supposed to be in front of the chapter number. I checked a book on English Wikisource and found that it is the word "Chapter", which can be translated into Gujarati as "પ્રકરણ".

The Hindi text should be surrounded by a <span lang="hi"> tag. This will switch the font (and select the proper language-dependent hyphenation rules, etc.).

As a workaround, a future version of OCG will try "harder" to automatically recognize language switches when language attributes are not present (T106277). But that will never work as well as manually labelling the correct language of the text.
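
For editors who want to add the language attribute in bulk, the manual labelling described above could be roughly approximated by a script. The following is only a minimal sketch, not how OCG or T106277 detect language switches: the function name tag_devanagari is hypothetical, and it assumes that every run of Devanagari-script characters in a Gujarati page should be tagged with a single language code. Script alone cannot tell Hindi from Sanskrit, so Sanskrit quotes would need lang="sa" passed explicitly or fixed by hand.

```python
import re

# Devanagari block is U+0900-U+097F. The danda and double danda
# (U+0964, U+0965) are skipped because other Indic scripts, including
# Gujarati, share them as punctuation.
_DEVA = r"[\u0900-\u0963\u0966-\u097F]+"

# One or more Devanagari words separated by whitespace form a single run.
DEVANAGARI_RUN = re.compile(_DEVA + r"(?:\s+" + _DEVA + r")*")


def tag_devanagari(text: str, lang: str = "hi") -> str:
    """Wrap each run of Devanagari text in a <span lang="..."> element.

    Assumption: all Devanagari text in the input is in one language.
    """
    return DEVANAGARI_RUN.sub(
        lambda m: f'<span lang="{lang}">{m.group(0)}</span>', text
    )


if __name__ == "__main__":
    # Gujarati text followed by a Hindi word; only the Hindi word is wrapped.
    print(tag_devanagari("ગુજરાતી नमस्ते"))
```

The output still needs review by someone who reads both languages, since a purely script-based guess cannot replace a correct manual label.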

As a Sanskrit Wikipedian, can I help here?

@NehalDaveND: There is an idea to move away from OfflineContentGenerator to Electron. You could test Hindi and Sanskrit letters on https://www.mediawiki.org/wiki/User:NehalDaveND/Sandbox and then create a PDF via https://www.mediawiki.org/w/index.php?title=Special:ElectronPdf&page=User%3ANehalDaveND%2FSandbox . If there are problems, report them under https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=electron-pdfs (as this task T74005 is about OCG and not Electron). Thanks!

As already announced in Tech News, OfflineContentGenerator (OCG) will not be used anymore after October 1st, 2017 on Wikimedia sites. OCG will be replaced by Electron. You can read more on mediawiki.org.

Aklapper added a project: I18n.

The test case at https://gu.wikisource.org/wiki/સભ્ય:Sushant_savla/પુસ્તકો/બુદ્ધ_અને_મહાવીર is gone. :( Given that even T73798 renders characters correctly now that Wikimedia servers use Proton and Electron-PDFs instead of OCG, which was in use when this ticket was created, I am quite confident that this is fixed nowadays. I am declining this task as it is about OCG, and OCG has been dead for years and superseded by Proton and Electron-PDFs on Wikimedia servers.