
Japanese fonts on translated pages print as Tofu because "lang" attribute is missing.
Closed, Resolved (Public)

Description

Japanese-language characters are rendered as tofu when I try to export a page translated into Japanese using the Translate extension.

An example is https://meta.wikimedia.org/wiki/Tech/News/2014/40/ja?oldid=10019558&,
rendered as
https://meta.wikimedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=Tech%2FNews%2F2014%2F40%2Fja&oldid=10019558&writer=rdf2latex. In the linked PDF file, only Latin characters are rendered properly.


Version: unspecified
Severity: normal

Details

Reference
bz71380

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:44 AM
bzimport added projects: Parsoid, I18n.
bzimport set Reference to bz71380.

Just to add: the description above is based on what I see in GNOME Document Viewer (Evince) 3.10.3.

The issue is that the Translate extension is not adding the proper "lang" attribute to the translated message, so we end up rendering the entire page as English text.

What translated message? The content is wrapped inside <div id="mw-content-text" lang="ja" dir="ltr" class="mw-content-ltr"></div>. I do not understand what the problem is or how it relates to Translate.

Ah, sorry -- you're right. That's in the PHP parser output.

It's missing from the Parsoid output, however:
http://parsoid-lb.eqiad.wikimedia.org/metawiki/Tech/News/2014/40/ja?oldid=10019558

That has lang=en.
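
To compare the two outputs, one option is to list every "lang" attribute found in each HTML document. The following is only a minimal sketch in Python, assuming the HTML has already been fetched into a string; the names LangCollector and find_lang_attrs are hypothetical, and the only input shown is the wrapper div quoted above from the PHP parser output.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Collect (tag, id, lang) for every element that carries a lang attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "lang" in attr_map:
            self.found.append((tag, attr_map.get("id"), attr_map["lang"]))


def find_lang_attrs(html):
    """Return a list of (tag, id, lang) tuples in document order."""
    collector = LangCollector()
    collector.feed(html)
    collector.close()
    return collector.found


# Wrapper element quoted earlier in this thread (PHP parser output).
php_output = (
    '<div id="mw-content-text" lang="ja" dir="ltr" '
    'class="mw-content-ltr"></div>'
)
print(find_lang_attrs(php_output))
```

Running the same helper over the Parsoid HTML would show which element carries lang=en there.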

Could you describe how your extension sets the lang attribute on the content (if you know) before I reassign this bug back over to Parsoid? Presumably I need to get this information via some API, as the desired content language is not present in https://meta.wikimedia.org/w/index.php?title=Tech/News/2014/40/ja&action=raw for instance (which I assume is the raw text which Parsoid is working with).

Via the hook: https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/tag/PageTranslationHooks.php#L67

I would be surprised if the page content language is not yet exposed in the API in any way. If it is not, let's add it.
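
For illustration, here is a sketch of how a client such as Parsoid might ask the API for the page content language. It assumes that the action=query, prop=info module reports a "pagelanguage" field for each page, which this thread does not confirm for the deployed version. The helper names, the api.php URL, and the sample response are hypothetical.

```python
from urllib.parse import urlencode


def build_info_query(api_url, title):
    """Build an action=query&prop=info request URL for one page title."""
    params = {
        "action": "query",
        "prop": "info",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def extract_page_language(response, default=None):
    """Return the first 'pagelanguage' value found in a decoded JSON response.

    Assumption: the response maps page ids to page objects under
    response["query"]["pages"], and each object may carry "pagelanguage".
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        lang = page.get("pagelanguage")
        if lang:
            return lang
    return default


# Hypothetical endpoint and response, used only to exercise the helpers.
url = build_info_query("https://meta.wikimedia.org/w/api.php", "Tech/News/2014/40/ja")
sample_response = {
    "query": {
        "pages": {
            "123": {"title": "Tech/News/2014/40/ja", "pagelanguage": "ja"}
        }
    }
}
print(url)
print(extract_page_language(sample_response, default="en"))
```

If the field turns out to be missing, the default argument lets the caller fall back to the wiki's content language instead of failing.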

Ok, great. I'm going to reassign it to Parsoid; I'll open a new bug if it turns out the page content language isn't exposed via some API.